On Mon, Jan 19, 2015 at 4:53 PM, Charles Polisher cpolish@surewest.net wrote:
On Jan 07, 2015 at 01:47:53PM -0600, Les Mikesell wrote:
I see a bunch of entries like:
  ioatdma 0000:00:08.0: Channel halted, chanerr = 2
  ioatdma 0000:00:08.0: Channel halted, chanerr = 0
in the logs and one of these:
  hrtimer: interrupt took 258633 ns
Not sure what those mean. We do have considerably more systems running Windows than Linux on this hardware, and I don't think anyone has noticed a systemic problem there.
Was this resolved? The ioatdma messages are from ioat_dma.c, a driver for Intel's I/OAT DMA engine typically used on high-end server hardware to accelerate network I/O. chanerr = 2 might be an issue with the DMA channel being in a suspended state when the driver isn't expecting it to be. Maybe a network driver bug.
No, reboots are rare on these servers and file corruption is rare even within those, so I don't anticipate seeing enough instances to find a pattern.
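If it would help to see whether the halts cluster on one device or one chanerr value, something like the following could tally them from saved kernel logs. This is only a rough sketch: the function name tally_halts is made up for illustration, the regular expression is based solely on the two log lines quoted above, and the chanerr value is kept as a string because the thread does not say how it is encoded.

```python
import re
from collections import Counter

# Matches lines such as:
#   ioatdma 0000:00:08.0: Channel halted, chanerr = 2
# The pattern is inferred from the messages quoted in this thread only.
HALT_RE = re.compile(
    r"ioatdma\s+(?P<dev>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]):\s+"
    r"Channel halted, chanerr = (?P<err>[0-9a-fA-F]+)"
)


def tally_halts(lines):
    """Count 'Channel halted' messages per (PCI device, chanerr) pair.

    `lines` is any iterable of log lines, e.g. an open log file.
    The chanerr value is kept as the string that appears in the log.
    """
    counts = Counter()
    for line in lines:
        match = HALT_RE.search(line)
        if match:
            counts[(match.group("dev"), match.group("err"))] += 1
    return counts


# Example use (the log path is an assumption; adjust for your system):
#   with open("/var/log/messages") as log:
#       for (dev, err), n in sorted(tally_halts(log).items()):
#           print(f"{dev} chanerr={err}: {n}")
```

Lines that do not match, such as the hrtimer message, are simply skipped, so the whole log can be fed in unfiltered.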