[CentOS-virt] Guests pausing suddenly

Benjamin Franz

jfranz at freerun.com
Thu Apr 26 13:32:11 UTC 2012

On 04/26/2012 02:29 AM, Peter Hopfgartner wrote:
> The problem got slightly better when I upgraded all kernels, on host and
> guest, so that the "MTBF" went from 3-4 days to approx 50. Still, the
> problem is not solved, yet.
> A maybe stupid question: If the kernel in the guest sees an I/O error on
> sda, could this be a real error on the physical disk, even if there are
> no notices in the physical host's log files, or is this more of a
> software problem?
> As the next step, I'll try to update the physical server's firmware.
> Any suggestion on this topic is welcome, even more than before.

This could be caused by failing areas on the underlying disk drive,
particularly if you are using consumer-grade hard drives instead of
enterprise drives. The most relevant difference here is that a
consumer-grade drive can keep retrying for up to a couple of minutes to
read a bad sector, and might eventually succeed if the error isn't too
egregious, while an enterprise drive will quickly report the sector as
unreadable and move on. If the slow read does eventually succeed, the
host may have nothing to log even though the guest sees its I/O stall
or fail.

I would install smartmontools on the physical server, run a 'long'
self-test on the drive, and check its SMART status once the test has
finished.
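
As a rough sketch of that check (not a finished tool): the small Python
wrapper below calls smartctl from smartmontools to start the long
self-test and later print the health verdict, the self-test log and a
few attributes that tend to point at bad sectors. The device path
/dev/sda, the helper names and the choice of attributes to watch are my
assumptions; adjust them for your host, and run it as root.

```python
import subprocess
import sys

# Assumption: the host's physical disk is /dev/sda. The guest's "sda" is a
# virtual disk and need not map to this path; change DEVICE to match the host.
DEVICE = "/dev/sda"

# Attributes that commonly indicate failing sectors. Not every drive reports
# all of them, so a missing entry is not an error.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def smartctl_cmd(*args, device=DEVICE):
    """Build a smartctl command line as an argument list."""
    return ["smartctl", *args, device]


def parse_watched(attr_output):
    """Extract raw values of the WATCHED attributes from `smartctl -A` output."""
    counts = {}
    for line in attr_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the name is the second, the raw
        # value the tenth.
        if len(fields) >= 10 and fields[1] in WATCHED:
            try:
                counts[fields[1]] = int(fields[9])
            except ValueError:
                pass  # raw value in a vendor-specific format; skip it
    return counts


if __name__ == "__main__":
    action = sys.argv[1] if len(sys.argv) > 1 else ""
    if action == "start":
        # Starts the long self-test; it runs inside the drive and can take hours.
        subprocess.run(smartctl_cmd("-t", "long"))
    elif action == "report":
        # Overall health verdict and the self-test log.
        subprocess.run(smartctl_cmd("-H"))
        subprocess.run(smartctl_cmd("-l", "selftest"))
        # Attribute table, reduced to the watched counters.
        result = subprocess.run(smartctl_cmd("-A"), capture_output=True, text=True)
        for name, raw in parse_watched(result.stdout).items():
            print(f"{name}: {raw}")
    else:
        print("usage: smartcheck.py start|report")
```

Saved as, say, smartcheck.py (a made-up name), you would run it with
"start", wait for the test to complete, then run it with "report".
Non-zero pending or reallocated sector counts, or a read failure in the
self-test log, would make the physical disk the prime suspect.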

