Re: [CentOS] CentOS-6.8 fsck report Maximal Count


James B. Byrne

10 Mar 2017

1:32 p.m.

On Thu, March 9, 2017 09:46, John Hodrien wrote:

...

On Thu, 9 Mar 2017, James B. Byrne wrote:

...
This indicated that a bad sector on the underlying disk system might be the source of the problem. The guests were all shut down, a /forcefsck file was created on the host system, and the host system was remotely restarted.

fsck's not good at finding disk errors, it finds filesystem errors.

If not fsck then what?

...

If it was a real disk issue, you'd expect matching errors in the host logs.

Yes, there are:

Mar 9 09:14:13 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
Mar 9 09:14:30 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
Mar 9 09:14:48 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
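For anyone wanting to check whether such errors all point at one sector or are spreading, the following is a minimal sketch that tallies the kernel messages quoted above. The function name `count_io_errors` and the regular expression are illustrative assumptions, not part of any existing tool; only the three log lines come from the post.

```python
import re
from collections import Counter

# Matches kernel lines of the form quoted in the post:
#   "... kernel: end_request: I/O error, dev sda, sector 1236929063"
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def count_io_errors(lines):
    """Return a Counter keyed by (device, sector) for each I/O error line."""
    hits = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            hits[(match.group(1), int(match.group(2)))] += 1
    return hits


# The three messages from /var/log/messages shown in the post.
log = [
    "Mar 9 09:14:13 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063",
    "Mar 9 09:14:30 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063",
    "Mar 9 09:14:48 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063",
]

errors = count_io_errors(log)
for (dev, sector), n in sorted(errors.items()):
    print(f"{dev} sector {sector}: {n} error(s)")
```

A single key in the result means every logged error refers to the same sector; several keys would suggest the damage is wider.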

I am running an extended SMART test on the drive at the moment. I suspect that the drive is probably at its EOL for practical purposes. So likely we will be looking at an equipment upgrade given the age of the rest of the equipment.

In the meantime what steps, if any, should I take to remediate this problem?

...

...
/var/log/messages:Mar 9 08:34:48 vhost03 kernel: EXT4-fs (dm-6): warning: maximal mount count reached, running e2fsck is recommended

Unmount it and run fsck on it, and that message would go away. But I'd not worry about that one.

jh
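As a rough illustration of where the "maximal mount count reached" warning comes from, the sketch below reads the "Mount count" and "Maximum mount count" fields that `tune2fs -l` prints for an ext filesystem. The function names are hypothetical, and the sample text is made up for the example (it is not output from vhost03).

```python
def parse_mount_counts(tune2fs_output):
    """Extract (mount count, maximum mount count) from `tune2fs -l` text."""
    fields = {}
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return int(fields["Mount count"]), int(fields["Maximum mount count"])


def check_recommended(mount_count, max_mount_count):
    """True when the mount count has reached an enabled maximum.

    A maximum of 0 or -1 means the mount-count check is disabled.
    """
    return max_mount_count > 0 and mount_count >= max_mount_count


# Illustrative excerpt only; real output has many more fields.
SAMPLE = """\
Filesystem state:         clean
Mount count:              38
Maximum mount count:      38
"""

count, maximum = parse_mount_counts(SAMPLE)
print(count, maximum, check_recommended(count, maximum))
```

Running e2fsck on the unmounted filesystem resets the mount count, which is why the message goes away afterwards.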

--
*** e-Mail is NOT a SECURE channel ***
Do NOT transmit sensitive data via e-Mail
Do NOT open attachments nor follow links sent by e-Mail

James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3


Warren Young

10 Mar

3:52 p.m.


On Mar 10, 2017, at 6:32 AM, James B. Byrne byrnejb@harte-lyne.ca wrote:

...

On Thu, March 9, 2017 09:46, John Hodrien wrote:

...
fsck's not good at finding disk errors, it finds filesystem errors.

If not fsck then what?

badblocks(8).

Valeri Galtsev

4:28 p.m.


On Fri, March 10, 2017 9:52 am, Warren Young wrote:

...

On Mar 10, 2017, at 6:32 AM, James B. Byrne byrnejb@harte-lyne.ca wrote:

...
On Thu, March 9, 2017 09:46, John Hodrien wrote:

...
fsck's not good at finding disk errors, it finds filesystem errors.

If not fsck then what?

badblocks(8).

And I definitely will unmount relevant filesystem(s) before using badblocks...

...

CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++

Warren Young

5:45 p.m.


On Mar 10, 2017, at 9:28 AM, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:

...

On Fri, March 10, 2017 9:52 am, Warren Young wrote:

...
On Mar 10, 2017, at 6:32 AM, James B. Byrne byrnejb@harte-lyne.ca wrote:

...
On Thu, March 9, 2017 09:46, John Hodrien wrote:

...
fsck's not good at finding disk errors, it finds filesystem errors.

If not fsck then what?

badblocks(8).

And I definitely will unmount relevant filesystem(s) before using badblocks…

You don’t necessarily have to. The default mode of badblocks is a non-invasive read-only test which is safe to run on a mounted filesystem.

That said, a read-only badblocks pass can give a false “no errors” report in cases where a non-destructive read-then-write pass (-n) will show errors.

Alternatively, a read-only pass may show an error that a read-then-write pass will silently bury by forcing the drive to relocate the bad sector.

In extreme cases, you could potentially fix a problem with a read-random-random-write pass (-n -t random -t random) because that will statistically flip all the bits at least twice, which may rub the drive’s nose in a bad sector, forcing a reallocation where a normal read-then-write pass (-n alone) may not.

Hard drives are weird. It is only through the grace of ECC and such that they approximate deterministic behavior as well as they do.
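The three badblocks modes discussed above can be laid side by side with a small sketch that only builds and prints the command lines; it does not run anything. The helper `badblocks_argv`, the mode names, and the device `/dev/sdX` are placeholders invented for illustration. The flags themselves (`-n`, `-t random`, `-s`, `-v`) are the ones documented in badblocks(8). Note that, per the man page, badblocks normally refuses a `-n` test on a mounted device, so only the read-only form applies to a mounted filesystem.

```python
def badblocks_argv(device, mode="read-only", show_progress=True):
    """Build (but do not run) a badblocks command line for the given mode."""
    argv = ["badblocks"]
    if show_progress:
        argv += ["-s", "-v"]  # progress indicator and verbose output
    if mode == "read-only":
        pass  # default mode: non-invasive read-only scan
    elif mode == "non-destructive":
        argv += ["-n"]  # read-then-write, restoring the original data
    elif mode == "non-destructive-random-x2":
        # Two random-pattern passes, as suggested for stubborn sectors.
        argv += ["-n", "-t", "random", "-t", "random"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    argv.append(device)
    return argv


# /dev/sdX is a placeholder; substitute the real device with care.
for mode in ("read-only", "non-destructive", "non-destructive-random-x2"):
    print(" ".join(badblocks_argv("/dev/sdX", mode)))
```

Printing the commands first, rather than executing them from a script, leaves room to double-check the device name before any write test touches the disk.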

Jay Hart

7:03 p.m.


I get up around 0630, u can come anytime after that. I want to hit the range that morning but if I KNEW when you are arriving, I could plan around that...

...

On Mar 10, 2017, at 9:28 AM, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:

...
On Fri, March 10, 2017 9:52 am, Warren Young wrote:

...
On Mar 10, 2017, at 6:32 AM, James B. Byrne byrnejb@harte-lyne.ca wrote:

...
On Thu, March 9, 2017 09:46, John Hodrien wrote:

...
fsck's not good at finding disk errors, it finds filesystem errors.

If not fsck then what?

badblocks(8).

And I definitely will unmount relevant filesystem(s) before using badblocks…

You don’t necessarily have to. The default mode of badblocks is a non-invasive read-only test which is safe to run on a mounted filesystem.

That said, a read-only badblocks pass can give a false “no errors” report in cases where a non-destructive read-then-write pass (-n) will show errors.

Alternatively, a read-only pass may show an error that a read-then-write pass will silently bury by forcing the drive to relocate the bad sector.

In extreme cases, you could potentially fix a problem with a read-random-random-write pass (-n -t random -t random) because that will statistically flip all the bits at least twice, which may rub the drive’s nose in a bad sector, forcing a reallocation where a normal read-then-write pass (-n alone) may not.

Hard drives are weird. It is only through the grace of ECC and such that they approximate deterministic behavior as well as they do.

Jay Hart

11:49 p.m.

Talk about missing the email I wanted to reply to. Disregard...

...

...
On Mar 10, 2017, at 9:28 AM, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:

...
On Fri, March 10, 2017 9:52 am, Warren Young wrote:

...
On Mar 10, 2017, at 6:32 AM, James B. Byrne byrnejb@harte-lyne.ca wrote:

...
On Thu, March 9, 2017 09:46, John Hodrien wrote:

...
fsck's not good at finding disk errors, it finds filesystem errors.

If not fsck then what?

badblocks(8).

And I definitely will unmount relevant filesystem(s) before using badblocks…

You don’t necessarily have to. The default mode of badblocks is a non-invasive read-only test which is safe to run on a mounted filesystem.

That said, a read-only badblocks pass can give a false “no errors” report in cases where a non-destructive read-then-write pass (-n) will show errors.

Alternatively, a read-only pass may show an error that a read-then-write pass will silently bury by forcing the drive to relocate the bad sector.

In extreme cases, you could potentially fix a problem with a read-random-random-write pass (-n -t random -t random) because that will statistically flip all the bits at least twice, which may rub the drive’s nose in a bad sector, forcing a reallocation where a normal read-then-write pass (-n alone) may not.

Hard drives are weird. It is only through the grace of ECC and such that they approximate deterministic behavior as well as they do.

m.roth＠5-cent.us

3:57 p.m.


James B. Byrne wrote:

...

On Thu, March 9, 2017 09:46, John Hodrien wrote:

...
On Thu, 9 Mar 2017, James B. Byrne wrote:

...
This indicated that a bad sector on the underlying disk system might be the source of the problem. The guests were all shut down, a /forcefsck file was created on the host system, and the host system was remotely restarted.

fsck's not good at finding disk errors, it finds filesystem errors.

If not fsck then what?

fsck run with -c, which forces badblocks to run. Or you can run that directly.

...

...
If it was a real disk issue, you'd expect matching errors in the host logs.

Yes, there are:

Mar 9 09:14:13 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
Mar 9 09:14:30 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
Mar 9 09:14:48 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063

Looks like only one sector's bad. Running badblocks should, I think, mark that sector as bad, so the system doesn't try to read or write there. I've got a user whose workstation has had a bad sector running for over a year. However, if it becomes two, or four, or 64 sectors, it's replacement time, asap.

<snip>

mark
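If the goal is to point badblocks at just the neighbourhood of the one failing sector, the sector number from the kernel log has to be converted into badblocks' block units first. The arithmetic below is a sketch under two assumptions: that the kernel message reports the sector in 512-byte units counted from the start of the whole disk (sda), and that badblocks uses its documented default block size of 1024 bytes unless `-b` says otherwise. The function name is made up for the example.

```python
KERNEL_SECTOR_BYTES = 512  # assumption: kernel log sectors are 512-byte units


def sector_to_block(sector, block_size=1024):
    """Map a kernel-reported sector to the block of the given size containing it."""
    return (sector * KERNEL_SECTOR_BYTES) // block_size


sector = 1236929063  # from the I/O error messages quoted above
for block_size in (1024, 4096):
    block = sector_to_block(sector, block_size)
    print(f"block size {block_size}: block {block}")
```

With the default 1024-byte blocks this gives block 618464531, and with 4096-byte blocks it gives block 154616132. badblocks accepts optional last-block and first-block arguments after the device, so a narrow range around that number can be tested instead of the whole disk.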
