Hi list, I'm running CentOS 7.6 on a Corsair Force MP500 120 GB NVMe drive. The root fs is ext4 and the drive is about 1 year old. The system works very well except at boot: during the boot process I always get a file system check on the NVMe drive.
Running smartctl on this drive gives the following:
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning: 0x00
Temperature: 41 Celsius
Available Spare: 100%
Available Spare Threshold: 1%
Percentage Used: 1%
Data Units Read: 5,355,595 [2,74 TB]
Data Units Written: 5,826,517 [2,98 TB]
Host Read Commands: 67,978,550
Host Write Commands: 75,422,898
Controller Busy Time: 32,863
Power Cycles: 811
Power On Hours: 2,813
Unsafe Shutdowns: 317
Media and Data Integrity Errors: 0
Error Information Log Entries: 177
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 2: 77 Celsius
Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        177     0  0x0014  0x4004      -  8796109799680     1     -
  1        176     0  0x0019  0x4004      -  8796109799680     1     -
  2        175     0  0x001a  0x4004      -  8796109799680     1     -
  3        174     0  0x0005  0x4004      -  8796109799680     1     -
  4        173     0  0x000c  0x4004      -  8796109799680     1     -
  5        172     0  0x0019  0x4004      -  8796109799680     1     -
  6        171     0  0x001d  0x4004      -  8796109799680     1     -
  7        170     0  0x0014  0x4004      -  8796109799680     1     -
  8        169     0  0x0011  0x4004      -  8796109799680     1     -
  9        168     0  0x000f  0x4004      -  8796109799680     1     -
 10        167     0  0x0000  0x4004      -  8796109799680     1     -
 11        166     0  0x0006  0x4004      -  8796109799680     1     -
 12        165     0  0x0008  0x4004      -  8796109799680     1     -
 13        164     0  0x000e  0x4004      -  8796109799680     1     -
 14        163     0  0x0008  0x4004      -  8796109799680     1     -
 15        162     0  0x0006  0x4004      -  8796109799680     1     -
... (48 entries not shown)
I noticed that the Unsafe Shutdowns counter is increasing rapidly: roughly every 3 or 4 boots it goes up by 1, and I don't know what is causing these unsafe shutdowns.
I can't find any errors in the system logs.
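To put a number on this, here is a minimal Python sketch (my own, not part of smartctl) that pulls the Power Cycles and Unsafe Shutdowns counters out of the smartctl text and computes their ratio. The function names are made up for illustration, and the sample text is just the two counters from the output above.

```python
import re


def parse_smart_counters(text):
    """Extract the two integer counters of interest from smartctl NVMe output."""
    counters = {}
    for name in ("Power Cycles", "Unsafe Shutdowns"):
        match = re.search(rf"{re.escape(name)}:\s*([\d.,]+)", text)
        if match:
            # smartctl prints thousands separators (e.g. 2,813); strip them.
            counters[name] = int(re.sub(r"[.,]", "", match.group(1)))
    return counters


def unsafe_ratio(counters):
    """Fraction of power cycles that were recorded as unsafe shutdowns."""
    return counters["Unsafe Shutdowns"] / counters["Power Cycles"]


# Values copied from the smartctl output above.
sample = """\
Power Cycles: 811
Power On Hours: 2,813
Unsafe Shutdowns: 317
"""

counters = parse_smart_counters(sample)
print(counters, round(unsafe_ratio(counters), 2))
```

With the values above this comes out to about 0.39, i.e. 317 unsafe shutdowns over 811 power cycles. To log the counter after each boot, the same function could be fed the output of `smartctl -a /dev/nvme0` (the device name is an assumption; it may differ on other systems).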
Can someone point me in the right direction?
Thanks in advance.
Alessandro.