[CentOS] Nvme m.2 disk problem

Alessandro Baggi

alessandro.baggi at gmail.com
Sun Feb 24 10:08:52 UTC 2019


Hi list,
I'm running CentOS 7.6 on a Corsair Force MP500 120 GB. The root fs is
ext4 and this drive is ~1 year old.
The system works very well except at boot.
During the boot process I always get a file system check on the NVMe drive.

Running smartctl on this drive I got this:


=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning:                   0x00
Temperature:                        41 Celsius
Available Spare:                    100%
Available Spare Threshold:          1%
Percentage Used:                    1%
Data Units Read:                    5,355,595 [2,74 TB]
Data Units Written:                 5,826,517 [2,98 TB]
Host Read Commands:                 67,978,550
Host Write Commands:                75,422,898
Controller Busy Time:               32,863
Power Cycles:                       811
Power On Hours:                     2,813
Unsafe Shutdowns:                   317
Media and Data Integrity Errors:    0
Error Information Log Entries:      177
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 2:               77 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
   0        177     0  0x0014  0x4004      - 8796109799680     1     -
   1        176     0  0x0019  0x4004      - 8796109799680     1     -
   2        175     0  0x001a  0x4004      - 8796109799680     1     -
   3        174     0  0x0005  0x4004      - 8796109799680     1     -
   4        173     0  0x000c  0x4004      - 8796109799680     1     -
   5        172     0  0x0019  0x4004      - 8796109799680     1     -
   6        171     0  0x001d  0x4004      - 8796109799680     1     -
   7        170     0  0x0014  0x4004      - 8796109799680     1     -
   8        169     0  0x0011  0x4004      - 8796109799680     1     -
   9        168     0  0x000f  0x4004      - 8796109799680     1     -
  10        167     0  0x0000  0x4004      - 8796109799680     1     -
  11        166     0  0x0006  0x4004      - 8796109799680     1     -
  12        165     0  0x0008  0x4004      - 8796109799680     1     -
  13        164     0  0x000e  0x4004      - 8796109799680     1     -
  14        163     0  0x0008  0x4004      - 8796109799680     1     -
  15        162     0  0x0006  0x4004      - 8796109799680     1     -
... (48 entries not shown)


I noticed that the Unsafe Shutdowns count is increasing rapidly, and I
don't know why any unsafe shutdown is happening at all. Every 3 or 4
boots this value increases by 1.
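
To put that counter in perspective, here is a minimal sketch (not a tested
tool) that pulls the "Power Cycles" and "Unsafe Shutdowns" values out of
smartctl's NVMe health text and computes their ratio. The function names
parse_counters and unsafe_ratio are made up for illustration, and the sample
text is copied from the output above. With 811 power cycles and 317 unsafe
shutdowns, the ratio comes out at roughly 39%.

```python
import re


def parse_counters(smart_text, names=("Power Cycles", "Unsafe Shutdowns")):
    """Extract integer counters from smartctl NVMe health output.

    Hypothetical helper: it only understands lines of the form
    "Name:   1,234" as they appear in the output quoted in this post.
    """
    counters = {}
    for name in names:
        # Match e.g. "Unsafe Shutdowns:                   317"
        match = re.search(
            rf"^{re.escape(name)}:\s+([\d,]+)", smart_text, re.MULTILINE
        )
        if match:
            # smartctl prints thousands separators, e.g. "2,813"
            counters[name] = int(match.group(1).replace(",", ""))
    return counters


def unsafe_ratio(counters):
    """Fraction of power cycles that were recorded as unsafe shutdowns."""
    return counters["Unsafe Shutdowns"] / counters["Power Cycles"]


# Sample lines taken from the smartctl output quoted above.
SAMPLE = """\
Power Cycles:                       811
Power On Hours:                     2,813
Unsafe Shutdowns:                   317
"""

if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    print(counters)
    print(f"unsafe shutdown ratio: {unsafe_ratio(counters):.1%}")
```

Saving the smartctl output after every boot and feeding each saved copy to
parse_counters would show exactly which boots bump the counter.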

I can't find any errors in the system logs.

Can someone point me in the right direction?

Thanks in advance.

Alessandro.

