Reading S.M.A.R.T. disk data in ESXi
Beginning with vSphere 5.1, VMware provides the ability to view S.M.A.R.T. data for HDDs and SSDs. As described in the VMware documentation, an esxcli command retrieves S.M.A.R.T. data from the disks. This is useful for tracking the health of your drives, including estimating how much life is left in your flash drives, something I am occasionally asked about.
Here is an example from a host in my lab.
First, retrieve the list of devices. I have four different SSD models installed, so I list the output for each of them to show how it differs.
esxcli storage core device list
SSD1:
t10.ATA_____KINGSTON_SV300S37A120G__________________50026B773B02B8BF____
Display Name: Local ATA Disk (t10.ATA_____KINGSTON_SV300S37A120G__________________50026B773B02B8BF____)
Has Settable Display Name: true
Size: 114473
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/t10.ATA_____KINGSTON_SV300S37A120G__________________50026B773B02B8BF____
Vendor: ATA
Model: KINGSTON SV300S3
Revision: 506A
SCSI Level: 5
Is Pseudo: false
Status: on
Is RDM Capable: false
Is Local: true
Is Removable: false
Is SSD: true
Is Offline: false
Is Perennially Reserved: false
Queue Full Sample Size: 0
Queue Full Threshold: 0
Thin Provisioning Status: yes
Attached Filters:
VAAI Status: unknown
Other UIDs: vml.010000000035303032364237373342303242384246202020204b494e475354
Is Local SAS Device: false
Is Boot USB Device: false
No of outstanding IOs with competing worlds: 32
SSD2:
t10.ATA_____Samsung_SSD_840_EVO_120GB_______________S1D5NSBDB76769V_____
Display Name: Local ATA Disk (t10.ATA_____Samsung_SSD_840_EVO_120GB_______________S1D5NSBDB76769V_____)
Has Settable Display Name: true
Size: 114473
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/t10.ATA_____Samsung_SSD_840_EVO_120GB_______________S1D5NSBDB76769V_____
Vendor: ATA
Model: Samsung SSD 840
Revision: EXT0
SCSI Level: 5
Is Pseudo: false
Status: on
Is RDM Capable: false
Is Local: true
Is Removable: false
Is SSD: true
Is Offline: false
Is Perennially Reserved: false
Queue Full Sample Size: 0
Queue Full Threshold: 0
Thin Provisioning Status: yes
Attached Filters:
VAAI Status: unknown
Other UIDs: vml.0100000000533144354e53424442373637363956202020202053616d73756e
Is Local SAS Device: false
Is Boot USB Device: false
No of outstanding IOs with competing worlds: 32
SSD3:
t10.ATA_____M42DCT512M4SSD2__________________________000000001249091F58C6
Display Name: Local ATA Disk (t10.ATA_____M42DCT512M4SSD2__________________________000000001249091F58C6)
Has Settable Display Name: true
Size: 488386
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/t10.ATA_____M42DCT512M4SSD2__________________________000000001249091F58C6
Vendor: ATA
Model: M4-CT512M4SSD2
Revision: 040H
SCSI Level: 5
Is Pseudo: false
Status: on
Is RDM Capable: false
Is Local: true
Is Removable: false
Is SSD: true
Is Offline: false
Is Perennially Reserved: false
Queue Full Sample Size: 0
Queue Full Threshold: 0
Thin Provisioning Status: yes
Attached Filters:
VAAI Status: unknown
Other UIDs: vml.010000000030303030303030303132343930393146353843364d342d435435
Is Local SAS Device: false
Is Boot USB Device: false
No of outstanding IOs with competing worlds: 32
SSD4:
t10.ATA_____INTEL_SSDSC2BA100G3_____________________BTTV347502FS100FGN__
Display Name: Local ATA Disk (t10.ATA_____INTEL_SSDSC2BA100G3_____________________BTTV347502FS100FGN__)
Has Settable Display Name: true
Size: 95396
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/t10.ATA_____INTEL_SSDSC2BA100G3_____________________BTTV347502FS100FGN__
Vendor: ATA
Model: INTEL SSDSC2BA10
Revision: 5DV1
SCSI Level: 5
Is Pseudo: false
Status: on
Is RDM Capable: false
Is Local: true
Is Removable: false
Is SSD: true
Is Offline: false
Is Perennially Reserved: false
Queue Full Sample Size: 0
Queue Full Threshold: 0
Thin Provisioning Status: yes
Attached Filters:
VAAI Status: unknown
Other UIDs: vml.010000000042545456333437353032465331303046474e2020494e54454c20
Is Local SAS Device: false
Is Boot USB Device: false
No of outstanding IOs with competing worlds: 32
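If you want to pick the SSDs out of that listing programmatically, the following is a minimal Python sketch. It assumes the live command output starts each device block with an unindented device ID and indents the "Key: Value" property lines beneath it (the listing above has lost that indentation). The function names and the shortened SAMPLE text are mine, for illustration only.

```python
def parse_device_list(text):
    """Parse `esxcli storage core device list` output into {device_id: {field: value}}.

    Assumption: a device block starts with an unindented device ID line,
    followed by indented "Key: Value" property lines.
    """
    devices = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate device blocks
        if not line[0].isspace():
            # Unindented line: start of a new device block
            current = devices.setdefault(line.strip(), {})
        elif current is not None and ":" in line:
            # Split on the first colon only; values may contain colons
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return devices


def ssd_ids(devices):
    """Return the IDs of devices that report `Is SSD: true`."""
    return [dev for dev, props in devices.items() if props.get("Is SSD") == "true"]


# Shortened sample based on the SSD4 block above (hypothetical indentation).
SAMPLE = """\
t10.ATA_____INTEL_SSDSC2BA100G3_____________________BTTV347502FS100FGN__
   Display Name: Local ATA Disk (t10.ATA_____INTEL_SSDSC2BA100G3_____________________BTTV347502FS100FGN__)
   Size: 95396
   Is SSD: true
   Attached Filters:
"""

print(ssd_ids(parse_device_list(SAMPLE)))
```

The returned IDs are the values to pass to the -d option of the smart get command.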
Next, retrieve the S.M.A.R.T. information for each SSD.
SSD1:
~ # esxcli storage core device smart get -d=t10.ATA_____KINGSTON_SV300S37A120G__________________50026B773B02B8BF____
Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A
Media Wearout Indicator       0      0          0
Write Error Count             N/A    N/A        N/A
Read Error Count              120    50         120
Power-on Hours                99     0          99
Power Cycle Count             100    0          100
Reallocated Sector Count      100    3          100
Raw Read Error Rate           120    50         120
Drive Temperature             27     0          72
Driver Rated Max Temperature  N/A    N/A        N/A
Write Sectors TOT Count       N/A    N/A        N/A
Read Sectors TOT Count        N/A    N/A        N/A
Initial Bad Block Count       N/A    N/A        N/A
~ #
SSD2:
~ # esxcli storage core device smart get -d=t10.ATA_____Samsung_SSD_840_EVO_120GB_______________S1D5NSBDB76769V_____
Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A
Media Wearout Indicator       N/A    N/A        N/A
Write Error Count             N/A    N/A        N/A
Read Error Count              N/A    N/A        N/A
Power-on Hours                99     0          99
Power Cycle Count             99     0          99
Reallocated Sector Count      100    10         100
Raw Read Error Rate           N/A    N/A        N/A
Drive Temperature             N/A    N/A        N/A
Driver Rated Max Temperature  73     0          70
Write Sectors TOT Count       100    0          100
Read Sectors TOT Count        N/A    N/A        N/A
Initial Bad Block Count       N/A    N/A        N/A
~ #
SSD3:
~ # esxcli storage core device smart get -d=t10.ATA_____M42DCT512M4SSD2__________________________000000001249091F58C6
Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A
Media Wearout Indicator       N/A    N/A        N/A
Write Error Count             N/A    N/A        N/A
Read Error Count              100    50         100
Power-on Hours                100    1          100
Power Cycle Count             100    1          100
Reallocated Sector Count      100    10         100
Raw Read Error Rate           100    50         100
Drive Temperature             100    0          100
Driver Rated Max Temperature  N/A    N/A        N/A
Write Sectors TOT Count       100    1          100
Read Sectors TOT Count        N/A    N/A        N/A
Initial Bad Block Count       100    50         100
~ #
SSD4:
~ # esxcli storage core device smart get -d=t10.ATA_____INTEL_SSDSC2BA100G3_____________________BTTV347502FS100FGN__
Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A
Media Wearout Indicator       100    0          100
Write Error Count             N/A    N/A        N/A
Read Error Count              N/A    N/A        N/A
Power-on Hours                100    0          100
Power Cycle Count             100    0          100
Reallocated Sector Count      100    0          100
Raw Read Error Rate           N/A    N/A        N/A
Drive Temperature             100    0          100
Driver Rated Max Temperature  77     0          77
Write Sectors TOT Count       100    0          100
Read Sectors TOT Count        N/A    N/A        N/A
Initial Bad Block Count       100    90         100
~ #
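To track these values over time rather than read them by eye, the table can be parsed into a small data structure. The following Python sketch is one way to do it, under stated assumptions: the last three columns never contain spaces, and the Value column follows the usual S.M.A.R.T. convention of a normalized number where lower is worse and a nonzero Threshold marks the failure point. The Drive Temperature row for SSD1 (27) suggests that some rows may carry raw readings instead, so treat the threshold check as a rough filter. The function names are hypothetical.

```python
def _is_cell(field):
    """True for the two forms a numeric column takes in this table."""
    return field == "N/A" or field.isdigit()


def _convert(field):
    """Map "N/A" to None, digits to int, and keep anything else (e.g. "OK") as text."""
    if field == "N/A":
        return None
    return int(field) if field.isdigit() else field


def parse_smart(text):
    """Parse `esxcli storage core device smart get` output.

    Returns {parameter: (value, threshold, worst)}.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 3)  # parameter names contain spaces, so split from the right
        # Header, separator, prompt, and command lines fail this check and are skipped
        if len(parts) != 4 or not (_is_cell(parts[2]) and _is_cell(parts[3])):
            continue
        name, value, threshold, worst = parts
        rows[name.strip()] = (_convert(value), _convert(threshold), _convert(worst))
    return rows


def at_or_below_threshold(rows):
    """List parameters whose value has fallen to or below a nonzero threshold."""
    flagged = []
    for name, (value, threshold, _worst) in rows.items():
        if isinstance(value, int) and isinstance(threshold, int):
            # A threshold of 0 is treated as informational only (assumption)
            if threshold > 0 and value <= threshold:
                flagged.append(name)
    return flagged


# A few rows from the SSD1 output above.
SAMPLE = """\
Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A
Media Wearout Indicator       0      0          0
Read Error Count              120    50         120
Reallocated Sector Count      100    3          100
~ #
"""

rows = parse_smart(SAMPLE)
print(rows["Read Error Count"])
print(at_or_below_threshold(rows))
```

Splitting from the right keeps multi-word parameter names intact without hard-coding column widths.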
I show the output from four different SSD models to illustrate that monitoring flash media with S.M.A.R.T. information is not straightforward. Manufacturers report different attributes depending on the controller firmware, so in my case each of the four SSDs populates a different set of fields. Only four parameters (Health Status, Power-on Hours, Power Cycle Count, and Reallocated Sector Count) are populated on all four drives. Sometimes there simply isn't enough information to determine how much life may be left in a drive, which is a challenge if you are hoping to keep an eye on this.
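To make the differences concrete, here is a small Python sketch that compares which parameters each drive populates. The REPORTED table is transcribed by hand from the four outputs above (any row that is not N/A counts as reported), and the helper names are mine.

```python
# Parameters with a value other than N/A, per drive, transcribed from the output above.
REPORTED = {
    "SSD1": {"Health Status", "Media Wearout Indicator", "Read Error Count",
             "Power-on Hours", "Power Cycle Count", "Reallocated Sector Count",
             "Raw Read Error Rate", "Drive Temperature"},
    "SSD2": {"Health Status", "Power-on Hours", "Power Cycle Count",
             "Reallocated Sector Count", "Driver Rated Max Temperature",
             "Write Sectors TOT Count"},
    "SSD3": {"Health Status", "Read Error Count", "Power-on Hours",
             "Power Cycle Count", "Reallocated Sector Count", "Raw Read Error Rate",
             "Drive Temperature", "Write Sectors TOT Count",
             "Initial Bad Block Count"},
    "SSD4": {"Health Status", "Media Wearout Indicator", "Power-on Hours",
             "Power Cycle Count", "Reallocated Sector Count", "Drive Temperature",
             "Driver Rated Max Temperature", "Write Sectors TOT Count",
             "Initial Bad Block Count"},
}


def common_parameters(reported):
    """Parameters that every drive reports a value for."""
    return set.intersection(*reported.values())


def drives_reporting(reported, parameter):
    """Sorted labels of the drives that report a value for the given parameter."""
    return sorted(label for label, params in reported.items() if parameter in params)


print(sorted(common_parameters(REPORTED)))
print(drives_reporting(REPORTED, "Media Wearout Indicator"))
```

Only SSD1 and SSD4 report a Media Wearout Indicator at all, and SSD1 reports 0 for value, threshold, and worst, so it is not obvious that the figure is meaningful there.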
Ideally, S.M.A.R.T. output for flash devices will standardize over time so that all drives report the same information. I suspect that HDD output is more standardized than that of newer SSDs, though I have not validated this (I have no HDDs in my hosts). It is a good idea to check periodically for firmware updates for your drives, as they may add functionality and expose additional information.
See the VMware KB for additional information.
March 10th, 2014 - 07:46
Great information, Peter. Thanks!
-Josh