Bug #35065

Checksum Errors with certain SSD

Added by Pasqualino Casciano over 3 years ago. Updated over 2 years ago.

Status: Closed
Priority: No priority
Assignee: Alexander Motin
Category: Hardware
Target version:
Seen in:
Severity: Med High
Reason for Closing:
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:

FreeNAS-11.1
CPU Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.00GHz
RAM 2 x Crucial 16GB Single 2133MT/s DDR4 PC4-17000 Dual Ranked x8 ECC DIMM
HDD 4 x WD Red 2 TB 3.5''
Power Supply Seasonic Platinum 400
Motherboard Supermicro X11SSM-F
Boot Disk SanDisk SSD Plus 2.5''
Case Fractal Define R5 Black (Midi Tower)

ChangeLog Required: No
Tags:

Description

Three users have recently reported problems with the SanDisk SSD Plus 2.5'' (http://a.co/0MMTCCH), and one user has reported similar problems with a 3D NAND SSD (http://a.co/4iN1l5b). What these SSDs have in common is that both use a Silicon Motion controller (SM2246 and SM2256 respectively, or some variant of them).

https://forums.freenas.org/index.php?threads/boot-disk.63829/
https://forums.freenas.org/index.php?threads/freenas-installer-sandisk-ssd-checksum-errors.64049/#post-461659
https://forums.freenas.org/index.php?threads/transcend-ssd-boot-disk-zfs-checksum-errors.64321/#post-461500


Related issues

Has duplicate FreeNAS - Bug #39035: It does not support SanDisk SSD (Closed)

History

#1 Updated by Pasqualino Casciano over 3 years ago

  • Hardware Configuration updated (diff)

#2 Updated by Dru Lavigne over 3 years ago

  • Status changed from Unscreened to Blocked
  • Private changed from No to Yes
  • Reason for Blocked set to Need additional information from Author

Please have one of those users (if you're not one of them) upload a debug from an affected system as we'll need that to diagnose what is going on.

#3 Updated by Pasqualino Casciano over 3 years ago

I don't think that any of us has a system running with a faulty SSD as the boot device. The whole point is that the system is not usable with a corrupted SSD. I suggest that you buy a SanDisk SSD Plus 2.5'' and reproduce it directly. It happened to three of us, so the likelihood that it happens to you as well is extremely high. If you want to tackle this problem, I don't think there is any other way.

#4 Updated by Dru Lavigne over 3 years ago

  • Status changed from Blocked to Unscreened
  • Assignee changed from Release Council to Alexander Motin
  • Reason for Blocked deleted (Need additional information from Author)

Over to Sasha for his thoughts.

#5 Updated by Dru Lavigne over 3 years ago

  • Seen in changed from 11.0-U4 to 11.1-U5

#6 Updated by Pasqualino Casciano over 3 years ago

  • Hardware Configuration updated (diff)

#7 Updated by Pasqualino Casciano over 3 years ago

  • Hardware Configuration updated (diff)

Sorry, I just copied and pasted my current configuration into the Hardware Configuration text box. The problem was actually observed with FreeNAS-11.1 (that's what's on the installation disk when you download it). Only later, when I found a working SSD, did I update from FreeNAS 11.1 to U5.

#8 Updated by Pasqualino Casciano over 3 years ago

  • Hardware Configuration updated (diff)

#9 Updated by Alexander Motin over 3 years ago

  • Status changed from Unscreened to Blocked
  • Reason for Blocked set to Waiting for feedback

Just a guess: there were some SSDs on the market with broken TRIM or NCQ TRIM implementations that corrupted data. It would be useful to find out whether the problem is reproducible with TRIM forcibly disabled via the vfs.zfs.trim.enabled=0 loader tunable. If that helps, I may need more detailed information about the SSDs (such as the output of `camcontrol identify /dev/ada0 -v`) to add them to the exceptions list.
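
For anyone testing this, here is a minimal sketch of how one might confirm that the tunable is actually present in loader configuration text. It assumes the usual FreeBSD `name="value"` loader.conf syntax with `#` comments. The function name `loader_tunable` and the sample text are illustrative only, and where the text comes from (a file under /boot, or a tunable added through the FreeNAS UI) is left to the reader.

```python
# Illustrative sample only: not taken from any affected system.
SAMPLE = '''
# Work around SSDs whose TRIM implementation corrupts data
# vfs.zfs.trim.enabled="1"   (commented out, ignored)
vfs.zfs.trim.enabled="0"
'''


def loader_tunable(conf_text, name):
    """Return the value assigned to `name` in loader.conf-style text, or None.

    If the name is assigned more than once, the last assignment is returned.
    """
    value = None
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if key.strip() == name:
            value = raw.strip().strip('"')
    return value


if __name__ == "__main__":
    print(loader_tunable(SAMPLE, "vfs.zfs.trim.enabled"))
```

A return value of "0" means TRIM would be disabled at the next boot. As noted later in this ticket, that does not repair corruption that has already happened.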

#10 Updated by Pasqualino Casciano over 3 years ago

  • Private changed from Yes to No

#11 Updated by Pasqualino Casciano over 3 years ago

  • Seen in changed from 11.1-U5 to 11.1

I have sent back both SSDs that did not work, so I cannot provide you with the requested information. I have, however, asked other users who still have or might have such an SSD to provide the necessary information.

#12 Updated by jon atkins over 3 years ago

Hi, I'm one of the original users hit with this issue, using a Transcend SSD.

As I have the boot SSD mirrored with an old HDD, the system remains stable, so I can run some tests.

I'll try the disabled-trim tests mentioned above when I get a chance, but here's the "camcontrol identify" output for now.

# camcontrol identify /dev/ada1 -v
camcontrol: sending ATA ATA_IDENTIFY with timeout of 30000 msecs
pass9: Raw identify data:
   0: 0040 3fff c837 0010 0000 0240 003f 0000
   8: 0000 0000 3339 3139 3734 4530 4431 3038
  16: 3733 3030 3030 3237 0000 ffff 0004 4e30
  24: 3131 3345 3120 5453 3136 4753 5344 3633
  32: 3020 2020 2020 2020 2020 2020 2020 2020
  40: 2020 2020 2020 2020 2020 2020 2020 8001
  48: 0000 0f00 4000 0200 0000 0007 3fff 0010
  56: 003f fc10 00fb 0101 40b0 01dd 0000 0007
  64: 0003 0078 0078 0078 0078 4000 0000 0000
  72: 0000 0000 0000 001f 0306 0000 0048 0040
  80: 03f0 0000 742b 7500 4020 7429 3400 4022
  88: 407f 0003 0000 0000 fffe 0000 0000 0000
  96: 0000 0000 0000 0000 40b0 01dd 0000 0000
 104: 0000 0100 0000 0000 0000 0000 0000 0000
 112: 0000 0000 0000 0000 0000 0000 0000 0000
 120: 0000 0000 0000 0000 0000 0000 0000 0000
 128: 0009 5452 414e 5343 454e 4400 0000 0000
 136: 0000 0000 0000 0000 0000 0000 0000 0000
 144: 0000 0000 0000 0000 0000 0000 0000 0000
 152: 0000 0000 0000 0000 0000 0000 0000 0000
 160: 0000 0000 0000 0000 0000 0000 0000 0000
 168: 0000 0001 0000 0000 0000 0000 0000 0000
 176: 0000 0000 0000 0000 0000 0000 0000 0000
 184: 0000 0000 0000 0000 0000 0000 0000 0000
 192: 0000 0000 0000 0000 0000 0000 0000 0000
 200: 0000 0000 0000 0000 0000 0000 0000 0000
 208: 0000 4000 0000 0000 0000 0000 0000 0000
 216: 0000 0001 0000 0000 0000 0000 1020 0000
 224: 0000 0000 0000 0000 0000 0000 0000 0000
 232: 0000 0000 0000 0000 0000 0000 0000 0000
 240: 0000 0000 0000 0000 0000 0000 0000 0000
 248: 0000 0000 0000 0000 0000 0000 0000 a8a5

camcontrol: sending ATA READ_NATIVE_MAX_ADDRESS48 with timeout of 1000 msecs
pass9: Raw native max data:
   0: 5000 af00 dd40 0140 0000 0000
error = 0x00, sector_count = 0x0000, device = 0x40, status = 0x50
pass9: <TS16GSSD630 N0113E1> ACS-2 ATA SATA 2.x device
pass9: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 512bytes)

protocol              ATA/ATAPI-9 SATA 2.x
device model          TS16GSSD630
firmware revision     N0113E1
serial number         391974E0D10873000027
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         31277232 sectors
LBA48 supported       31277232 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             non-rotating

Feature                      Support  Enabled   Value           Vendor
read ahead                     no       no
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
NCQ Queue Management           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
SMART                          yes      yes
microcode download             no       no
security                       yes      no
power management               yes      yes
advanced power management      no       no
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            no       no
write-read-verify              no       no
unload                         no       no
general purpose logging        yes      yes
free-fall                      no       no
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks       yes              256
DSM - deterministic read       yes              any value
Host Protected Area (HPA)      yes      no      31277232/31277232
HPA - Security                 no

#13 Updated by anm nz over 3 years ago

I am another of the users who reported this issue on the FreeNAS forums.

Disabling TRIM by setting vfs.zfs.trim.enabled=0 appears to fix the problem.

I have just re-tested installing FreeNAS 11.1-U5 to a SanDisk "SSD Plus 120GB" SSD. With no extra tunables set, a post-install scrub of the "freenas-boot" pool produced >100 checksum errors, which is consistent with previous installs on this SSD. When I ran the installer with vfs.zfs.trim.enabled=0 set (by editing the boot configuration in GRUB) a post-install scrub produced no errors.

(Here is my original report in the FreeNAS forums which details the testing I previously did: https://forums.freenas.org/index.php?threads/freenas-installer-sandisk-ssd-checksum-errors.64049/)

Here is the output of camcontrol identify -v for this SSD:

# camcontrol identify /dev/ada2 -v
camcontrol: sending ATA ATA_IDENTIFY with timeout of 30000 msecs
pass6: Raw identify data:
   0: 0040 3fff c837 0010 0000 0000 003f 0000
   8: 0000 0000 3138 3039 3136 3830 3435 3536
  16: 2020 2020 2020 2020 0000 0000 0000 5545
  24: 3336 3030 524c 5361 6e44 6973 6b20 5353
  32: 4420 504c 5553 2031 3230 2047 4220 2020
  40: 2020 2020 2020 2020 2020 2020 2020 8001
  48: 4000 2f00 4000 0200 0000 0006 3fff 0010
  56: 003f fc10 00fb 9101 8000 0df9 0000 0007
  64: 0003 0078 0078 0078 0078 5e00 0000 0000
  72: 0000 0000 0000 001f 850e 0004 0148 0040
  80: 03f0 0110 346b 7d09 4123 3469 bc01 4123
  88: 407f 0001 0000 0000 fffe 0000 0000 0000
  96: 0000 0000 0000 0000 8000 0df9 0000 0000
 104: 0000 0008 4000 0000 5001 b448 b6e8 594e
 112: 0000 0000 0000 0000 0000 0000 0000 411c
 120: 401c 0000 0000 0000 0000 0000 0000 0000
 128: 0009 0000 0000 0000 0000 0000 0000 0000
 136: 0000 0000 0000 0000 0000 0000 0000 0000
 144: 0000 0000 0000 0000 0000 0000 0000 0000
 152: 0000 0000 0000 0000 0000 0000 0000 0000
 160: 0000 0000 0000 0000 0000 0000 0000 0000
 168: 0003 0001 0000 0000 0000 0000 0000 0000
 176: 2020 2020 2020 2020 2020 2020 2020 2020
 184: 2020 2020 2020 2020 2020 2020 2020 2020
 192: 2020 2020 2020 2020 2020 2020 2020 2020
 200: 2020 2020 2020 2020 2020 2020 0000 0000
 208: 0000 4000 0000 0000 0000 0000 0000 0000
 216: 0000 0001 0000 0000 0000 0000 10ff 0000
 224: 0000 0000 0000 0000 0000 0000 0000 0000
 232: 0000 0000 0010 0010 0000 0000 0000 0000
 240: 0000 0000 0000 0000 0000 0000 0000 0000
 248: 0000 0000 0000 0000 0000 0000 0000 f9a5

camcontrol: sending ATA READ_NATIVE_MAX_ADDRESS48 with timeout of 1000 msecs
pass6: Raw native max data:
   0: 5000 ff00 f97f 0d40 0000 0000
error = 0x00, sector_count = 0x0000, device = 0x40, status = 0x50
pass6: <SanDisk SSD PLUS 120 GB UE3600RL> ACS-2 ATA SATA 3.x device
pass6: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 512bytes)

protocol              ATA/ATAPI-9 SATA 3.x
device model          SanDisk SSD PLUS 120 GB
firmware revision     UE3600RL
serial number         180916804556
WWN                   5001b448b6e8594e
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         234455040 sectors
LBA48 supported       234455040 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             non-rotating

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
NCQ Queue Management           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      no      0/0x00
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            no       no
write-read-verify              no       no
unload                         no       no
general purpose logging        yes      yes
free-fall                      no       no
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks       yes              8
DSM - deterministic read       yes              any value
Host Protected Area (HPA)      yes      no      234455040/234455040
HPA - Security                 no
#
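
The raw identify words above can be cross-checked against the decoded fields. Below is a minimal sketch, assuming the standard ATA IDENTIFY DEVICE layout (serial number in words 10-19, firmware revision in 23-26, model in 27-46, LBA48 sector count in 100-103, WWN in 108-111). The helper names are made up for illustration, and only the rows of the SanDisk dump that hold those words are copied in.

```python
import re

# Rows copied from the SanDisk identify dump above (only the ones needed here).
RAW = """
   8: 0000 0000 3138 3039 3136 3830 3435 3536
  16: 2020 2020 2020 2020 0000 0000 0000 5545
  24: 3336 3030 524c 5361 6e44 6973 6b20 5353
  32: 4420 504c 5553 2031 3230 2047 4220 2020
  40: 2020 2020 2020 2020 2020 2020 2020 8001
  96: 0000 0000 0000 0000 8000 0df9 0000 0000
 104: 0000 0008 4000 0000 5001 b448 b6e8 594e
"""


def parse_words(dump):
    """Map word index -> 16-bit value from 'offset: w w w ...' rows."""
    words = {}
    for line in dump.splitlines():
        m = re.match(r"\s*(\d+):\s+((?:[0-9a-fA-F]{4}\s*)+)$", line)
        if not m:
            continue
        base = int(m.group(1))
        for i, token in enumerate(m.group(2).split()):
            words[base + i] = int(token, 16)
    return words


def ata_string(words, first, last):
    """ATA strings hold two ASCII characters per word, high byte first."""
    raw = b"".join(words[i].to_bytes(2, "big") for i in range(first, last + 1))
    return raw.decode("ascii", errors="replace").strip()


def describe(words):
    return {
        "serial": ata_string(words, 10, 19),
        "firmware": ata_string(words, 23, 26),
        "model": ata_string(words, 27, 46),
        # 48-bit sector count, least significant word first.
        "lba48_sectors": sum(words[100 + i] << (16 * i) for i in range(4)),
        "wwn": "".join(f"{words[i]:04x}" for i in range(108, 112)),
    }


if __name__ == "__main__":
    for key, value in describe(parse_words(RAW)).items():
        print(f"{key}: {value}")
```

The decoded values should agree with the "device model", "firmware revision", "serial number", "WWN" and "LBA48 supported" lines that camcontrol printed.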

#14 Updated by Eric Loewenthal over 3 years ago

FWIW, I installed vanilla FreeBSD 11.1 on one of these SSDs recently and have no problems to report. However, mine has firmware UE3000RL. Rest of output is identical, apart from Serial/IDs.

#15 Updated by jon atkins over 3 years ago

jon atkins wrote:

I'll try the disabled-trim tests mentioned above when I get a chance,

Finally had a chance to try this - and yes, turning off trim has fixed the issue on my Transcend SSD.

#16 Updated by Dru Lavigne over 3 years ago

  • Status changed from Blocked to Unscreened
  • Reason for Blocked deleted (Waiting for feedback)

#17 Updated by Alexander Motin over 3 years ago

  • Status changed from Unscreened to Screened
  • Severity changed from New to Med High

Looking at the identify data above, none of the devices supports NCQ TRIM, which was my original suspicion. Regular non-NCQ TRIM is not a new feature, so I would expect it to work more reliably. I've got one SanDisk SSD PLUS 120 GB with exactly the same UE3600RL firmware as above. I'll try to reproduce the issue as time permits (no idea when). Until then, any further data on which other SSDs are built on that crappy platform is welcome.
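
The NCQ TRIM observation can be checked from the raw identify words posted earlier. Below is a minimal sketch, assuming the ATA IDENTIFY DEVICE bit assignments: word 76 bit 8 for NCQ, word 77 bit 6 for SEND/RECEIVE FPDMA QUEUED (which queued TRIM depends on), word 169 bit 0 for TRIM, word 105 for the maximum DSM block count, and word 69 bits 14 and 5 for deterministic and zeroed reads after TRIM. The helper names are illustrative, and only the needed rows of the Transcend and SanDisk dumps are included.

```python
import re

# Selected rows copied from the raw identify dumps posted above.
DUMPS = {
    "TS16GSSD630 N0113E1": """
  64: 0003 0078 0078 0078 0078 4000 0000 0000
  72: 0000 0000 0000 001f 0306 0000 0048 0040
 104: 0000 0100 0000 0000 0000 0000 0000 0000
 168: 0000 0001 0000 0000 0000 0000 0000 0000
""",
    "SanDisk SSD PLUS 120 GB UE3600RL": """
  64: 0003 0078 0078 0078 0078 5e00 0000 0000
  72: 0000 0000 0000 001f 850e 0004 0148 0040
 104: 0000 0008 4000 0000 5001 b448 b6e8 594e
 168: 0003 0001 0000 0000 0000 0000 0000 0000
""",
}


def parse_words(dump):
    """Map word index -> 16-bit value from 'offset: w w w ...' rows."""
    words = {}
    for line in dump.splitlines():
        m = re.match(r"\s*(\d+):\s+((?:[0-9a-fA-F]{4}\s*)+)$", line)
        if not m:
            continue
        base = int(m.group(1))
        for i, token in enumerate(m.group(2).split()):
            words[base + i] = int(token, 16)
    return words


def trim_features(words):
    """Decode the queueing and TRIM related bits of the identify data."""
    return {
        "ncq": bool(words[76] & 0x0100),
        "queue_depth": (words[75] & 0x001F) + 1,
        "send_recv_fpdma_queued": bool(words[77] & 0x0040),
        "trim": bool(words[169] & 0x0001),
        "dsm_max_blocks": words[105],
        "deterministic_read": bool(words[69] & 0x4000),
        "zeroed_read": bool(words[69] & 0x0020),
    }


if __name__ == "__main__":
    for name, dump in DUMPS.items():
        print(name, trim_features(parse_words(dump)))
```

For both drives this reports NCQ with 32 tags and plain TRIM, but no SEND/RECEIVE FPDMA QUEUED support, which matches the "Receive & Send FPDMA Queued no" lines in the camcontrol output.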

#18 Updated by rui silva over 3 years ago

I can confirm that the situation is still the same with firmware UE4500RL, but the setting "vfs.zfs.trim.enabled=0" did not fix the issue for me on FreeNAS-9.10.2-U6 (561f0d7a1).

#19 Updated by Alexander Motin over 3 years ago

  • Has duplicate Bug #39035: It does not support SanDisk SSD added

#20 Updated by Alexander Motin over 3 years ago

  • Has duplicate deleted (Bug #39035: It does not support SanDisk SSD)

#21 Updated by Alexander Motin over 3 years ago

  • Has duplicate Bug #39035: It does not support SanDisk SSD added

#22 Updated by Jeff Chen over 3 years ago

I tried to use two WD Green 120GB SSDs as my boot drives and had the same error. As soon as the installation finished, data started getting corrupted. Interestingly, the corruption doesn't happen when I use these drives as L2ARC (yet).

Here are the details of my drive:

camcontrol: sending ATA ATA_IDENTIFY with timeout of 30000 msecs
pass6: Raw identify data:
   0: 0040 3fff c837 0010 0000 0000 003f 0000 
   8: 0000 0000 3138 3038 4143 3830 3032 3034 
  16: 2020 2020 2020 2020 0000 0000 0000 5545 
  24: 3336 3030 3030 5744 4320 5744 5331 3230 
  32: 4732 4730 412d 3030 4a48 3330 2020 2020 
  40: 2020 2020 2020 2020 2020 2020 2020 8001 
  48: 4000 2f00 4000 0200 0000 0006 3fff 0010 
  56: 003f fc10 00fb 9101 8000 0df9 0000 0007 
  64: 0003 0078 0078 0078 0078 5e00 0000 0000 
  72: 0000 0000 0000 001f 850e 0006 0148 0040 
  80: 03f0 0110 346b 7d09 4123 3469 bc01 4123 
  88: 407f 0001 0000 0000 fffe 0000 0000 0000 
  96: 0000 0000 0000 0000 8000 0df9 0000 0000 
 104: 0000 0008 4000 0000 5001 b448 b6e6 add4 
 112: 0000 0000 0000 0000 0000 0000 0000 411c 
 120: 401c 0000 0000 0000 0000 0000 0000 0000 
 128: 0001 0000 0000 0000 0000 0000 0000 0000 
 136: 0000 0000 0000 0000 0000 0000 0000 0000 
 144: 0000 0000 0000 0000 0000 0000 0000 0000 
 152: 0000 0000 0000 0000 0000 0000 0000 0000 
 160: 0000 0000 0000 0000 0000 0000 0000 0000 
 168: 0003 0001 0000 0000 0000 0000 0000 0000 
 176: 2020 2020 2020 2020 2020 2020 2020 2020 
 184: 2020 2020 2020 2020 2020 2020 2020 2020 
 192: 2020 2020 2020 2020 2020 2020 2020 2020 
 200: 2020 2020 2020 2020 2020 2020 0000 0000 
 208: 0000 4000 0000 0000 0000 0000 0000 0000 
 216: 0000 0001 0000 0000 0000 0000 10ff 0000 
 224: 0000 0000 0000 0000 0000 0000 0000 0000 
 232: 0000 0000 0010 0010 0000 0000 0000 0000 
 240: 0000 0000 0000 0000 0000 0000 0000 0000 
 248: 0000 0000 0000 0000 0000 0000 0000 7ca5 

camcontrol: sending ATA READ_NATIVE_MAX_ADDRESS48 with timeout of 1000 msecs
pass6: Raw native max data:
   0: 5000 ff00 f97f 0d40 0000 0000 
error = 0x00, sector_count = 0x0000, device = 0x40, status = 0x50
pass6: <WDC WDS120G2G0A-00JH30 UE360000> ACS-2 ATA SATA 3.x device
pass6: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)

protocol              ATA/ATAPI-9 SATA 3.x
device model          WDC WDS120G2G0A-00JH30
firmware revision     UE360000
serial number         1808AC800204
WWN                   5001b448b6e6add4
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         234455040 sectors
LBA48 supported       234455040 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6 
media RPM             non-rotating

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
NCQ Queue Management           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      no      0/0x00
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            no       no
write-read-verify              no       no
unload                         no       no
general purpose logging        yes      yes
free-fall                      no       no
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks       yes              8
DSM - deterministic read       yes              any value
Host Protected Area (HPA)      yes      no      234455040/234455040
HPA - Security                 no

I'm going to try the loader option to disable TRIM and report back the result.

#23 Updated by Jeff Chen about 3 years ago

I can confirm that the loader tunable vfs.zfs.trim.enabled=0 worked for me. I've been running the machine for a couple of days without any further problems.

#24 Updated by Jason Poff about 3 years ago

  • Seen in changed from 11.1 to 11.2-BETA1

freenas# camcontrol identify /dev/ada5 -v
camcontrol: sending ATA ATA_IDENTIFY with timeout of 30000 msecs
pass13: Raw identify data:
   0: 0040 3fff c837 0010 0000 0000 003f 0000
   8: 0000 0000 3138 3131 3032 3830 3439 3233
  16: 2020 2020 2020 2020 0000 0000 0000 5545
  24: 3336 3030 524c 5361 6e44 6973 6b20 5353
  32: 4420 504c 5553 2031 3230 2047 4220 2020
  40: 2020 2020 2020 2020 2020 2020 2020 8001
  48: 4000 2f00 4000 0200 0000 0006 3fff 0010
  56: 003f fc10 00fb 9101 8000 0df9 0000 0007
  64: 0003 0078 0078 0078 0078 5e00 0000 0000
  72: 0000 0000 0000 001f 850e 0006 0148 0040
  80: 03f0 0110 346b 7d09 4123 3469 bc01 4123
  88: 407f 0001 0000 0000 fffe 0000 0000 0000
  96: 0000 0000 0000 0000 8000 0df9 0000 0000
 104: 0000 0008 4000 0000 5001 b448 b6e3 cef9
 112: 0000 0000 0000 0000 0000 0000 0000 411c
 120: 401c 0000 0000 0000 0000 0000 0000 0000
 128: 0009 0000 0000 0000 0000 0000 0000 0000
 136: 0000 0000 0000 0000 0000 0000 0000 0000
 144: 0000 0000 0000 0000 0000 0000 0000 0000
 152: 0000 0000 0000 0000 0000 0000 0000 0000
 160: 0000 0000 0000 0000 0000 0000 0000 0000
 168: 0003 0001 0000 0000 0000 0000 0000 0000
 176: 2020 2020 2020 2020 2020 2020 2020 2020
 184: 2020 2020 2020 2020 2020 2020 2020 2020
 192: 2020 2020 2020 2020 2020 2020 2020 2020
 200: 2020 2020 2020 2020 2020 2020 0000 0000
 208: 0000 4000 0000 0000 0000 0000 0000 0000
 216: 0000 0001 0000 0000 0000 0000 10ff 0000
 224: 0000 0000 0000 0000 0000 0000 0000 0000
 232: 0000 0000 0010 0010 0000 0000 0000 0000
 240: 0000 0000 0000 0000 0000 0000 0000 0000
 248: 0000 0000 0000 0000 0000 0000 0000 eaa5

camcontrol: sending ATA READ_NATIVE_MAX_ADDRESS48 with timeout of 1000 msecs
pass13: Raw native max data:
   0: 5000 ff00 f97f 0d40 0000 0000
error = 0x00, sector_count = 0x0000, device = 0x40, status = 0x50
pass13: <SanDisk SSD PLUS 120 GB UE3600RL> ACS-2 ATA SATA 3.x device
pass13: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)

protocol              ATA/ATAPI-9 SATA 3.x
device model          SanDisk SSD PLUS 120 GB
firmware revision     UE3600RL
serial number         181102804923
WWN                   5001b448b6e3cef9
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         234455040 sectors
LBA48 supported       234455040 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             non-rotating
Zoned-Device Commands no

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
NCQ Queue Management           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      no      0/0x00
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            no       no
write-read-verify              no       no
unload                         no       no
general purpose logging        yes      yes
free-fall                      no       no
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks       yes              8
DSM - deterministic read       yes              any value
Host Protected Area (HPA)      yes      no      234455040/234455040
HPA - Security                 no

I have the same error with 11.2-BETA1.
The issue only affects the boot drives. No other ZFS pool has this error.
I have 6 of these SSDs installed: 2 as mirrored boot drives and 4 as a separate pool for VMs.
The VM pool works fine; the boot pool is throwing checksum errors but continues to work (for now).

The loader tunable vfs.zfs.trim.enabled=0 did not work for me.

I did not try reinstalling with this setting.

#25 Updated by Alexander Motin about 3 years ago

Jason Poff wrote:

The loader tunable vfs.zfs.trim.enabled=0 did not work for me.

I did not try reinstalling with this setting.

If corruption already happened, setting the tunable won't help retroactively.

#26 Updated by Jason Poff about 3 years ago

Alexander Motin wrote:

Jason Poff wrote:

The loader tunable vfs.zfs.trim.enabled=0 did not work for me.

I did not try reinstalling with this setting.

If corruption already happened, setting the tunable won't help retroactively.

Corruption that has already occurred can't be fixed.
This is correct, but shouldn't the checksum counter stay at 0 after that?

I will also try a reinstall with this parameter at some point.

#27 Updated by Alexander Motin about 3 years ago

Jason Poff wrote:

This is correct, but shouldn't the checksum counter stay at 0 after that?

Why? If data is corrupted, the counter will increase each time a corrupted block is accessed. You can use `zpool status -v` to find and delete the corrupted files, then run a scrub to make sure nothing else corrupted is left.
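
To watch whether the counters keep growing between scrubs, the CKSUM column can be pulled out of saved `zpool status` text. Below is a minimal sketch. The sample is the config table from the `zpool status -v` output posted in comment #28 of this ticket, and the function name `cksum_counters` is made up for illustration. Counters are kept as the strings zpool prints (for example "1.06K") rather than converted to numbers.

```python
# Config table copied from the zpool status output in comment #28.
SAMPLE = """\
config:

    NAME        STATE     READ WRITE CKSUM
    freenas-boot  DEGRADED     0     0     7
      mirror-0  DEGRADED     0     0    14
        ada1p2  DEGRADED     0     0 1.06K  too many errors
        ada0p2  ONLINE       0     0    14  block size: 512B configured, 4096B native

errors: Permanent errors have been detected in the following files:
"""


def cksum_counters(status_text):
    """Return {vdev name: CKSUM column as printed} from zpool status text."""
    counters = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if not in_table:
            # The table starts after the NAME ... CKSUM header row.
            in_table = fields[:1] == ["NAME"] and "CKSUM" in fields
            continue
        if not fields:  # a blank line ends the config table
            break
        if len(fields) >= 5:
            counters[fields[0]] = fields[4]
    return counters


if __name__ == "__main__":
    for name, count in cksum_counters(SAMPLE).items():
        flag = "" if count == "0" else "  <-- checksum errors"
        print(f"{name}: {count}{flag}")
```

Comparing two such snapshots taken before and after a scrub shows whether new errors are still being recorded or only the already corrupted blocks are being hit again.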

#28 Updated by Chris McDowell almost 3 years ago

  • Seen in changed from 11.2-BETA1 to 11.2-RC1

I may have a similar issue with an HP SSD that uses a Marvell controller. Let me know if there is anything else I can get that may be helpful.

https://www.newegg.com/Product/Product.aspx?Item=N82E16820326780&nm_mc=AFC-C8Junction&cm_mmc=AFC-C8Junction-FlexOffers.com%2c+LLC-_-na-_-na-_-na&AID=12079868&PID=8192570&SID=1187656FOF28012635862403984&utm_medium=affiliates&utm_source=afc-FlexOffers.com%2c+LLC&cjevent=896bba22dd5f11e8826800580a240612

sudo zpool status -v freenas-boot 
  pool: freenas-boot
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 785M in 0 days 00:00:21 with 7 errors on Sun Oct 28 19:22:15 2018
config:

    NAME        STATE     READ WRITE CKSUM
    freenas-boot  DEGRADED     0     0     7
      mirror-0  DEGRADED     0     0    14
        ada1p2  DEGRADED     0     0 1.06K  too many errors
        ada0p2  ONLINE       0     0    14  block size: 512B configured, 4096B native

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x25>
        <metadata>:<0x26>
        <metadata>:<0x27>
sudo smartctl -a /dev/ada1
Password:
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     HP SSD S600 120GB
Serial Number:    HBSA18291701597
LU WWN Device Id: 5 02b2a2 01d1c1b1a
Add. Product Id:  mavlsata
Firmware Version: HC0719C1
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Oct 31 15:52:14 2018 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x51) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0002)    Does not save SMART data before
                    entering power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (   5) minutes.

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002e   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   069   100   000    Old_age   Always       -       72
 12 Power_Cycle_Count       0x0032   001   100   000    Old_age   Always       -       21
171 Unknown_Attribute       0x0032   100   100   010    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   010    Old_age   Always       -       0
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       15
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   037   100   000    Old_age   Always       -       37
198 Offline_Uncorrectable   0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0030   100   100   000    Old_age   Offline      -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       132
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       8

SMART Error Log not supported

SMART Self-test log structure revision number 1

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

% sudo camcontrol identify /dev/ada1
Password:
pass1: <HP SSD S600 120GB HC0719C1> ACS-3 ATA SATA 3.x device
pass1: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 512bytes)

protocol              ATA/ATAPI-10 SATA 3.x
device model          HP SSD S600 120GB
firmware revision     HC0719C1
serial number         HBSA18291701597
WWN                   502b2a201d1c1b1a
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         234441648 sectors
LBA48 supported       234441648 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6 
media RPM             non-rotating
Zoned-Device Commands no

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
NCQ Queue Management           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      no      254/0xFE
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            no       no
write-read-verify              no       no
unload                         no       no
general purpose logging        yes      yes
free-fall                      no       no
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks       yes              8
DSM - deterministic read       yes              zeroed
Host Protected Area (HPA)      no

#29 Updated by Alexander Motin almost 3 years ago

Interesting times we are living in: an SSD for the price of a USB stick. ;)

Have you tried disabling TRIM? Does it help, as it did with the other SSDs? Just in case, could you add the raw identify data from `camcontrol identify /dev/ada1`?

#30 Updated by Eric Loewenthal almost 3 years ago

Alexander, I have one each of the WD Green and the SanDisk models with the offending controller, but an older firmware version that seems to work fine. Would access to either of them be useful to you?

#31 Updated by Chris McDowell almost 3 years ago

Disabling TRIM seems to have removed all of the errors.

sudo camcontrol identify /dev/ada1 -v
camcontrol: sending ATA ATA_IDENTIFY with timeout of 30000 msecs
pass1: Raw identify data:
   0: 045a 3fff c837 0010 0000 0000 003f 0000 
   8: 0000 0000 4842 5341 3138 3239 3137 3031 
  16: 3539 3720 2020 2020 0000 0000 0000 4843 
  24: 3037 3139 4331 4850 2053 5344 2053 3630 
  32: 3020 3132 3047 4220 2020 2020 2020 2020 
  40: 2020 2020 2020 2020 2020 2020 2020 8001 
  48: 4000 2f00 4000 0000 0000 0007 3fff 0010 
  56: 003f fc10 00fb 0101 4bb0 0df9 0000 0007 
  64: 0003 0078 0078 0078 0078 4c20 0000 0000 
  72: 0000 0000 0000 001f e30e 0084 0144 0040 
  80: 07f8 011b 706b 7409 4163 7069 b401 4163 
  88: 207f 0005 0005 00fe fffe 0000 0000 0000 
  96: 0000 0000 0000 0000 4bb0 0df9 0000 0000 
 104: 0000 0008 4000 0000 502b 2a20 1d1c 1b1a 
 112: 0000 0000 0000 0000 0000 0000 0000 4018 
 120: 4018 0000 0000 0000 0000 0000 0000 0000 
 128: 0029 0000 0000 0000 0000 0000 0000 0000 
 136: 0000 0000 0000 0000 0000 0000 0000 0000 
 144: 0000 0000 0000 0000 0000 0000 0000 0000 
 152: 0000 0000 0000 0000 0000 0000 0000 0000 
 160: 0000 0000 0000 0000 0000 0000 0000 0000 
 168: 0000 0001 6d61 766c 7361 7461 0000 0000 
 176: 4d4d 3332 6731 364b 3443 4531 4d41 4254 
 184: 3532 3030 2020 2020 2020 2020 2020 2020 
 192: 2020 2020 2020 2020 2020 2020 2020 2020 
 200: 2020 2020 2020 2020 2020 2020 0000 0000 
 208: 0000 4000 0000 0000 0000 0000 0000 0000 
 216: 0000 0001 0000 0000 0000 0000 10ff 0000 
 224: 0000 0000 0000 0000 0000 0000 0000 0000 
 232: 0000 0000 0008 0400 0000 0000 0000 0000 
 240: 0000 0000 0000 0000 0000 0000 0000 0000 
 248: 0000 0000 0000 0000 0000 0000 0000 b5a5 

pass1: <HP SSD S600 120GB HC0719C1> ACS-3 ATA SATA 3.x device
pass1: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 512bytes)

protocol              ATA/ATAPI-10 SATA 3.x
device model          HP SSD S600 120GB
firmware revision     HC0719C1
serial number         HBSA18291701597
WWN                   502b2a201d1c1b1a
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         234441648 sectors
LBA48 supported       234441648 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6 
media RPM             non-rotating
Zoned-Device Commands no

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
NCQ Queue Management           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      no      254/0xFE
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            no       no
write-read-verify              no       no
unload                         no       no
general purpose logging        yes      yes
free-fall                      no       no
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks       yes              8
DSM - deterministic read       yes              zeroed
Host Protected Area (HPA)      no
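
For anyone wanting to cross-check a raw identify dump against the decoded summary, here is a minimal Python sketch. It assumes the standard ATA IDENTIFY DEVICE layout (serial number in words 10-19, firmware revision in words 23-26, model in words 27-46, LBA48 sector count in words 100-103) and uses an excerpt of the dump above. The helper names `parse_words`, `ata_string` and `lba48_sectors` are made up for illustration and are not part of `camcontrol`.

```python
import re

# Excerpt of the raw identify dump from `camcontrol identify /dev/ada1 -v`
# (only the lines covering words 8-47 and 96-103; offsets are decimal word indexes).
RAW = """
   8: 0000 0000 4842 5341 3138 3239 3137 3031
  16: 3539 3720 2020 2020 0000 0000 0000 4843
  24: 3037 3139 4331 4850 2053 5344 2053 3630
  32: 3020 3132 3047 4220 2020 2020 2020 2020
  40: 2020 2020 2020 2020 2020 2020 2020 8001
  96: 0000 0000 0000 0000 4bb0 0df9 0000 0000
"""

def parse_words(dump):
    """Map word index -> 16-bit value for every 'offset: w w w ...' line."""
    words = {}
    for line in dump.splitlines():
        m = re.match(r"\s*(\d+):\s+((?:[0-9a-fA-F]{4}\s*)+)$", line)
        if not m:
            continue  # skip blank or non-dump lines
        base = int(m.group(1))
        for i, w in enumerate(m.group(2).split()):
            words[base + i] = int(w, 16)
    return words

def ata_string(words, first, last):
    """ATA strings hold two ASCII characters per word, first character in the high byte."""
    chars = []
    for i in range(first, last + 1):
        chars.append(chr(words[i] >> 8))
        chars.append(chr(words[i] & 0xFF))
    return "".join(chars).strip()

def lba48_sectors(words):
    """Words 100-103 hold the LBA48 addressable sector count, least significant word first."""
    return sum(words[100 + i] << (16 * i) for i in range(4))

words = parse_words(RAW)
serial = ata_string(words, 10, 19)
firmware = ata_string(words, 23, 26)
model = ata_string(words, 27, 46)
sectors = lba48_sectors(words)
print(serial, firmware, model, sectors)
```

The decoded strings and sector count should agree with the `serial number`, `firmware revision`, `device model` and `LBA48 supported` lines of the summary above.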

#32 Updated by Eric Loewenthal almost 3 years ago

Scratch that, my WD unit suffers from this as well, despite the older firmware:

root@backup:~ # camcontrol identify /dev/ada3 -v | less
camcontrol: sending ATA ATA_IDENTIFY with timeout of 30000 msecs
camcontrol: sending ATA READ_NATIVE_MAX_ADDRESS48 with timeout of 1000 msecs
pass6: Raw identify data:
   0: 0040 3fff c837 0010 0000 0000 003f 0000
   8: 0000 0000 3137 3436 3431 3435 3337 3135
  16: 2020 2020 2020 2020 0000 0000 0000 5545
  24: 3330 3030 3030 5744 4320 5744 5331 3230
  32: 4732 4730 412d 3030 4a48 3330 2020 2020
  40: 2020 2020 2020 2020 2020 2020 2020 8001
  48: 4000 2f00 4000 0200 0000 0006 3fff 0010
  56: 003f fc10 00fb 9101 8000 0df9 0000 0007
  64: 0003 0078 0078 0078 0078 5e00 0000 0000
  72: 0000 0000 0000 001f 850e 0006 0148 0040
  80: 03f0 0110 346b 7d09 4123 3469 bc01 4123
  88: 407f 0001 0000 0000 fffe 0000 0000 0000
  96: 0000 0000 0000 0000 8000 0df9 0000 0000
 104: 0000 0008 4000 0000 5001 b444 a9e5 85cf
 112: 0000 0000 0000 0000 0000 0000 0000 411c
 120: 401c 0000 0000 0000 0000 0000 0000 0000
 128: 0009 0000 0000 0000 0000 0000 0000 0000
 136: 0000 0000 0000 0000 0000 0000 0000 0000
 144: 0000 0000 0000 0000 0000 0000 0000 0000
 152: 0000 0000 0000 0000 0000 0000 0000 0000
 160: 0000 0000 0000 0000 0000 0000 0000 0000
 168: 0003 0001 0000 0000 0000 0000 0000 0000
 176: 2020 2020 2020 2020 2020 2020 2020 2020
 184: 2020 2020 2020 2020 2020 2020 2020 2020
 192: 2020 2020 2020 2020 2020 2020 2020 2020
 200: 2020 2020 2020 2020 2020 2020 0000 0000
 208: 0000 4000 0000 0000 0000 0000 0000 0000
 216: 0000 0001 0000 0000 0000 0000 10ff 0000
 224: 0000 0000 0000 0000 0000 0000 0000 0000
 232: 0000 0000 0010 0010 0000 0000 0000 0000
 240: 0000 0000 0000 0000 0000 0000 0000 0000
 248: 0000 0000 0000 0000 0000 0000 0000 cca5

pass6: Raw native max data:
   0: 5000 ff00 f97f 0d40 0000 0000
error = 0x00, sector_count = 0x0000, device = 0x40, status = 0x50
pass6: <WDC WDS120G2G0A-00JH30 UE300000> ACS-2 ATA SATA 3.x device
pass6: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)

protocol              ATA/ATAPI-9 SATA 3.x
device model          WDC WDS120G2G0A-00JH30
firmware revision     UE300000
serial number         174641453715
WWN                   5001b444a9e585cf
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         234455040 sectors
LBA48 supported       234455040 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             non-rotating
Zoned-Device Commands no

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
NCQ Queue Management           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      no      0/0x00
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            no       no
write-read-verify              no       no
unload                         no       no
general purpose logging        yes      yes
free-fall                      no       no
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks       yes              8
DSM - deterministic read       yes              any value
Host Protected Area (HPA)      yes      no      234455040/234455040
HPA - Security                 no
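
Since TRIM is the suspect here, the TRIM-related fields of the two dumps can be compared directly. The sketch below assumes the standard ATA IDENTIFY DEVICE bit assignments (word 169 bit 0 for TRIM support, word 69 bit 14 for deterministic read after TRIM, word 69 bit 5 for read-zeroes after TRIM, word 105 for the maximum number of 512-byte DSM blocks). The word values are copied from the HP S600 and WD dumps in this ticket; `trim_caps` is a hypothetical helper name. It only decodes what the drives advertise and says nothing about whether the firmware actually honours it.

```python
def trim_caps(word69, word105, word169):
    """Decode the Data Set Management (TRIM) fields of ATA IDENTIFY DEVICE data."""
    supported = bool(word169 & 0x0001)      # word 169 bit 0: TRIM supported
    deterministic = bool(word69 & 0x4000)   # word 69 bit 14: deterministic read after TRIM
    zeroed = bool(word69 & 0x0020)          # word 69 bit 5: trimmed LBAs read back as zeroes
    if not supported:
        read_after_trim = "n/a"
    elif deterministic and zeroed:
        read_after_trim = "zeroed"
    elif deterministic:
        read_after_trim = "any value"
    else:
        read_after_trim = "non-deterministic"
    return {
        "trim": supported,
        "max_512b_blocks": word105,
        "read_after_trim": read_after_trim,
    }

# Word values taken from the raw identify dumps posted in this ticket.
hp_s600 = trim_caps(word69=0x4C20, word105=0x0008, word169=0x0001)
wd_green = trim_caps(word69=0x5E00, word105=0x0008, word169=0x0001)

print("HP S600 :", hp_s600)
print("WD Green:", wd_green)
```

Both drives advertise TRIM with the same block limit; they differ only in what they promise to return when a trimmed range is read back, which matches the `DSM - deterministic read` lines in the two summaries.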

#33 Updated by Kris Moore over 2 years ago

  • Status changed from Screened to Closed
