Bug #28179

ZFS: vdev state changed

Added by Zexi Jia over 2 years ago. Updated about 2 years ago.

Status:
Closed
Priority:
Nice to have
Assignee:
Alexander Motin
Category:
OS
Target version:
Seen in:
Severity:
Low
Reason for Closing:
Behaves as Intended
Reason for Blocked:
Needs QA:
No
Needs Doc:
No
Needs Merging:
No
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

Just checked the log and found this:

Jan 7 16:00:00 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=10447084274758135511
Jan 7 16:00:00 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=11891976999399183130
Jan 7 16:00:00 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=2876660660992119741
Jan 7 16:00:00 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=6364139156996985005
Jan 7 16:00:00 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=5637529313712236470
Jan 7 16:00:00 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=17158548358165520315
Jan 7 16:00:00 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=17891434106328835447
Jan 7 16:00:00 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=867352022490844202
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=9259722659257589245
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=16966930689155008760
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=15848735979903780
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=16077184943779007548
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=8976844742250401345
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=11722335458295827035
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=14985074553984242318
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=16730478333914911226
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=6299058262429394074
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=3722024655907603494
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=18226598805198546619
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=12183499103290706230
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=8712823253835328000
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=9435379788174783464
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=5596408751049691044
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=13114319525729580529
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=7356715463652158197
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=13966987097671769262
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=15733103913979731440
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=16011212124988965411
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=2862432200385390058
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=12689811649520269831
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=6319193295834873095
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=3781490103348964071
Jan 7 16:00:01 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=5096106245591970840
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=5799519801456929288
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=744777000986915119
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=4287849620163328804
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=9354007552848258102
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=15717403668451740787
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=2598147963083234650
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=5071696816669929405
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=801057650515597734
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=15338491673167574369
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=5657372290930152595
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=9694350767572514790
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=11845435352447684946
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=7461875116100524746
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=8520807491796766840
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=3355030106931695562
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=2241660429221754945
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=2491544380249934756
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=3650619445696093620
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=1176679567978896239
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=10373720145800567722
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=3838391818551922788
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=15731836723281972804
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=7439014899415579367
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=10084775435525385216
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=3705522358206802581
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=6625819943389211130
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=8532291863411749444
Jan 7 16:00:02 freenas ZFS: vdev state changed, pool_guid=6977200919396029605 vdev_guid=8443878122674703823

Jan 8 19:45:00 freenas ZFS: vdev state changed, pool_guid=6832667243645497093 vdev_guid=17975649069347884308
Jan 8 22:27:33 freenas ses0: da2,pass2: Element descriptor: 'SLOT 01, '
Jan 8 22:27:33 freenas ses0: da2,pass2: SAS Device Slot Element: 1 Phys at Slot 1, Not All Phys
Jan 8 22:27:33 freenas ses0: phy 0: SAS device type 1 id 0
Jan 8 22:27:33 freenas ses0: phy 0: protocols: Initiator( None ) Target( SMP SSP )
Jan 8 22:27:33 freenas ses0: phy 0: parent 5000ccab0300f0bf addr 5000cca25208bf65
Jan 8 22:28:33 freenas ses0: da2,pass2: Element descriptor: 'SLOT 01,7SG4U48G '
Jan 8 22:28:33 freenas ses0: da2,pass2: SAS Device Slot Element: 1 Phys at Slot 1, Not All Phys
Jan 8 22:28:33 freenas ses0: phy 0: SAS device type 1 id 0
Jan 8 22:28:33 freenas ses0: phy 0: protocols: Initiator( None ) Target( SMP SSP )
Jan 8 22:28:33 freenas ses0: phy 0: parent 5000ccab0300f0bf addr 5000cca25208bf65

Jan 23 09:02:12 freenas ses0: da5,pass5: Element descriptor: 'SLOT 04, '
Jan 23 09:02:12 freenas ses0: da5,pass5: SAS Device Slot Element: 1 Phys at Slot 4, Not All Phys
Jan 23 09:02:12 freenas ses0: phy 0: SAS device type 1 id 0
Jan 23 09:02:12 freenas ses0: phy 0: protocols: Initiator( None ) Target( SMP SSP )
Jan 23 09:02:12 freenas ses0: phy 0: parent 5000ccab0300f0bf addr 5000cca2520a54e9
Jan 23 09:03:12 freenas ses0: da5,pass5: Element descriptor: 'SLOT 04,7SG5P4GG '
Jan 23 09:03:12 freenas ses0: da5,pass5: SAS Device Slot Element: 1 Phys at Slot 4, Not All Phys
Jan 23 09:03:12 freenas ses0: phy 0: SAS device type 1 id 0
Jan 23 09:03:12 freenas ses0: phy 0: protocols: Initiator( None ) Target( SMP SSP )
Jan 23 09:03:12 freenas ses0: phy 0: parent 5000ccab0300f0bf addr 5000cca2520a54e9

Jan 24 19:45:00 freenas ZFS: vdev state changed, pool_guid=6832667243645497093 vdev_guid=17975649069347884308

Jan 30 11:21:45 freenas ses1: da104,pass105: Element descriptor: 'SLOT 43, '
Jan 30 11:21:45 freenas ses1: da104,pass105: SAS Device Slot Element: 1 Phys at Slot 43, Not All Phys
Jan 30 11:21:45 freenas ses1: phy 0: SAS device type 1 id 1
Jan 30 11:21:45 freenas ses1: phy 0: protocols: Initiator( None ) Target( SMP SSP )
Jan 30 11:21:45 freenas ses1: phy 0: parent 5000ccab0300f0e5 addr 5000cca2520a952a
Jan 30 11:22:45 freenas ses1: da104,pass105: Element descriptor: 'SLOT 43,7SG5UE3G '
Jan 30 11:22:45 freenas ses1: da104,pass105: SAS Device Slot Element: 1 Phys at Slot 43, Not All Phys
Jan 30 11:22:45 freenas ses1: phy 0: SAS device type 1 id 1
Jan 30 11:22:45 freenas ses1: phy 0: protocols: Initiator( None ) Target( SMP SSP )
Jan 30 11:22:45 freenas ses1: phy 0: parent 5000ccab0300f0e5 addr 5000cca2520a952a

Jan 31 16:16:39 freenas ses1: da89,pass90: Element descriptor: 'SLOT 28, '
Jan 31 16:16:39 freenas ses1: da89,pass90: SAS Device Slot Element: 1 Phys at Slot 28, Not All Phys
Jan 31 16:16:39 freenas ses1: phy 0: SAS device type 1 id 1
Jan 31 16:16:39 freenas ses1: phy 0: protocols: Initiator( None ) Target( SMP SSP )
Jan 31 16:16:39 freenas ses1: phy 0: parent 5000ccab0300f0e3 addr 5000cca2520a0f6a
Jan 31 16:17:39 freenas ses1: da89,pass90: Element descriptor: 'SLOT 28,7SG5JHZG '
Jan 31 16:17:39 freenas ses1: da89,pass90: SAS Device Slot Element: 1 Phys at Slot 28, Not All Phys
Jan 31 16:17:39 freenas ses1: phy 0: SAS device type 1 id 1
Jan 31 16:17:39 freenas ses1: phy 0: protocols: Initiator( None ) Target( SMP SSP )
Jan 31 16:17:39 freenas ses1: phy 0: parent 5000ccab0300f0e3 addr 5000cca2520a0f6a

Feb 1 01:29:56 freenas ses1: da80,pass81: Element descriptor: 'SLOT 19, '
Feb 1 01:29:56 freenas ses1: da80,pass81: SAS Device Slot Element: 1 Phys at Slot 19, Not All Phys
Feb 1 01:29:56 freenas ses1: phy 0: SAS device type 1 id 1
Feb 1 01:29:56 freenas ses1: phy 0: protocols: Initiator( None ) Target( SMP SSP )
Feb 1 01:29:56 freenas ses1: phy 0: parent 5000ccab0300f0e5 addr 5000cca2520a3db6
Feb 1 01:30:56 freenas ses1: da80,pass81: Element descriptor: 'SLOT 19,7SG5MLKG '
Feb 1 01:30:56 freenas ses1: da80,pass81: SAS Device Slot Element: 1 Phys at Slot 19, Not All Phys
Feb 1 01:30:56 freenas ses1: phy 0: SAS device type 1 id 1
Feb 1 01:30:56 freenas ses1: phy 0: protocols: Initiator( None ) Target( SMP SSP )
Feb 1 01:30:56 freenas ses1: phy 0: parent 5000ccab0300f0e5 addr 5000cca2520a3db6

The machine is a 2U server with a 60-disk SAS JBOD attached. Multipath is used with the default configuration.
I've searched for this problem and someone had written this:
notify 10 {
    match "system" "ZFS";
    match "type" "resource.fs.zfs.statechange";
    action "logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=$pool_guid vdev_guid=$vdev_guid'";
};

} else if (event.Value("class") == "resource.fs.zfs.statechange") {
    RefreshVdevState();
    /*
     * If this vdev is DEGRADED, FAULTED, or UNAVAIL, try to
     * activate a hotspare. Otherwise, ignore the event
     */
    if (VdevState() == VDEV_STATE_FAULTED ||
        VdevState() == VDEV_STATE_DEGRADED ||
        VdevState() == VDEV_STATE_CANT_OPEN)
        (void) ActivateSpare();
    consumed = true;
}

The system has been on for about 2 months, but this seems to have only occurred recently.
So is this notice really related to a disk degrading, or to something else?
We have other identical platforms running 9.10.2-U2/U4/U6 and 11-U4, but I haven't seen the same notice on them yet.

Debug file is attached.

Thanks!

History

#1 Updated by Dru Lavigne over 2 years ago

  • Assignee changed from Release Council to Alexander Motin
  • Target version set to 11.2-RC2
  • Seen in changed from 11.0-U2 to 11.0-U4

#2 Updated by Dru Lavigne over 2 years ago

  • Private changed from No to Yes

#3 Updated by Alexander Motin over 2 years ago

  • Priority changed from No priority to Nice to have
  • Severity set to Low

I don't think those events mean degrading disks, since 1) all disks of your pool cannot fail at the same time, and 2) you would notice that in other ways too. I may need to investigate that area deeper and maybe make the messages more informative, but so far I don't see a threat there.

#4 Updated by Zexi Jia over 2 years ago

Thanks a lot!
That's really a relief. I've encountered a multipath interruption on ANOTHER machine caused by a SAS expander; the log looked like this:
Sep 13 15:55:04 storage-backup mpr0: mpr_user_pass_thru: user reply buffer (64) smaller than returned buffer (68)
Sep 13 15:59:18 storage-backup (da94:mpr0:0:170:0): WRITE. CDB: 2a 00 9d 86 55 28 00 00 08 00 length 4096 SMID 253 terminated ioc 804b scsi 0 state c xfer 8192
Sep 13 15:59:18 storage-backup (da94:mpr0:0:170:0): WRITE. CDB: 2a 00 9d 86 55 28 00 00 08 00
Sep 13 15:59:18 storage-backup (da94:mpr0:0:170:0): CAM status: CCB request completed with an error
Sep 13 15:59:18 storage-backup (da94:mpr0:0:170:0): Retrying command
Sep 13 15:59:20 storage-backup mpr0: mprsas_prepare_remove: Sending reset for target ID 105
Sep 13 15:59:20 storage-backup mpr0: da34 at mpr0 bus 0 scbus1 target 105 lun 0
Sep 13 15:59:20 storage-backup mprsas_prepare_remove: Sending reset for target ID 170
Sep 13 15:59:20 storage-backup da34: <HGST HUH721008AL5200 A21D> s/n 7SGB7X1C detached
Sep 13 15:59:20 storage-backup GEOM_MULTIPATH: da34 in disk35 was disconnected
Sep 13 15:59:20 storage-backup da94 at mpr0 bus 0 scbus1 target 170 lun 0
Sep 13 15:59:20 storage-backup GEOM_MULTIPATH: da34 removed from disk35
Sep 13 15:59:20 storage-backup da94: <HGST HUH721008AL5200 A21D>mpr0: s/n 7SGB7X1Cclearing target 105 handle 0x0063
Sep 13 15:59:20 storage-backup detached
Sep 13 15:59:20 storage-backup mpr0: At enclosure level 0, slot 34, connector name ( )
Sep 13 15:59:20 storage-backup (da34:mpr0: mpr0:0:Unfreezing devq for target ID 105
Sep 13 15:59:20 storage-backup 105:0): Periph destroyed
Sep 13 15:59:22 storage-backup (da94:mpr0:0:170:0): SYNCHRONIZE CACHE. CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 515 terminated ioc 804b scsi 0 state c xfer 0
Sep 13 15:59:22 storage-backup (da94:mpr0:0:170:0): WRITE. CDB: 2a 00 9d 86 55 28 00 00 08 00 length 4096 SMID 221 terminated ioc 804b scsi 0 state c xf(da94:mpr0:0:170:0): SYNCHRONIZE CACHE. CDB: 35 00 00 00 00 00 00 00 00 00
Sep 13 15:59:22 storage-backup er 0
Sep 13 15:59:22 storage-backup (da94:mpr0:0:170:0): CAM status: CCB request completed with an error
Sep 13 15:59:22 storage-backup (da94:mpr0:0:170:0): Error 5, Periph was invalidated
Sep 13 15:59:22 storage-backup (da94:mpr0:0:170:0): WRITE. CDB: 2a 00 9d 86 55 28 00 00 08 00
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: Error 5, da94 in disk35 marked FAIL
Sep 13 15:59:22 storage-backup (da94:mpr0:0:170:0): CAM status: CCB request completed with an error
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: all paths in disk35 were marked FAIL, restore da94
Sep 13 15:59:22 storage-backup (da94:GEOM_MULTIPATH: da94 is now active path in disk35
Sep 13 15:59:22 storage-backup mpr0:0:GEOM_MULTIPATH: Error 6, da94 in disk35 marked FAIL
Sep 13 15:59:22 storage-backup 170:GEOM_MULTIPATH: all paths in disk35 were marked FAIL, restore da94
Sep 13 15:59:22 storage-backup 0): GEOM_MULTIPATH: da94 is now active path in disk35
Sep 13 15:59:22 storage-backup Error 5, Periph was invalidated
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: Error 5, da94 in disk35 marked FAIL
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: all paths in disk35 were marked FAIL, restore da94
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: da94 is now active path in disk35
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: Error 6, da94 in disk35 marked FAIL
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: all paths in disk35 were marked FAIL, restore da94
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: da94 is now active path in disk35
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: Error 6, da94 in disk35 marked FAIL
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: all paths in disk35 were marked FAIL, restore da94
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: da94 is now active path in disk35
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: Error 6, da94 in disk35 marked FAIL
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: all paths in disk35 were marked FAIL, restore da94
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: da94 is now active path in disk35
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: Error 6, da94 in disk35 marked FAIL
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: all paths in disk35 were marked FAIL, restore da94
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: da94 is now active path in disk35
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: da94 in disk35 was disconnected
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: out of providers for disk35
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: da94 removed from disk35
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: destroying disk35
Sep 13 15:59:22 storage-backup GEOM_ELI: Device multipath/disk35p1.eli destroyed.
Sep 13 15:59:22 storage-backup GEOM_ELI: Detached multipath/disk35p1.eli on last close.
Sep 13 15:59:22 storage-backup (da94:mpr0:0:170:0): Periph destroyed
Sep 13 15:59:22 storage-backup GEOM_MULTIPATH: disk35 destroyed
Sep 13 15:59:24 storage-backup mpr0: clearing target 170 handle 0x006b
Sep 13 15:59:24 storage-backup mpr0: At enclosure level 0, slot 34, connector name ( )
Sep 13 15:59:24 storage-backup mpr0: Unfreezing devq for target ID 170
Sep 13 15:59:39 storage-backup mpr0: SAS Address from SAS device page0 = 5000cca2520a1f6a
Sep 13 15:59:39 storage-backup mpr0: Found device <401<SspTarg>,End Device> <12.0Gbps> handle<0x0089> enclosureHandle<0x0002> slot 34
Sep 13 15:59:39 storage-backup mpr0: At enclosure level 0 and connector name ( )
Sep 13 15:59:39 storage-backup mpr0: SAS Address from SAS device page0 = 5000cca2520a1f69
Sep 13 15:59:39 storage-backup mpr0: Found device <401<SspTarg>,End Device> <12.0Gbps> handle<0x008a> enclosureHandle<0x0003> slot 34
Sep 13 15:59:39 storage-backup mpr0: At enclosure level 0 and connector name ( )
Sep 13 15:59:39 storage-backup ses1: da34,pass37: Element descriptor: 'SLOT 34,7SGB7X1C '
Sep 13 15:59:39 storage-backup ses2: da94,pass98: Element descriptor: 'SLOT 34,7SGB7X1C '
Sep 13 15:59:39 storage-backup ses1: da34,pass37: SAS Device Slot Element: 1 Phys at Slot 34, Not All Phys
Sep 13 15:59:39 storage-backup ses2: da94,pass98: SAS Device Slot Element: 1 Phys at Slot 34, Not All Phys
Sep 13 15:59:39 storage-backup ses1: phy 0: SAS device type 1 id 1
Sep 13 15:59:39 storage-backup ses2: phy 0: SAS device type 1 id 0
Sep 13 15:59:39 storage-backup ses1: phy 0: protocols: Initiator( None ) Target( SMP SSP )
Sep 13 15:59:39 storage-backup ses2: phy 0: protocols: Initiator( None ) Target( SMP SSP )
Sep 13 15:59:39 storage-backup ses1: phy 0: parent 5000ccab0300b265 addr 5000cca2520a1f6a
Sep 13 15:59:39 storage-backup ses2: phy 0: parent 5000ccab0300b25f addr 5000cca2520a1f69
Sep 13 15:59:59 storage-backup da34 at mpr0 bus 0 scbus1 target 105 lun 0
Sep 13 15:59:59 storage-backup da34: <HGST HUH721008AL5200 A21D> Fixed Direct Access SPC-4 SCSI device
Sep 13 15:59:59 storage-backup da34: Serial Number 7SG5KL0G
Sep 13 15:59:59 storage-backup da34: 1200.000MB/s transfers
Sep 13 15:59:59 storage-backup da34: Command Queueing enabled
Sep 13 15:59:59 storage-backup da34: 7630885MB (15628053168 512 byte sectors)
Sep 13 15:59:59 storage-backup da94 at mpr0 bus 0 scbus1 target 170 lun 0
Sep 13 15:59:59 storage-backup da94: <HGST HUH721008AL5200 A21D> Fixed Direct Access SPC-4 SCSI device
Sep 13 15:59:59 storage-backup da94: Serial Number 7SG5KL0G
Sep 13 15:59:59 storage-backup da94: 1200.000MB/s transfers
Sep 13 15:59:59 storage-backup da94: Command Queueing enabled
Sep 13 15:59:59 storage-backup da94: 7630885MB (15628053168 512 byte sectors)
Sep 13 16:00:23 storage-backup GEOM_MULTIPATH: disk35 created
Sep 13 16:00:23 storage-backup GEOM_MULTIPATH: da34 added to disk35
Sep 13 16:00:23 storage-backup GEOM_MULTIPATH: da34 is now active path in disk35
Sep 13 16:00:23 storage-backup GEOM_MULTIPATH: da94 added to disk35
Sep 13 16:00:39 storage-backup ses2: da94,pass98: Element descriptor: 'SLOT 34,7SG5KL0G '
Sep 13 16:00:39 storage-backup ses1: da34,pass37: Element descriptor: 'SLOT 34,7SG5KL0G '
Sep 13 16:00:39 storage-backup ses2: da94,pass98: SAS Device Slot Element: 1 Phys at Slot 34, Not All Phys
Sep 13 16:00:39 storage-backup ses1: da34,pass37: SAS Device Slot Element: 1 Phys at Slot 34, Not All Phys
Sep 13 16:00:39 storage-backup ses2: phy 0: SAS device type 1 id 0
Sep 13 16:00:39 storage-backup ses1: phy 0: SAS device type 1 id 1
Sep 13 16:00:39 storage-backup ses2: phy 0: protocols: Initiator( None ) Target( SMP SSP )
Sep 13 16:00:39 storage-backup ses1: phy 0: protocols: Initiator( None ) Target( SMP SSP )
Sep 13 16:00:39 storage-backup ses2: phy 0: parent 5000ccab0300b25f addr 5000cca2520a1f69
Sep 13 16:00:39 storage-backup ses1: phy 0: parent 5000ccab0300b265 addr 5000cca2520a1f6a

And this time I don't get a fail report from multipath and there are no messages about da* errors, so I assume there are no problems with the physical disks or the SXP, and the issue seems more likely to be on the ZFS side than the hardware.

Thanks.

#5 Updated by Zexi Jia over 2 years ago

  • File debug-JiaSTO-20180318055739.tgz added

Hi
Just now another machine has encountered the same problem, with no configuration changes; the system load was up to 50+ at that time:
Mar 18 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=15916363622754148863
Mar 18 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=15412947657976667112
Mar 18 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=9049173211244087018
Mar 18 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=8639148120030284829
Mar 18 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=12977767965927659248
Mar 18 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=12092380517578844290
Mar 18 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=2074244910678080119
Mar 18 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=12714240639617674111
Mar 18 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=16697704984302108144
Mar 18 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=2734324193141105127
Mar 18 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=803827587812379835
Mar 18 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=4194144212717326851
Mar 18 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=702906459739410488
Mar 18 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=12051376423194843549
Mar 18 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=16602607746315412890
Mar 18 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=1633069320562312391
Mar 18 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=6436805218171306803
Mar 18 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=2963697052862021578
Mar 18 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=7855256879009318443
Mar 18 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=6490641168057957318
Mar 18 00:00:03 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=7846078208211889302
Mar 18 00:00:03 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=927498551708852621
Mar 18 00:00:03 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=6914017684629092673
Mar 18 00:00:03 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=1987570433530014607

These disks include 8 SATA HDDs directly attached via a 9211-8i HBA and 16 SATA HDDs directly attached via a 9201-16i.
So I think there should be something to look into on the ZFS side?
Debug file is attached.
Thanks!
Jia

#6 Updated by Zexi Jia over 2 years ago

BTW: I just updated the system to 11.1-U2 8 days ago.

#7 Updated by Zexi Jia over 2 years ago

  • File debug-SMBackup-20180318060705.tgz added

Another machine, same time and same problem; system usage up to 10, also 11.1-U2, uptime 4 days:
Mar 18 00:00:00 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=3778695407310745711
Mar 18 00:00:00 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=2774220957278229907
Mar 18 00:00:00 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=57811906109708138
Mar 18 00:00:00 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=10791378402053794287
Mar 18 00:00:00 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=8215844516903649204
Mar 18 00:00:00 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=14924620330208964412
Mar 18 00:00:00 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=3931656151363697459
Mar 18 00:00:00 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=4440519434372250859
Mar 18 00:00:00 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=2082870155629986027
Mar 18 00:00:00 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=14071528566431857493
Mar 18 00:00:01 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=1830698812771558732
Mar 18 00:00:01 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=3922608304235888432
Mar 18 00:00:01 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=12075836545108449158
Mar 18 00:00:01 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=2761919051496294119
Mar 18 00:00:01 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=17590959018855100556
Mar 18 00:00:01 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=17723178535141173701
Mar 18 00:00:01 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=15258857582041824851
Mar 18 00:00:01 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=1627604443948138372
Mar 18 00:00:01 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=11973038269928496749
Mar 18 00:00:01 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=3245804393570048632
Mar 18 00:00:01 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=10656240761842127044
Mar 18 00:00:01 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=6484498283154482291
Mar 18 00:00:01 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=12281566621851802112
Mar 18 00:00:01 SMBackup ZFS: vdev state changed, pool_guid=3439783460947168780 vdev_guid=10894863908781956208

Debug file is attached.
However, please be advised that I haven't found the same problem on the third machine, up 5 days with 11.1-U2. I'll continue to monitor the status and report further problems.
Regards.
Jia

#8 Updated by Dru Lavigne about 2 years ago

  • Target version changed from 11.2-RC2 to 11.2-BETA3

#9 Updated by Dru Lavigne about 2 years ago

  • Status changed from Not Started to Blocked
  • Reason for Blocked set to Need additional information from Author

Zexi: is this still an issue with U5?

#10 Updated by Zexi Jia about 2 years ago

  • File debug-JiaSTO-20180718105535.tgz added

Ah, there are still some problems. I've only updated one machine to the latest U5; the most recent report was 3 days ago.
There seems to be a greater chance that the error occurs at 00:00:00 of a day. Is this relevant?
The debug file is attached.
Thanks a lot for your concern and effort!
Jul 14 03:45:00 JiaSTO ZFS: vdev state changed, pool_guid=3413871055182836737 vdev_guid=9331993170923214583
Jul 15 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=15916363622754148863
Jul 15 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=15412947657976667112
Jul 15 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=9049173211244087018
Jul 15 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=8639148120030284829
Jul 15 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=12977767965927659248
Jul 15 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=12092380517578844290
Jul 15 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=2074244910678080119
Jul 15 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=12714240639617674111
Jul 15 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=16697704984302108144
Jul 15 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=2734324193141105127
Jul 15 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=803827587812379835
Jul 15 00:00:01 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=4194144212717326851
Jul 15 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=702906459739410488
Jul 15 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=12051376423194843549
Jul 15 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=16602607746315412890
Jul 15 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=1633069320562312391
Jul 15 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=6436805218171306803
Jul 15 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=2963697052862021578
Jul 15 00:00:02 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=7855256879009318443
Jul 15 00:00:03 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=6490641168057957318
Jul 15 00:00:03 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=7846078208211889302
Jul 15 00:00:03 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=927498551708852621
Jul 15 00:00:03 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=6914017684629092673
Jul 15 00:00:03 JiaSTO ZFS: vdev state changed, pool_guid=5158671704541409915 vdev_guid=1987570433530014607

#11 Updated by Dru Lavigne about 2 years ago

  • Status changed from Blocked to Unscreened
  • Seen in changed from 11.0-U4 to 11.1-U5
  • Reason for Blocked deleted (Need additional information from Author)

#13 Updated by Alexander Motin about 2 years ago

  • Category changed from Hardware to OS
  • Status changed from Unscreened to Closed
  • Target version changed from 11.2-BETA3 to N/A
  • Reason for Closing set to Behaves as Intended
  • Needs QA changed from Yes to No
  • Needs Doc changed from Yes to No
  • Needs Merging changed from Yes to No

After some investigation it seems like those messages for all disks of the pool are triggered by a scrub. You may see a scrub command in the pool logs close in time to those messages in the system logs. I also got this stack trace on my test system:

zfs_post_state_change
              kernel`vdev_open+0x5c8
              kernel`vdev_open_children+0x30
              kernel`vdev_mirror_open+0x34
              kernel`vdev_open+0x1e7
              kernel`vdev_open_children+0x30
              kernel`vdev_root_open+0x3a
              kernel`vdev_open+0x1e7
              kernel`vdev_reopen+0x5d
              kernel`dsl_scan+0x38
              kernel`zfs_ioc_pool_scan+0x86
              kernel`zfsdev_ioctl+0x78b
              kernel`devfs_ioctl+0xcb
              kernel`VOP_IOCTL_APV+0xd9
              kernel`vn_ioctl+0x124
              kernel`devfs_ioctl_f+0x1f
              kernel`kern_ioctl+0x2b9
              kernel`sys_ioctl+0x15e
              kernel`amd64_syscall+0x281
              kernel`0xffffffff811e9c3d

Before ZFS starts the scrub process it "purges all vdev caches and probes all devices". Devices which are alive are not even closed, only their parameters are updated, but from the management point of view this still counts as a state change, and that is the announcement you see in the logs.
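For anyone who wants to check their own logs for this pattern, here is a small stand-alone helper (an illustrative sketch, not part of FreeNAS; the default log path and the burst threshold of 3 vdevs are assumptions) that groups the "vdev state changed" lines by pool and minute. A burst covering many vdev_guids of one pool at the same moment is consistent with the scrub-start reopen described above, while a single vdev changing state on its own would deserve a closer look.

#!/usr/bin/env python3
"""Group "ZFS: vdev state changed" syslog lines by pool and minute."""
import re
import sys
from collections import defaultdict

# Matches lines such as:
# Jan 7 16:00:00 freenas ZFS: vdev state changed, pool_guid=69772... vdev_guid=10447...
LINE = re.compile(
    r'^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+ZFS: vdev state changed, '
    r'pool_guid=(?P<pool>\d+)\s+vdev_guid=(?P<vdev>\d+)'
)

def main(path):
    bursts = defaultdict(set)          # (minute, pool_guid) -> vdev_guids seen
    with open(path) as log:
        for line in log:
            m = LINE.match(line)
            if m:
                bursts[(m['minute'], m['pool'])].add(m['vdev'])

    # Lexical ordering of the minute string is good enough for a quick look.
    for (minute, pool), vdevs in sorted(bursts.items()):
        kind = 'burst, scrub-like' if len(vdevs) > 3 else 'isolated'
        print(f'{minute}  pool {pool}: {len(vdevs)} vdev(s) changed state ({kind})')

if __name__ == '__main__':
    main(sys.argv[1] if len(sys.argv) > 1 else '/var/log/messages')

Run it as "python3 vdev_bursts.py /var/log/messages" (the script name is just an example).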

#14 Updated by Dru Lavigne about 2 years ago

  • File deleted (debug-freenas-20180202204239.tgz)

#15 Updated by Dru Lavigne about 2 years ago

  • File deleted (debug-JiaSTO-20180318055739.tgz)

#16 Updated by Dru Lavigne about 2 years ago

  • File deleted (debug-SMBackup-20180318060705.tgz)

#17 Updated by Dru Lavigne about 2 years ago

  • File deleted (debug-JiaSTO-20180718105535.tgz)

#18 Updated by Dru Lavigne about 2 years ago

  • Private changed from Yes to No
