Bug #28249

Spontaneous reboot after upgrade from 9.10.2-U6 to 11.1-U1

Added by Graham Bird over 2 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Important
Assignee:
Alexander Motin
Category:
Hardware
Target version:
Seen in:
Severity:
Low Medium
Reason for Closing:
Reason for Blocked:
Waiting for feedback
Needs QA:
Yes
Needs Doc:
Yes
Needs Merging:
Yes
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:

Dell R710T, 48 GB ECC memory, 6 x 8 TB Seagate / WD Red drives. Boots from an internal USB drive.

ChangeLog Required:
No

Description

Since the above upgrade, before which the Dell R710 server had run well, we have been getting periodic spontaneous reboots. The reboots do not appear to be related to a specific activity or circumstance. We have NOT upgraded the pool flags yet.

Debug file attached.

History

#1 Updated by Dru Lavigne over 2 years ago

  • Private changed from No to Yes

#2 Updated by Dru Lavigne over 2 years ago

  • Assignee changed from Release Council to Alexander Motin
  • Reason for Blocked set to Need verification

Alexander: does anything stick out in the logs for you?

#3 Updated by Alexander Motin over 2 years ago

  • Status changed from Not Started to Blocked
  • Reason for Blocked changed from Need verification to Need additional information

Unfortunately, I see nothing about the crashes in the debug data. The crash dumps there are from the ancient 9.3 version. We'd need at least some input about what is actually going on there. If you can somehow trigger or predict a crash, a picture of the console would be good to have.

#4 Updated by Dru Lavigne over 2 years ago

  • Category set to Hardware
  • Target version set to 11.2-RC2

#5 Updated by Graham Bird over 2 years ago

Here is the console log that I submitted with the original query to the forum:

Feb 5 10:28:03 Guardian kernel: bce0: link state changed to UP
Feb 5 10:28:03 Guardian kernel: bce0: link state changed to UP
Feb 5 10:28:03 Guardian Gigabit link up!
Feb 5 10:28:03 Guardian bce0: Gigabit link up!
Feb 5 10:28:03 Guardian bce1:
Feb 5 10:28:03 Guardian kernel: bce1: link state changed to UP
Feb 5 10:28:03 Guardian kernel: bce1: link state changed to UP
Feb 5 10:28:03 Guardian Gigabit link up!
Feb 5 10:28:03 Guardian bce1: Gigabit link up!
Feb 5 10:28:03 Guardian bce2:
Feb 5 10:28:03 Guardian kernel: bce2: link state changed to UP
Feb 5 10:28:03 Guardian kernel: bce2: link state changed to UP
Feb 5 10:28:03 Guardian Gigabit link up!
Feb 5 10:28:03 Guardian bce2: Gigabit link up!
Feb 5 10:28:03 Guardian bce3:
Feb 5 10:28:03 Guardian kernel: bce3: link state changed to UP
Feb 5 10:28:03 Guardian kernel: bce3: link state changed to UP
Feb 5 10:28:03 Guardian Gigabit link up!
Feb 5 10:28:03 Guardian bce3: Gigabit link up!
Feb 5 10:28:03 Guardian ums0 on uhub1
Feb 5 10:28:03 Guardian ums0: <Mouse> on usbus3
Feb 5 10:28:03 Guardian ums0: 3 buttons and [Z] coordinates ID=0
Feb 5 10:28:05 Guardian ntpd2426: ntpd 4.2.8p10-a (1): Starting
Feb 5 10:28:12 Guardian root: /etc/rc: WARNING: failed precmd routine for vmware_guestd
Feb 5 10:28:17 Guardian root: /etc/rc: WARNING: failed precmd routine for minio
Feb 5 10:28:21 Guardian smartd3128: Unable to register device /dev/da6 (no Directive -d removable). Exiting.
Feb 5 10:28:23 Guardian daemon3436: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Feb 5 10:28:23 Guardian daemon3436: bootstrap = true: do not enable unless necessary
Feb 5 10:28:23 Guardian daemon3436: > Starting Consul agent...
Feb 5 10:28:23 Guardian daemon[3436]: > Consul agent running!
Feb 5 10:28:23 Guardian daemon3436: Version: 'v1.0.0'
Feb 5 10:28:23 Guardian daemon3436: Node ID: '5fa7252c-ee3c-b586-7b85-6a9e1119c18a'
Feb 5 10:28:23 Guardian daemon3436: Node name: 'Guardian.local'
Feb 5 10:28:23 Guardian daemon3436: Datacenter: 'dc1' (Segment: '<all>')
Feb 5 10:28:23 Guardian daemon3436: Server: true (Bootstrap: true)
Feb 5 10:28:23 Guardian daemon3436: Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, DNS: 8600)
Feb 5 10:28:23 Guardian daemon3436: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
Feb 5 10:28:23 Guardian daemon3436: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
Feb 5 10:28:23 Guardian daemon3436:
Feb 5 10:28:23 Guardian daemon3436: ==> Log data will now stream in as it occurs:
Feb 5 10:28:23 Guardian daemon3436:
Feb 5 10:28:23 Guardian daemon3436: 2018/02/05 10:28:23 [WARN] agent: Check 'freenas_health' is now warning
Feb 5 10:28:30 Guardian daemon3436: 2018/02/05 10:28:30 [ERR] agent: failed to sync remote state: No cluster leader
Feb 5 10:28:30 Guardian daemon3436: 2018/02/05 10:28:30 [WARN] raft: Heartbeat timeout from "" reached, starting election
Feb 5 18:28:45 Guardian /middlewared216: dnssd_clientstub DNSServiceProcessResult called with invalid DNSServiceRef 0x81f61bda0 FFFFFFFF DDDDDDDD
(SNIP health warnings)
Feb 5 12:01:18 Guardian rsync: ssh: connect to host ebbettspass.duckdns.org port 22: Operation timed out
Feb 5 12:01:18 Guardian rsync: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
Feb 5 12:01:18 Guardian rsync: rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
Feb 5 12:02:46 Guardian daemon3436: 2018/02/05 12:02:46 [WARN] agent: Check 'freenas_health' is now warning
(SNIP)
Feb 5 12:36:56 Guardian daemon3436: 2018/02/05 12:36:56 [WARN] agent: Check 'freenas_health' is now warning
Feb 5 12:46:17 Guardian syslog-ng2021: syslog-ng starting up; version='3.7.3'
Feb 5 12:46:17 Guardian Copyright (c) 1992-2017 The FreeBSD Project.
Feb 5 12:46:17 Guardian Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Feb 5 12:46:17 Guardian The Regents of the University of California. All rights reserved.
Feb 5 12:46:17 Guardian FreeBSD is a registered trademark of The FreeBSD Foundation.
Feb 5 12:46:17 Guardian FreeBSD 11.1-STABLE #0 r321665+4bd3ee42941(freenas/11.1-stable): Thu Jan 18 15:45:01 UTC 2018
Feb 5 12:46:17 Guardian root@gauntlet:/freenas-11-releng/freenas/_BE/objs/freenas-11-releng/freenas/_BE/os/sys/FreeNAS.amd64 amd64
Feb 5 12:46:17 Guardian FreeBSD clang version 5.0.0 (tags/RELEASE_500/final 312559) (based on LLVM 5.0.0svn)
Feb 5 12:46:17 Guardian CPU: Intel(R) Xeon(R) CPU E5504 @ 2.00GHz (1995.04-MHz K8-class CPU)
Feb 5 12:46:17 Guardian Origin="GenuineIntel" Id=0x106a5 Family=0x6 Model=0x1a Stepping=5
Feb 5 12:46:17 Guardian Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Feb 5 12:46:17 Guardian Features2=0x9ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT>
Feb 5 12:46:17 Guardian AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
Feb 5 12:46:17 Guardian AMD Features2=0x1<LAHF>
Feb 5 12:46:17 Guardian VT-x: (disabled in BIOS) PAT,HLT,MTF,PAUSE,EPT,VPID
Feb 5 12:46:17 Guardian TSC: P-state invariant, performance statistics
Feb 5 12:46:17 Guardian real memory = 53687091200 (51200 MB)
Feb 5 12:46:17 Guardian avail memory = 49951158272 (47637 MB)
Feb 5 12:46:17 Guardian Event timer "LAPIC" quality 100
Feb 5 12:46:17 Guardian ACPI APIC Table: <DELL PE_SC3 >
Feb 5 12:46:17 Guardian FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
Feb 5 12:46:17 Guardian FreeBSD/SMP: 2 package(s) x 4 core(s)
Feb 5 12:46:17 Guardian WARNING: VIMAGE (virtualized network stack) is a highly experimental feature.
Feb 5 12:46:17 Guardian ioapic1: Changing APIC ID to 1
Feb 5 12:46:17 Guardian ioapic0 <Version 2.0> irqs 0-23 on motherboard
Feb 5 12:46:17 Guardian ioapic1 <Version 2.0> irqs 32-55 on motherboard
Feb 5 12:46:17 Guardian SMP: AP CPU #2 Launched!
Feb 5 12:46:17 Guardian SMP: AP CPU #4 Launched!
Feb 5 12:46:17 Guardian SMP: AP CPU #7 Launched!
Feb 5 12:46:17 Guardian SMP: AP CPU #6 Launched!
Feb 5 12:46:17 Guardian SMP: AP CPU #5 Launched!
Feb 5 12:46:17 Guardian SMP: AP CPU #1 Launched!
Feb 5 12:46:17 Guardian SMP: AP CPU #3 Launched!
Feb 5 12:46:17 Guardian Timecounter "TSC" frequency 1995044240 Hz quality 1000
Feb 5 12:46:17 Guardian random: entropy device external interface
Feb 5 12:46:17 Guardian kbd1 at kbdmux0
Feb 5 12:46:17 Guardian nexus0
Feb 5 12:46:17 Guardian cryptosoft0: <software crypto> on motherboard
Feb 5 12:46:17 Guardian aesni0: No AESNI support.
Feb 5 12:46:17 Guardian padlock0: No ACE support.
Feb 5 12:46:17 Guardian acpi0: <DELL PE_SC3> on motherboard
Feb 5 12:46:17 Guardian acpi0: Power Button (fixed)
Feb 5 12:46:17 Guardian cpu0: <ACPI CPU> on acpi0
Feb 5 12:46:17 Guardian cpu1: <ACPI CPU> on acpi0
Feb 5 12:46:17 Guardian cpu2: <ACPI CPU> on acpi0
Feb 5 12:46:17 Guardian cpu3: <ACPI CPU> on acpi0
Feb 5 12:46:17 Guardian cpu4: <ACPI CPU> on acpi0
Feb 5 12:46:17 Guardian cpu5: <ACPI CPU> on acpi0
Feb 5 12:46:17 Guardian cpu6: <ACPI CPU> on acpi0
Feb 5 12:46:17 Guardian cpu7: <ACPI CPU> on acpi0
Feb 5 12:46:17 Guardian atrtc0: <AT realtime clock> port 0x70-0x7f irq 8 on acpi0
Feb 5 12:46:17 Guardian atrtc0: registered as a time-of-day clock, resolution 1.000000s
Feb 5 12:46:17 Guardian Event timer "RTC" frequency 32768 Hz quality 0
Feb 5 12:46:17 Guardian attimer0: <AT timer> port 0x40-0x5f irq 0 on acpi0
Feb 5 12:46:17 Guardian Timecounter "i8254" frequency 1193182 Hz quality 0
Feb 5 12:46:17 Guardian Event timer "i8254" frequency 1193182 Hz quality 100
Feb 5 12:46:17 Guardian hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Feb 5 12:46:17 Guardian Timecounter "HPET" frequency 14318180 Hz quality 950
Feb 5 12:46:17 Guardian Event timer "HPET" frequency 14318180 Hz quality 350
Feb 5 12:46:17 Guardian Event timer "HPET1" frequency 14318180 Hz quality 340
Feb 5 12:46:17 Guardian Event timer "HPET2" frequency 14318180 Hz quality 340
Feb 5 12:46:17 Guardian Event timer "HPET3" frequency 14318180 Hz quality 340
Feb 5 12:46:17 Guardian Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
Feb 5 12:46:17 Guardian acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
Feb 5 12:46:17 Guardian pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
Feb 5 12:46:17 Guardian pcib0: _OSC returned error 0x10
Feb 5 12:46:17 Guardian pci0: <ACPI PCI bus> on pcib0
Feb 5 12:46:17 Guardian pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
Feb 5 12:46:17 Guardian pci1: <ACPI PCI bus> on pcib1
Feb 5 12:46:17 Guardian bce0: <QLogic NetXtreme II BCM5709 1000Base-T (C0)> mem 0xd6000000-0xd7ffffff irq 36 at device 0.0 on pci1
Feb 5 12:46:17 Guardian miibus0: <MII bus> on bce0
Feb 5 12:46:17 Guardian brgphy0: <BCM5709 10/100/1000baseT PHY> PHY 1 on miibus0
Feb 5 12:46:17 Guardian brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
Feb 5 12:46:17 Guardian bce0: Using defaults for TSO: 65518/35/2048
Feb 5 12:46:17 Guardian bce0: Ethernet address: 00:24:e8:5f:05:99
Feb 5 12:46:17 Guardian bce0:
Feb 5 12:46:17 Guardian kernel: bce0: link state changed to DOWN
Feb 5 12:46:17 Guardian kernel: bce0: link state changed to DOWN
Feb 5 12:46:17 Guardian ASIC (0x57092003); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (4.6.4); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 1.0.6)
Feb 5 12:46:17 Guardian Coal (RX:6,6,18,18; TX:20,20,80,80)
Feb 5 12:46:17 Guardian bce1: <QLogic NetXtreme II BCM5709 1000Base-T (C0)> mem 0xd8000000-0xd9ffffff irq 48 at device 0.1 on pci1
Feb 5 12:46:17 Guardian miibus1: <MII bus> on bce1
Feb 5 12:46:17 Guardian brgphy1: <BCM5709 10/100/1000baseT PHY> PHY 1 on miibus1
Feb 5 12:46:17 Guardian brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
Feb 5 12:46:17 Guardian bce1: Using defaults for TSO: 65518/35/2048
Feb 5 12:46:17 Guardian bce1: Ethernet address: 00:24:e8:5f:05:9b
Feb 5 12:46:17 Guardian bce1:
Feb 5 12:46:17 Guardian kernel: bce1: link state changed to DOWN
Feb 5 12:46:17 Guardian kernel: bce1: link state changed to DOWN
Feb 5 12:46:17 Guardian ASIC (0x57092003); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (4.6.4); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 1.0.6)
Feb 5 12:46:17 Guardian Coal (RX:6,6,18,18; TX:20,20,80,80)
Feb 5 12:46:17 Guardian pcib2: <ACPI PCI-PCI bridge> at device 3.0 on pci0
Feb 5 12:46:17 Guardian pci2: <ACPI PCI bus> on pcib2
Feb 5 12:46:17 Guardian bce2: <QLogic NetXtreme II BCM5709 1000Base-T (C0)> mem 0xda000000-0xdbffffff irq 32 at device 0.0 on pci2
Feb 5 12:46:17 Guardian miibus2: <MII bus> on bce2
Feb 5 12:46:17 Guardian brgphy2: <BCM5709 10/100/1000baseT PHY> PHY 1 on miibus2
Feb 5 12:46:17 Guardian brgphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
Feb 5 12:46:17 Guardian bce2: Using defaults for TSO: 65518/35/2048
Feb 5 12:46:17 Guardian bce2: Ethernet address: 00:24:e8:5f:05:9d
Feb 5 12:46:17 Guardian bce2:
Feb 5 12:46:17 Guardian kernel: bce2: link state changed to DOWN
Feb 5 12:46:17 Guardian kernel: bce2: link state changed to DOWN
Feb 5 12:46:17 Guardian ASIC (0x57092003); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (4.6.4); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 1.0.6)
Feb 5 12:46:17 Guardian Coal (RX:6,6,18,18; TX:20,20,80,80)
Feb 5 12:46:17 Guardian bce3: <QLogic NetXtreme II BCM5709 1000Base-T (C0)> mem 0xdc000000-0xddffffff irq 42 at device 0.1 on pci2
Feb 5 12:46:17 Guardian miibus3: <MII bus> on bce3
Feb 5 12:46:17 Guardian brgphy3: <BCM5709 10/100/1000baseT PHY> PHY 1 on miibus3
Feb 5 12:46:17 Guardian brgphy3: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
Feb 5 12:46:17 Guardian bce3: Using defaults for TSO: 65518/35/2048
Feb 5 12:46:17 Guardian bce3: Ethernet address: 00:24:e8:5f:05:9f
Feb 5 12:46:17 Guardian bce3:
Feb 5 12:46:17 Guardian kernel: bce3: link state changed to DOWN
Feb 5 12:46:17 Guardian kernel: bce3: link state changed to DOWN
Feb 5 12:46:17 Guardian ASIC (0x57092003); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (4.6.4); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 1.0.6)
Feb 5 12:46:17 Guardian Coal (RX:6,6,18,18; TX:20,20,80,80)
Feb 5 12:46:17 Guardian pcib3: <ACPI PCI-PCI bridge> at device 4.0 on pci0
Feb 5 12:46:17 Guardian pci3: <ACPI PCI bus> on pcib3
Feb 5 12:46:17 Guardian pcib4: <ACPI PCI-PCI bridge> at device 5.0 on pci0
Feb 5 12:46:17 Guardian pci4: <ACPI PCI bus> on pcib4
Feb 5 12:46:17 Guardian mps0: <Avago Technologies (LSI) SAS2008> port 0xfc00-0xfcff mem 0xdf1bc000-0xdf1bffff,0xdf1c0000-0xdf1fffff irq 34 at device 0.0 on pci4
Feb 5 12:46:17 Guardian mps0: Firmware: 16.00.00.00, Driver: 21.02.00.00-fbsd
Feb 5 12:46:17 Guardian mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
Feb 5 12:46:17 Guardian pcib5: <ACPI PCI-PCI bridge> at device 6.0 on pci0
Feb 5 12:46:17 Guardian pci5: <ACPI PCI bus> on pcib5
Feb 5 12:46:17 Guardian pcib6: <ACPI PCI-PCI bridge> at device 7.0 on pci0
Feb 5 12:46:17 Guardian pci6: <ACPI PCI bus> on pcib6
Feb 5 12:46:17 Guardian pcib7: <ACPI PCI-PCI bridge> at device 9.0 on pci0
Feb 5 12:46:17 Guardian pci7: <ACPI PCI bus> on pcib7
Feb 5 12:46:17 Guardian pci0: <base peripheral, interrupt controller> at device 20.0 (no driver attached)
Feb 5 12:46:17 Guardian pci0: <base peripheral, interrupt controller> at device 20.1 (no driver attached)
Feb 5 12:46:17 Guardian pci0: <base peripheral, interrupt controller> at device 20.2 (no driver attached)
Feb 5 12:46:17 Guardian uhci0: <Intel 82801I (ICH9) USB controller> port 0xec40-0xec5f irq 17 at device 26.0 on pci0
Feb 5 12:46:17 Guardian usbus0 on uhci0
Feb 5 12:46:17 Guardian usbus0: 12Mbps Full Speed USB v1.0
Feb 5 12:46:17 Guardian uhci1: <Intel 82801I (ICH9) USB controller> port 0xec60-0xec7f irq 18 at device 26.1 on pci0
Feb 5 12:46:17 Guardian usbus1 on uhci1
Feb 5 12:46:17 Guardian usbus1: 12Mbps Full Speed USB v1.0
Feb 5 12:46:17 Guardian ehci0: <Intel 82801I (ICH9) USB 2.0 controller> mem 0xdf0fe000-0xdf0fe3ff irq 19 at device 26.7 on pci0
Feb 5 12:46:17 Guardian usbus2: EHCI version 1.0
Feb 5 12:46:17 Guardian usbus2 on ehci0
Feb 5 12:46:17 Guardian usbus2: 480Mbps High Speed USB v2.0
Feb 5 12:46:17 Guardian uhci2: <Intel 82801I (ICH9) USB controller> port 0xec80-0xec9f irq 21 at device 29.0 on pci0
Feb 5 12:46:17 Guardian usbus3 on uhci2
Feb 5 12:46:17 Guardian usbus3: 12Mbps Full Speed USB v1.0
Feb 5 12:46:17 Guardian uhci3: <Intel 82801I (ICH9) USB controller> port 0xeca0-0xecbf irq 20 at device 29.1 on pci0
Feb 5 12:46:17 Guardian usbus4 on uhci3
Feb 5 12:46:17 Guardian usbus4: 12Mbps Full Speed USB v1.0
Feb 5 12:46:17 Guardian ehci1: <Intel 82801I (ICH9) USB 2.0 controller> mem 0xdf0ff000-0xdf0ff3ff irq 21 at device 29.7 on pci0
Feb 5 12:46:17 Guardian usbus5: EHCI version 1.0
Feb 5 12:46:17 Guardian usbus5 on ehci1
Feb 5 12:46:17 Guardian usbus5: 480Mbps High Speed USB v2.0
Feb 5 12:46:17 Guardian pcib8: <ACPI PCI-PCI bridge> at device 30.0 on pci0
Feb 5 12:46:17 Guardian pci8: <ACPI PCI bus> on pcib8
Feb 5 12:46:17 Guardian vgapci0: <VGA-compatible display> mem 0xd5800000-0xd5ffffff,0xde7fc000-0xde7fffff,0xde800000-0xdeffffff irq 19 at device 3.0 on pci8
Feb 5 12:46:17 Guardian vgapci0: Boot video device
Feb 5 12:46:17 Guardian isab0: <PCI-ISA bridge> at device 31.0 on pci0
Feb 5 12:46:17 Guardian isa0: <ISA bus> on isab0
Feb 5 12:46:17 Guardian atapci0: <Intel ICH9 SATA300 controller> port 0xec10-0xec17,0xec08-0xec0b,0xec18-0xec1f,0xec0c-0xec0f,0xec20-0xec2f,0xec30-0xec3f irq 23 at device 31.2 on pci0
Feb 5 12:46:17 Guardian ata2: <ATA channel> at channel 0 on atapci0
Feb 5 12:46:17 Guardian ata3: <ATA channel> at channel 1 on atapci0
Feb 5 12:46:17 Guardian uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
Feb 5 12:46:17 Guardian uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
Feb 5 12:46:17 Guardian ichwd0: <Intel ICH9 watchdog timer> on isa0
Feb 5 12:46:17 Guardian orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff,0xc9000-0xcbfff,0xd2000-0xd2fff,0xec000-0xeffff on isa0
Feb 5 12:46:17 Guardian sc0: <System console> at flags 0x100 on isa0
Feb 5 12:46:17 Guardian sc0: VGA <16 virtual consoles, flags=0x300>
Feb 5 12:46:17 Guardian vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Feb 5 12:46:17 Guardian atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
Feb 5 12:46:17 Guardian atkbd0: <AT Keyboard> irq 1 on atkbdc0
Feb 5 12:46:17 Guardian kbd0 at atkbd0
Feb 5 12:46:17 Guardian atkbd0: [GIANT-LOCKED]
Feb 5 12:46:17 Guardian coretemp0: <CPU On-Die Thermal Sensors> on cpu0
Feb 5 12:46:17 Guardian est0: <Enhanced SpeedStep Frequency Control> on cpu0
Feb 5 12:46:17 Guardian coretemp1: <CPU On-Die Thermal Sensors> on cpu1
Feb 5 12:46:17 Guardian est1: <Enhanced SpeedStep Frequency Control> on cpu1
Feb 5 12:46:17 Guardian coretemp2: <CPU On-Die Thermal Sensors> on cpu2
Feb 5 12:46:17 Guardian est2: <Enhanced SpeedStep Frequency Control> on cpu2
Feb 5 12:46:17 Guardian coretemp3: <CPU On-Die Thermal Sensors> on cpu3
Feb 5 12:46:17 Guardian est3: <Enhanced SpeedStep Frequency Control> on cpu3
Feb 5 12:46:17 Guardian coretemp4: <CPU On-Die Thermal Sensors> on cpu4
Feb 5 12:46:17 Guardian est4: <Enhanced SpeedStep Frequency Control> on cpu4
Feb 5 12:46:17 Guardian coretemp5: <CPU On-Die Thermal Sensors> on cpu5
Feb 5 12:46:17 Guardian est5: <Enhanced SpeedStep Frequency Control> on cpu5
Feb 5 12:46:17 Guardian coretemp6: <CPU On-Die Thermal Sensors> on cpu6
Feb 5 12:46:17 Guardian est6: <Enhanced SpeedStep Frequency Control> on cpu6
Feb 5 12:46:17 Guardian coretemp7: <CPU On-Die Thermal Sensors> on cpu7
Feb 5 12:46:17 Guardian est7: <Enhanced SpeedStep Frequency Control> on cpu7
Feb 5 12:46:17 Guardian ZFS filesystem version: 5
Feb 5 12:46:17 Guardian ZFS storage pool version: features support (5000)
Feb 5 12:46:17 Guardian Timecounters tick every 1.000 msec
Feb 5 12:46:17 Guardian freenas_sysctl: adding account.
Feb 5 12:46:17 Guardian freenas_sysctl: adding directoryservice.
Feb 5 12:46:17 Guardian freenas_sysctl: adding middlewared.
Feb 5 12:46:17 Guardian freenas_sysctl: adding network.
Feb 5 12:46:17 Guardian freenas_sysctl: adding services.
Feb 5 12:46:17 Guardian ipfw2 (+ipv6) initialized, divert enabled, nat enabled, default to accept, logging disabled
Feb 5 12:46:17 Guardian nvme cam probe device init
Feb 5 12:46:17 Guardian ugen2.1: <Intel EHCI root HUB> at usbus2
Feb 5 12:46:17 Guardian ugen1.1: <Intel UHCI root HUB> at usbus1
Feb 5 12:46:17 Guardian ugen5.1: <Intel EHCI root HUB> at usbus5
Feb 5 12:46:17 Guardian ugen3.1: <Intel UHCI root HUB> at usbus3
Feb 5 12:46:17 Guardian ugen4.1: <Intel UHCI root HUB> at usbus4
Feb 5 12:46:17 Guardian ugen0.1: <Intel UHCI root HUB> at usbus0
Feb 5 12:46:17 Guardian uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2
Feb 5 12:46:17 Guardian uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus3
Feb 5 12:46:17 Guardian uhub3: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
Feb 5 12:46:17 Guardian uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
Feb 5 12:46:17 Guardian uhub5: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus5
Feb 5 12:46:17 Guardian uhub1: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
Feb 5 12:46:17 Guardian mps0: SAS Address for SATA device = 371a44456882895d
Feb 5 12:46:17 Guardian mps0: SAS Address for SATA device = 371a444568819b51
Feb 5 12:46:17 Guardian mps0: SAS Address for SATA device = 371a444567888f4d
Feb 5 12:46:17 Guardian mps0: SAS Address for SATA device = 371a44457e8d8446
Feb 5 12:46:17 Guardian mps0: SAS Address for SATA device = 371a444568827c44
Feb 5 12:46:17 Guardian mps0: SAS Address for SATA device = 371a444567888935
Feb 5 12:46:17 Guardian mps0: SAS Address from SATA device = 371a44456882895d
Feb 5 12:46:17 Guardian mps0: SAS Address from SATA device = 371a444568819b51
Feb 5 12:46:17 Guardian mps0: SAS Address from SATA device = 371a444567888f4d
Feb 5 12:46:17 Guardian mps0: SAS Address from SATA device = 371a44457e8d8446
Feb 5 12:46:17 Guardian mps0: SAS Address from SATA device = 371a444568827c44
Feb 5 12:46:17 Guardian mps0: SAS Address from SATA device = 371a444567888935
Feb 5 12:46:17 Guardian uhub3: 2 ports with 2 removable, self powered
Feb 5 12:46:17 Guardian uhub4: 2 ports with 2 removable, self powered
Feb 5 12:46:17 Guardian uhub2: 2 ports with 2 removable, self powered
Feb 5 12:46:17 Guardian uhub1: 2 ports with 2 removable, self powered
Feb 5 12:46:17 Guardian uhub0: 4 ports with 4 removable, self powered
Feb 5 12:46:17 Guardian uhub5: 4 ports with 4 removable, self powered
Feb 5 12:46:17 Guardian ugen2.2: <vendor 0x0424 product 0x2514> at usbus2
Feb 5 12:46:17 Guardian uhub6 on uhub0
Feb 5 12:46:17 Guardian uhub6: <vendor 0x0424 product 0x2514, class 9/0, rev 2.00/0.00, addr 2> on usbus2
Feb 5 12:46:17 Guardian uhub6: MTT enabled
Feb 5 12:46:17 Guardian uhub6: 3 ports with 3 removable, self powered
Feb 5 12:46:17 Guardian ugen3.2: <Avocent USB Composite Device-0> at usbus3
Feb 5 12:46:17 Guardian ukbd0 on uhub2
Feb 5 12:46:17 Guardian ukbd0: <Keyboard> on usbus3
Feb 5 12:46:17 Guardian kbd2 at ukbd0
Feb 5 12:46:17 Guardian ugen2.3: <SanDisk Cruzer Fit> at usbus2
Feb 5 12:46:17 Guardian umass0 on uhub6
Feb 5 12:46:17 Guardian umass0: <SanDisk Cruzer Fit, class 0/0, rev 2.00/1.27, addr 3> on usbus2
Feb 5 12:46:17 Guardian umass0: SCSI over Bulk-Only; quirks = 0x8100
Feb 5 12:46:17 Guardian umass0:4:0: Attached to scbus4
Feb 5 12:46:17 Guardian random: unblocking device.
Feb 5 12:46:17 Guardian da6 at umass-sim0 bus 0 scbus4 target 0 lun 0
Feb 5 12:46:17 Guardian da1 at mps0 bus 0 scbus0 target 1 lun 0
Feb 5 12:46:17 Guardian da6: <SanDisk Cruzer Fit 1.27> Removable Direct Access SPC-4 SCSI device
Feb 5 12:46:17 Guardian da6: Serial Number 4C531001640325119574
Feb 5 12:46:17 Guardian da6: 40.000MB/s transfersda1: <ATA ST8000AS0002-1NA AR13> Fixed Direct Access SPC-4 SCSI device
Feb 5 12:46:17 Guardian da1: Serial Number Z8404EXN
Feb 5 12:46:17 Guardian da1: 600.000MB/s transfers
Feb 5 12:46:17 Guardian da1: Command Queueing enabled
Feb 5 12:46:17 Guardian da1: 7630885MB (15628053168 512 byte sectors)
Feb 5 12:46:17 Guardian da1: quirks=0x80<SMR_DM>
Feb 5 12:46:17 Guardian da4 at mps0 bus 0 scbus0 target 5 lun 0
Feb 5 12:46:17 Guardian da6: 29812MB (61056064 512 byte sectors)
Feb 5 12:46:17 Guardian da6: quirks=0x2<NO_6_BYTE>
Feb 5 12:46:17 Guardian da0 at mps0 bus 0 scbus0 target 0 lun 0
Feb 5 12:46:17 Guardian da4: <ATA ST8000AS0002-1NA AR13> Fixed Direct Access SPC-4 SCSI device
Feb 5 12:46:17 Guardian da4: Serial Number Z8403LF2
Feb 5 12:46:17 Guardian da4: 600.000MB/s transfers
Feb 5 12:46:17 Guardian da4: Command Queueing enabled
Feb 5 12:46:17 Guardian da4: 7630885MB (15628053168 512 byte sectors)
Feb 5 12:46:17 Guardian da4: quirks=0x80<SMR_DM>
Feb 5 12:46:17 Guardian da5 at mps0 bus 0 scbus0 target 7 lun 0
Feb 5 12:46:17 Guardian da0: <ATA ST8000AS0002-1NA AR13> Fixed Direct Access SPC-4 SCSI device
Feb 5 12:46:17 Guardian da0: Serial Number Z8404FFZ
Feb 5 12:46:17 Guardian da0: 600.000MB/s transfers
Feb 5 12:46:17 Guardian da0: Command Queueing enabled
Feb 5 12:46:17 Guardian da0: 7630885MB (15628053168 512 byte sectors)
Feb 5 12:46:17 Guardian da5: <ATA ST8000AS0002-1NA RT17> Fixed Direct Access SPC-4 SCSI device
Feb 5 12:46:17 Guardian da5: Serial Number Z840JQAC
Feb 5 12:46:17 Guardian da5: 600.000MB/s transfers
Feb 5 12:46:17 Guardian da5: Command Queueing enabled
Feb 5 12:46:17 Guardian da5: 7630885MB (15628053168 512 byte sectors)
Feb 5 12:46:17 Guardian da5: quirks=0x80<SMR_DM>
Feb 5 12:46:17 Guardian da0: quirks=0x80<SMR_DM>
Feb 5 12:46:17 Guardian da2 at mps0 bus 0 scbus0 target 2 lun 0
Feb 5 12:46:17 Guardian da2: <ATA ST8000AS0002-1NA AR13> Fixed Direct Access SPC-4 SCSI device
Feb 5 12:46:17 Guardian da2: Serial Number Z8403LLJ
Feb 5 12:46:17 Guardian da2: 600.000MB/s transfers
Feb 5 12:46:17 Guardian da2: Command Queueing enabled
Feb 5 12:46:17 Guardian da2: 7630885MB (15628053168 512 byte sectors)
Feb 5 12:46:17 Guardian da2: quirks=0x80<SMR_DM>
Feb 5 12:46:17 Guardian da3 at mps0 bus 0 scbus0 target 4 lun 0
Feb 5 12:46:17 Guardian da3: <ATA ST8000AS0002-1NA AR13> Fixed Direct Access SPC-4 SCSI device
Feb 5 12:46:17 Guardian da3: Serial Number Z8404F9A
Feb 5 12:46:17 Guardian da3: 600.000MB/s transfers
Feb 5 12:46:17 Guardian da3: Command Queueing enabled
Feb 5 12:46:17 Guardian da3: 7630885MB (15628053168 512 byte sectors)
Feb 5 12:46:17 Guardian da3: quirks=0x80<SMR_DM>
Feb 5 12:46:17 Guardian cd0 at ata2 bus 0 scbus1 target 0 lun 0
Feb 5 12:46:17 Guardian cd0: <TEAC DVD-ROM DV28SV D.0J> Removable CD-ROM SCSI device
Feb 5 12:46:17 Guardian cd0: Serial Number 09051300212328
Feb 5 12:46:17 Guardian cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
Feb 5 12:46:17 Guardian cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
Feb 5 12:46:17 Guardian Trying to mount root from zfs:freenas-boot/ROOT/11.1-U1 []...
Feb 5 12:46:17 Guardian ipmi0: <IPMI System Interface> on isa0
Feb 5 12:46:17 Guardian ipmi0: KCS mode found at io 0xca8 alignment 0x4 on isa
Feb 5 12:46:17 Guardian ipmi0: IPMI device rev. 0, firmware rev. 1.03, version 2.0
Feb 5 12:46:17 Guardian ipmi0: Number of channels 5
Feb 5 12:46:17 Guardian ipmi0: Attached watchdog
Feb 5 12:46:17 Guardian vmx_init: VMX operation disabled by BIOS
Feb 5 12:46:17 Guardian module_register_init: MOD_LOAD (vmm, 0xffffffff82824400, 0) error 6
Feb 5 12:46:17 Guardian GEOM_RAID5: Module loaded, version 1.3.20140711.62 (rev f91e28e40bf7)
Feb 5 12:46:17 Guardian GEOM_MIRROR: Device mirror/swap0 launched (2/2).
Feb 5 12:46:17 Guardian GEOM_MIRROR: Device mirror/swap1 launched (2/2).
Feb 5 12:46:17 Guardian GEOM_MIRROR: Device mirror/swap2 launched (2/2).
Feb 5 12:46:17 Guardian GEOM_ELI: Device mirror/swap0.eli created.
Feb 5 12:46:17 Guardian GEOM_ELI: Encryption: AES-XTS 128
Feb 5 12:46:17 Guardian GEOM_ELI: Crypto: software
Feb 5 12:46:17 Guardian GEOM_ELI: Device mirror/swap1.eli created.
Feb 5 12:46:17 Guardian GEOM_ELI: Encryption: AES-XTS 128
Feb 5 12:46:17 Guardian GEOM_ELI: Crypto: software
Feb 5 12:46:17 Guardian GEOM_ELI: Device mirror/swap2.eli created.
Feb 5 12:46:17 Guardian GEOM_ELI: Encryption: AES-XTS 128
Feb 5 12:46:17 Guardian GEOM_ELI: Crypto: software
Feb 5 12:46:17 Guardian hwpmc: SOFT/16/64/0x67<INT,USR,SYS,REA,WRI> TSC/1/64/0x20<REA> IAP/4/48/0x3ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA,PRC> IAF/3/48/0x67<INT,USR,SYS,REA,WRI> UCP/8/48/0x3f8<EDG,THR,REA,WRI,INV,QUA,PRC> UCF/1/48/0x60<REA,WRI>
Feb 5 12:46:17 Guardian kernel: bce0: link state changed to UP
Feb 5 12:46:17 Guardian kernel: bce0: link state changed to UP
Feb 5 12:46:17 Guardian bce0: Gigabit link up!
Feb 5 12:46:17 Guardian bce0: Gigabit link up!
Feb 5 12:46:17 Guardian bce1:
Feb 5 12:46:17 Guardian kernel: bce1: link state changed to UP
Feb 5 12:46:17 Guardian kernel: bce1: link state changed to UP
Feb 5 12:46:17 Guardian Gigabit link up!
Feb 5 12:46:17 Guardian bce1: Gigabit link up!
Feb 5 12:46:17 Guardian bce2:
Feb 5 12:46:17 Guardian kernel: bce2: link state changed to UP
Feb 5 12:46:17 Guardian kernel: bce2: link state changed to UP
Feb 5 12:46:17 Guardian Gigabit link up!
Feb 5 12:46:17 Guardian bce2: Gigabit link up!
Feb 5 12:46:17 Guardian bce3:
Feb 5 12:46:17 Guardian kernel: bce3: link state changed to UP
Feb 5 12:46:17 Guardian kernel: bce3: link state changed to UP
Feb 5 12:46:17 Guardian Gigabit link up!
Feb 5 12:46:17 Guardian bce3: Gigabit link up!
Feb 5 12:46:17 Guardian ums0 on uhub2
Feb 5 12:46:17 Guardian ums0: <Mouse> on usbus3
Feb 5 12:46:17 Guardian ums0: 3 buttons and [Z] coordinates ID=0
Feb 5 12:46:20 Guardian ntpd2426: ntpd 4.2.8p10-a (1): Starting
Feb 5 12:46:26 Guardian root: /etc/rc: WARNING: failed precmd routine for vmware_guestd
Feb 5 12:46:31 Guardian root: /etc/rc: WARNING: failed precmd routine for minio
Feb 5 12:46:36 Guardian smartd3128: Unable to register device /dev/da6 (no Directive -d removable). Exiting.
Feb 5 12:46:42 Guardian daemon3436: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Feb 5 12:46:42 Guardian daemon3436: bootstrap = true: do not enable unless necessary
Feb 5 12:46:42 Guardian daemon3436: > Starting Consul agent...
Feb 5 12:46:42 Guardian daemon[3436]: > Consul agent running!
Feb 5 12:46:42 Guardian daemon3436: Version: 'v1.0.0'
Feb 5 12:46:42 Guardian daemon3436: Node ID: '4cb16b85-3afa-e40b-9e66-08a409f67822'
Feb 5 12:46:42 Guardian daemon3436: Node name: 'Guardian.local'
Feb 5 12:46:42 Guardian daemon3436: Datacenter: 'dc1' (Segment: '<all>')
Feb 5 12:46:42 Guardian daemon3436: Server: true (Bootstrap: true)
Feb 5 12:46:42 Guardian daemon3436: Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, DNS: 8600)
Feb 5 12:46:42 Guardian daemon3436: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
Feb 5 12:46:42 Guardian daemon3436: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
Feb 5 12:46:42 Guardian daemon3436:
Feb 5 12:46:42 Guardian daemon3436: ==> Log data will now stream in as it occurs:
Feb 5 12:46:42 Guardian daemon3436:
Feb 5 12:46:49 Guardian daemon3436: 2018/02/05 12:46:49 [ERR] agent: failed to sync remote state: No cluster leader
Feb 5 12:46:52 Guardian daemon3436: 2018/02/05 12:46:52 [WARN] raft: Heartbeat timeout from "" reached, starting election
Feb 5 20:48:21 Guardian /middlewared215: dnssd_clientstub DNSServiceProcessResult called with invalid DNSServiceRef 0x81f61a540 FFFFFFFF DDDDDDDD
[WARN] agent: Check 'freenas_health' is now warning
(Snip)
Feb 5 13:10:04 Guardian sshd5897: _secure_path: /nonexistent/.login_conf is not owned by uid 1003
Feb 5 13:10:04 Guardian sshd5897: _secure_path: /nonexistent/.login_conf is not owned by uid 1003
Feb 5 13:10:04 Guardian sshd5898: _secure_path: /nonexistent/.login_conf is not owned by uid 1003
(Snip)

#6 Updated by Graham Bird over 2 years ago

Here are the system reboot messages, not that I see anything significant:

System booted at Sun Feb 4 00:03:30 2018
System booted at Sun Feb 4 06:47:50 2018
System booted at Mon Feb 5 07:07:11 2018
System booted at Mon Feb 5 18:26:52 2018
System booted at Mon Feb 5 18:26:52 2018
System booted at Tue Feb 6 06:12:26 2018
System booted at Tue Feb 6 12:39:51 2018
System booted at Wed Feb 7 01:11:08 2018
System booted at Wed Feb 7 20:20:57 2018
System booted at Thu Feb 8 02:49:19 2018
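
To make the spacing of these reboots easier to judge, the gaps between the boot times above can be computed. The following is a minimal Python sketch, assuming the lines are pasted exactly as shown; the names `parse_boot_times` and `uptimes_in_hours` are illustrative and not part of FreeNAS.

```python
from datetime import datetime

# Boot messages copied from the comment above (including the repeated line).
BOOT_LINES = """\
System booted at Sun Feb 4 00:03:30 2018
System booted at Sun Feb 4 06:47:50 2018
System booted at Mon Feb 5 07:07:11 2018
System booted at Mon Feb 5 18:26:52 2018
System booted at Mon Feb 5 18:26:52 2018
System booted at Tue Feb 6 06:12:26 2018
System booted at Tue Feb 6 12:39:51 2018
System booted at Wed Feb 7 01:11:08 2018
System booted at Wed Feb 7 20:20:57 2018
System booted at Thu Feb 8 02:49:19 2018
""".splitlines()

PREFIX = "System booted at "


def parse_boot_times(lines):
    """Return the sorted, de-duplicated boot times found in the lines."""
    times = set()
    for line in lines:
        line = line.strip()
        if line.startswith(PREFIX):
            stamp = line[len(PREFIX):]
            times.add(datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"))
    return sorted(times)


def uptimes_in_hours(times):
    """Return the gap between consecutive boots, in hours (one decimal)."""
    return [
        round((later - earlier).total_seconds() / 3600, 1)
        for earlier, later in zip(times, times[1:])
    ]


if __name__ == "__main__":
    boots = parse_boot_times(BOOT_LINES)
    for start, hours in zip(boots, uptimes_in_hours(boots)):
        print(f"booted {start}  ran for about {hours} h before the next boot")
```

For the ten lines above (one of which is a duplicate), this gives eight gaps ranging from about 6.5 to about 24.3 hours, which fits the observation that the reboots do not follow an obvious schedule.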

#7 Updated by Alexander Motin over 2 years ago

Console logs do not really help. Since syslogd is no longer functioning during a panic, it cannot log the interesting parts.

#8 Updated by Graham Bird over 2 years ago

So where do we go from here? Is there more I can research / do?

#9 Updated by Alexander Motin over 2 years ago

There are two ways forward: either try to catch what is happening on a physical console (or a serial one, if you set it up) when the server crashes, or try to figure out why the crash does not leave any dumps. The second would be more reasonable, but it requires finding developer time.

Could you also check your motherboard's event log for any events correlating with the reboots? Just to be sure that these are indeed software crashes.

#10 Updated by Graham Bird over 2 years ago

  • File log.csv added

Attached is the system log from the iDRAC. I note that the error 'System Board OS Watchdog: Watchdog sensor for System Board, hard reset by SMS/OS timer was asserted' is consistent with the reboots we have observed.

What little I can find on this seems to indicate that it is triggered when the system board thinks that the OS has locked up - though this answer was referring to a Windows Update.

#11 Updated by Alexander Motin over 2 years ago

Watchdog assertions explain why there is no other debug info. You should take a closer look at the system to see what is going on with it. The watchdog should fire only if the system is not able to run the user-space watchdogd for more than two minutes, which is not a normal situation for a NAS.
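
The mechanism described here can be pictured with a toy model. This is only a sketch in Python of the general idea of a watchdog timer, not the actual watchdogd or hardware implementation; the class name `WatchdogModel` is made up, and the 120-second timeout is taken from the "two minutes" mentioned in the comment rather than from any verified configuration.

```python
import time


class WatchdogModel:
    """Toy watchdog: it counts as expired unless patted within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout      # seconds allowed between two pats
        self.clock = clock          # injectable clock, so the model can be tested
        self.last_pat = clock()

    def pat(self):
        """Called by the user-space daemon whenever it gets to run."""
        self.last_pat = self.clock()

    def expired(self):
        """True once the daemon has been silent for longer than the timeout."""
        return self.clock() - self.last_pat > self.timeout


if __name__ == "__main__":
    now = [0.0]                                   # fake clock, in seconds
    wd = WatchdogModel(timeout=120, clock=lambda: now[0])

    now[0] = 100.0
    wd.pat()                                      # daemon ran in time
    now[0] = 219.0
    print("expired after 119 s of silence:", wd.expired())
    now[0] = 221.0
    print("expired after 121 s of silence:", wd.expired())
```

In this model, a reset corresponds to `expired()` becoming true: nothing crashed in the usual sense, the daemon simply did not get to run for too long, which would be consistent with no panic output being logged.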

#12 Updated by Graham Bird over 2 years ago

Sort of understood, though we are heading past the edge of my technical expertise :-(

The ONLY change that was made was the upgrade to 11.1-U1. Would you have any suggestions on where to start or how to go about investigating further?

Cheers

#13 Updated by Alexander Motin over 2 years ago

Take a look at the CPU usage in the Reporting section, the CPU usage and active processes in `top -SHIz`, the workload of the server at the time it happens, the general responsiveness of the server when it happens, etc.
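
Because a watchdog reset is abrupt, whatever was on screen in `top` at that moment is lost. One possible complement, sketched below in Python, is to append load-average samples to a file and force each one to disk, so the last lines written before a reset survive. This is an assumption-laden illustration, not a FreeNAS feature: the function names and the log path are hypothetical, and the load average is only a rough stand-in for the per-thread view `top -SHIz` gives.

```python
import os
import time


def format_sample(timestamp, loads):
    """Format one log line from a timestamp string and a (1, 5, 15 min) load tuple."""
    one, five, fifteen = loads
    return f"{timestamp} load={one:.2f} {five:.2f} {fifteen:.2f}"


def log_load_samples(path, samples, interval_s=0.0):
    """Append `samples` load-average lines to `path`, syncing each one to disk.

    Returns the number of lines written.
    """
    written = 0
    with open(path, "a") as log:
        for _ in range(samples):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(format_sample(stamp, os.getloadavg()) + "\n")
            log.flush()
            os.fsync(log.fileno())   # push the line to stable storage right away
            written += 1
            time.sleep(interval_s)
    return written


if __name__ == "__main__":
    # Hypothetical file name and sampling rate: three samples, ten seconds apart.
    log_load_samples("load_samples.log", samples=3, interval_s=10)
```

After an unexpected reboot, the tail of the file shows how loaded the machine was shortly before it went down. The file should live on storage that survives the reboot.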

#14 Updated by Graham Bird over 2 years ago

We have now rolled back to 9.10.2-U6 and the system is running normally as before.

So it seems to be related to the upgraded version in some way.

Cheers

#15 Updated by Alexander Motin over 2 years ago

  • Priority changed from Expected to Important
  • Severity set to Low Medium
  • Reason for Blocked changed from Need additional information to Waiting for feedback

It could be related in many different ways, which are difficult even to guess at without knowing more. As a workaround, you may try to disable the watchdog timers by adding these loader tunables:

hint.ipmi.0.disabled=1
hint.ichwd.0.disabled=1

and then see what happens with your system: it may either really hang instead of rebooting, or experience some transient delays that are long enough for the watchdog to activate. Maybe that will give us some input.
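
Once the two tunables have been added, it can be worth confirming that both actually ended up in the loader configuration. The Python sketch below checks a piece of loader.conf-style text for them. It assumes simple `key=value` lines with optional double quotes and `#` comments; the helper names are hypothetical, and where the text comes from on a given system is left to the reader.

```python
# The two hints named in the comment above.
WATCHDOG_HINTS = ("hint.ipmi.0.disabled", "hint.ichwd.0.disabled")


def parse_loader_conf(text):
    """Parse key=value lines, ignoring blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_watchdog_hints(text):
    """Return the watchdog hints that are not set to 1 in the given text."""
    settings = parse_loader_conf(text)
    return [hint for hint in WATCHDOG_HINTS if settings.get(hint) != "1"]


if __name__ == "__main__":
    sample = 'hint.ipmi.0.disabled=1\nhint.ichwd.0.disabled="1"\n'
    missing = missing_watchdog_hints(sample)
    print("all watchdog hints set" if not missing else f"missing: {missing}")
```

An empty result means both watchdog drivers are marked as disabled in the text that was checked. It does not prove that the running kernel skipped them, so the boot messages are still worth a look afterwards.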

#16 Updated by Dru Lavigne over 2 years ago

  • File deleted (debug-Guardian-20180207101527.tar)

#17 Updated by Dru Lavigne over 2 years ago

  • File deleted (log.csv)

#18 Updated by Dru Lavigne over 2 years ago

  • Status changed from Blocked to Closed
  • Target version changed from 11.2-RC2 to N/A
  • Private changed from Yes to No
