Bug #27585

Random reboots - sometimes multiple times a day

Added by am hus over 2 years ago. Updated over 2 years ago.

Status: Closed: Cannot reproduce
Priority: No priority
Assignee: Release Council
Category: OS
Target version:
Seen in:
Severity: New
Reason for Closing:
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:

Dell 2950 gen 3 (dual Xeon quad core 2.66GHz, 32GB ECC RAM)
Perc 6i raid card replaced with H310 (http://i.dell.com/sites/doccontent/...heets/Documents/dell-perc-h310-spec-sheet.pdf) flashed to IT mode
5 x Seagate IronWolf 6TB NAS hard drives - raidz2
Boot USB stick: SanDisk Extreme 32 GB USB Flash

lspci output:
00:00.0 Host bridge: Intel Corporation 5000X Chipset Memory Controller Hub (rev 12)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 2 (rev 12)
00:03.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 3 (rev 12)
00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 4-5 (rev 12)
00:05.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 5 (rev 12)
00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 6-7 (rev 12)
00:07.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 7 (rev 12)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 12)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 12)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 12)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 12)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 12)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 12)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 12)
00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
00:1d.0 USB controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.3 USB controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #4 (rev 09)
00:1d.7 USB controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
02:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev c3)
03:00.0 Ethernet controller: Broadcom Limited NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
04:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
04:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
05:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
05:01.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E2 (rev 01)
06:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev c3)
07:00.0 Ethernet controller: Broadcom Limited NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
0c:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
0e:0d.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] ES1000 (rev 02)

ChangeLog Required: No

Description

My system is randomly rebooting and I don't have enough knowledge to figure out what is causing this.

I've had a look through /var/log/messages and tried a few simple things so far, but I need more help. I posted on the FreeNAS forum and was asked to file a bug report here.
I can't remember when the reboots started as it never actually affected the usage of the system. It might have been there since first install for all I know.

The output of "last reboot" shows:

boot time                                  Mon Jan  1 12:00
boot time                                  Sun Dec 31 19:41
shutdown time                              Sun Dec 31 19:31
boot time                                  Sat Dec 30 21:20
boot time                                  Sat Dec 30 18:33
boot time                                  Sat Dec 30 17:52
boot time                                  Thu Dec 28 20:18
boot time                                  Tue Dec 26 13:02
boot time                                  Tue Dec 26 02:52
boot time                                  Mon Dec 25 17:24
boot time                                  Mon Dec 25 13:09
boot time                                  Mon Dec 25 01:26
boot time                                  Sun Dec 24 15:52
boot time                                  Sun Dec 24 15:09
boot time                                  Sun Dec 24 05:57
boot time                                  Sat Dec 23 14:54
boot time                                  Sat Dec 23 07:25
boot time                                  Fri Dec 22 09:10
boot time                                  Tue Dec 19 14:00
boot time                                  Sat Dec 16 12:01
boot time                                  Fri Dec 15 10:38
boot time                                  Fri Dec 15 10:33
shutdown time                              Fri Dec 15 10:30
boot time                                  Fri Dec 15 02:13
shutdown time                              Fri Dec 15 02:10
boot time                                  Fri Dec 15 01:43
shutdown time                              Fri Dec 15 01:39

utx.log begins Mon Dec 11 23:41:30 GMT 2017
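
The listing above can be summarised mechanically. The following is a minimal Python sketch, not part of FreeNAS, that assumes a boot record whose next-older record is not a shutdown marks an unexpected restart (crash, reset, or power loss). The function name `classify_boots` and the labels "clean", "unexpected", and "unknown" are illustrative only.

```python
import re

# Matches lines such as "boot time    Mon Jan  1 12:00" from `last reboot`.
EVENT_RE = re.compile(
    r"^(boot|shutdown) time\s+(\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2})$"
)


def classify_boots(last_output):
    """Return a list of (timestamp, label) for each boot, newest first.

    A boot is "clean" when the next-older record is a shutdown,
    "unexpected" when the next-older record is another boot, and
    "unknown" when there is no older record to compare against.
    """
    events = []
    for line in last_output.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            # Collapse the padding in dates like "Jan  1".
            stamp = " ".join(match.group(2).split())
            events.append((match.group(1), stamp))

    boots = []
    for index, (kind, stamp) in enumerate(events):
        if kind != "boot":
            continue
        if index + 1 >= len(events):
            label = "unknown"
        elif events[index + 1][0] == "shutdown":
            label = "clean"
        else:
            label = "unexpected"
        boots.append((stamp, label))
    return boots
```

Applied to the full listing above, this would label 4 boots as clean (Dec 15 01:43, Dec 15 02:13, Dec 15 10:33, and Dec 31 19:41) and the other 19 as unexpected.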

I can’t see anything obvious in /var/log/messages (I’ve attached the messages from Dec 30). However, I’m very unfamiliar with FreeBSD and Linux administration in general, so I’m not sure what to look for, where else to look, or what else to try. Any hints and tips are appreciated, even basic ones.

What I’ve done/checked so far:
  • I updated to FreeNAS 11.1 on Dec 31 (which is when the clean shutdown and reboot above occurred), but it has since rebooted unexpectedly, so the problem still exists.
  • I checked other reboot issues reported by people and didn’t find one similar to mine. I did find one where autotune may have been the problem, so I’ve disabled it today.
  • I reseated all hardware in the server
  • I switched power to the secondary power supply
  • /data/crash is empty
  • dmesg has the following warning but I read somewhere that this is fine (?):
    “WARNING: VIMAGE (virtualized network stack) is a highly experimental feature.
    bce0: bce_pulse(): Warning: bootcode thinks driver is absent! (bc_state = 0x00002006)
    bce1: bce_pulse(): Warning: bootcode thinks driver is absent! (bc_state = 0x00002006)”
    
  • Since the upgrade to 11.1-RELEASE I’m getting the known bug messages “Check 'service:nas-health' is now warning”
  • I also got the message that my boot USB failed SMART check:
    “Device: /dev/da5 [SAT], Failed SMART usage Attribute: 232 Perc_Avail_Resrvd_Space.”
    

I don’t think this is the cause of the reboots, as this message only came up after the 11.1-RELEASE upgrade, but I could be wrong.
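
To see which SMART attributes a device itself reports as failed, the attribute table can be scanned. This is a rough Python sketch that assumes the usual `smartctl -A` column layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The helper name `failing_attributes` is hypothetical, and the rule used (WHEN_FAILED is not "-", or the normalized value is at or below a non-zero threshold) is a simplification, not a diagnosis of this system.

```python
def failing_attributes(smartctl_output):
    """Return (id, name, value, thresh, when_failed) for attributes that look failed.

    Expects text in the attribute-table layout printed by `smartctl -A`.
    """
    failing = []
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID;
        # this skips the header row and any surrounding text.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            value = int(fields[3])
            thresh = int(fields[5])
        except ValueError:
            continue
        when_failed = fields[8]
        if when_failed != "-" or (thresh > 0 and value <= thresh):
            failing.append((int(fields[0]), fields[1], value, thresh, when_failed))
    return failing
```

The text would come from running `smartctl -A /dev/da5` on the affected device and passing the captured output to the function.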

System Info
FreeNAS 11.0-u2 (created 2017-07-24); then upgraded to
FreeNAS 11.0-u4 (created 2017-09-30); then upgraded to
FreeNAS 11.1-RELEASE (created 2017-12-31) to see if that would solve the problem; it didn't.
Encryption is enabled (key is backed up :) )

uname -a output:

FreeBSD archive.local 11.1-STABLE FreeBSD 11.1-STABLE #0 r321665+d4625dcee3e(freenas/11.1-stable): Wed Dec 13 16:33:42 UTC 2017     root@gauntlet:/freenas-11-releng/freenas/_BE/objs/freenas-11-releng/freenas/_BE/os/sys/FreeNAS.amd64  amd64

History

#1 Updated by am hus over 2 years ago

So far no reboots. 5 days is the longest uptime since Dec 15. I'll leave it alone for another few days, then start re-enabling autotune settings a few at a time to see whether any of them is the cause.
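
For comparison with earlier uptimes, the gap between consecutive boot records gives an upper bound on each uptime (the crash time itself is not recorded). A small Python sketch, with the hypothetical helper `longest_gap` and three consecutive boot times taken from the `last reboot` listing in the description, with the year 2017 assumed from the utx.log start date:

```python
from datetime import datetime


def longest_gap(boot_times):
    """Return (start, end, gap) for the longest interval between consecutive boots."""
    ordered = sorted(boot_times)
    if len(ordered) < 2:
        raise ValueError("need at least two boot times")
    # Pair each boot with the one that follows it and keep the widest pair.
    start, end = max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])
    return start, end, end - start


# Three consecutive boot records from the listing (year assumed to be 2017).
boots = [
    datetime(2017, 12, 16, 12, 1),
    datetime(2017, 12, 19, 14, 0),
    datetime(2017, 12, 22, 9, 10),
]
start, end, gap = longest_gap(boots)
print(start, end, gap)
```

For these three records the longest gap is the one from Dec 16 12:01 to Dec 19 14:00, a little over 3 days.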

#2 Updated by Dru Lavigne over 2 years ago

  • Status changed from Unscreened to Closed: Cannot reproduce
  • Target version set to N/A

I'll mark this as closed for now. If you figure out which autotune setting triggers it, add a comment to this ticket. Alternately, if it starts happening again and you're not sure why, attach a debug (System -> Advanced -> Save Debug) to this ticket.

#3 Updated by Dru Lavigne over 2 years ago

  • File deleted (messages-dec_20.txt)
