Bug #73812

Random Shutdowns

Added by Benjamin Perry 6 months ago. Updated 6 months ago.

Status: Closed
Priority: No priority
Assignee: Alexander Motin
Category: OS
Target version:
Severity: New
Reason for Closing: Cannot Reproduce
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: No
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

I have noticed that at some point during the night the FreeNAS software stops responding, and the logs stop at that point as if the server had shut down. Is there a sleep mode that it enters, or some other automatic shutdown feature? Thank you for your time and have a great day.

20190206_035336_HDR.jpg (6.41 MB), Benjamin Perry, 02/06/2019 12:55 AM
15494433795786436461902964518206.jpg (6.22 MB), Benjamin Perry, 02/06/2019 12:57 AM

History

#1 Updated by Benjamin Perry 6 months ago

  • File debug-freenas-20190205211051.txz added
  • Private changed from No to Yes

#2 Updated by Sean Fagan 6 months ago

I think we're going to need more information than is provided. The debug information doesn't really show anything, and the system did not panic.

Is there anything on the console at the point where it is seemingly dead? If you press one of the keys on the keyboard, does anything happen? Can you ping it from another host while it is in this state?

#3 Updated by Benjamin Perry 6 months ago

Next time it does this, I will write down anything on the display and try to ping the system from another host. I have previously tried waking the system by connecting a keyboard, but it does not seem to make any difference. It also does not seem to recognize that I have plugged in a new device, since no message about a new USB device appears on screen. Also, when this happens I have not been able to access the web interface or any of the services hosted by FreeNAS, such as SMB or any of the VMs hosted on it. I have tried scheduling tasks to run every hour to prevent it from going to sleep. Last night, according to the logs, it stopped recording at 02:20 Eastern, started again at 02:27, then stopped completely at 02:29 until I restarted the server after I got off work.

#4 Updated by Sean Fagan 6 months ago

You could also just take a picture of the screen, if there's anything on it.

This sounds like the system is experiencing hangs of some sort. But I don't see any complaints about anything timing out, which I'd expect with a disk timeout.

Plugging in a USB device may require user processes to be active; I am not entirely sure.

I see lots of instances of network interfaces going up and down; this may be due to jails and/or VMs.

#5 Updated by Benjamin Perry 6 months ago

Could the fact that I am using an old OCZ SATA SSD as the OS drive be causing the issues with the system?

#6 Updated by Sean Fagan 6 months ago

I don't see any messages indicating you're out of RAM; you have 16 GB. If the boot device is having problems, that could cause user-space issues or kernel hangs. But without any information in the debug logs showing that, it's hard to say.

#7 Updated by Sean Fagan 6 months ago

You could try manually doing "zpool scrub freenas-boot", and see how that goes.

#8 Updated by Benjamin Perry 6 months ago

I just ran the command from the shell. Will it come back with a message if there is an error?

#9 Updated by Dru Lavigne 6 months ago

  • Assignee changed from Release Council to Alexander Motin

#10 Updated by Benjamin Perry 6 months ago

This is what I woke up to this morning (see the attached photos of the screen). I am going to try removing the USB NIC to see if that helps.

#11 Updated by Alexander Motin 6 months ago

  • Status changed from Unscreened to Blocked
  • Reason for Blocked set to Need additional information from Author

Quiet hangs are not typical of software bugs. Especially with the debug kernel I see you are now running, which is intentionally built to crash at the first oddity, the OS would be much more likely to crash, or at least print some errors on the console, rather than just hang. Unfortunately, without any debug information we simply have nowhere to start.

I would not recommend using the USB NIC, especially since some USB-related errors are reported on the console. As far as I can see, your system has plenty of PCIe slots, and it should be cheap to buy a decent gigabit PCIe NIC, such as an Intel one, if you need another one.

I have an AMD Ryzen 5 as my FreeBSD desktop machine and can say that I experience random hangs at a rate of about once every month or two, which I also have no way to diagnose. So I can only guess whether it is the same problem, or one from the same family, or not. You may try reporting your problem upstream to FreeBSD, since FreeNAS 11.2 is very close to FreeBSD 11.2-RELEASE.

#12 Updated by Alexander Motin 6 months ago

Make sure your system (CPU, chipset and cards) does not overheat, since overheating would increase the chance of random failures.

#13 Updated by Benjamin Perry 6 months ago

I removed the USB NIC this morning; I was only using it to experiment with setting up port aggregation since I already had it. I normally only plug USB devices in when I need physical access to the server. Temperature should not be an issue since I am running a water cooling loop: the highest CPU temperature I have seen so far is about 35C, the chipset is normally about 25C, and the highest drive temperature reported is about 30C, even after copying large files. Also, were you able to get ECC memory to work with your Ryzen 5? I have seen lots of conflicting information about that. How would I go about getting the additional debug information needed to aid in troubleshooting?

#14 Updated by Alexander Motin 6 months ago

Benjamin Perry wrote:

> Temperature should not be an issue since I am running a water cooling loop

Water cooling makes me worry about some minor components, which may depend on active airflow.

> Also, were you able to get ECC memory to work with your Ryzen 5, since I have seen lots of conflicting information in regards to that?

As I said, my system is a desktop, so it does not have ECC RAM. We do not widely use or sell AMD servers at this point, so we do not know much more.

> How would I go about getting additional needed debug information to aid in troubleshooting?

I don't know. For server-grade systems I would recommend checking the system logs for any errors, but your system is also a desktop, so there is nothing like that to check.

#15 Updated by Alexander Motin 6 months ago

  • Status changed from Blocked to Closed
  • Target version changed from Backlog to N/A
  • Reason for Closing set to Cannot Reproduce
  • Reason for Blocked deleted (Need additional information from Author)
  • Needs Merging changed from Yes to No

I am closing this due to lack of information to work with. Let us know if you find anything new.

#16 Updated by Dru Lavigne 6 months ago

  • File deleted (debug-freenas-20190205211051.txz)

#17 Updated by Dru Lavigne 6 months ago

  • Private changed from Yes to No
