Bug #15813

Jail Crashes Entire NAS Upon Shutdown

Added by Stephen Lee over 4 years ago. Updated about 3 years ago.

Status: Closed: Not To Be Fixed
Priority: No priority
Assignee: Kris Moore
Category: Middleware
Target version:
Severity: New
Reason for Closing:
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

TL;DR The issue is either the software running in the Jail or the Jail itself, but I'd appreciate some assistance in collecting the necessary diagnostic information to determine root cause.

Last Friday, my entire home network (75Mb U/D) was experiencing degraded performance as measured by occasional packet loss and slow pings to www.google.com/www.yahoo.com. I initially concluded that it was something with my OpenWRT router, but after isolating various components across my wired network, I determined that the cause of the degradation was a recently installed Jail.

The Jail's very basic; I only enabled DHCP. (The Jail doesn't automatically get an IP address, but I believe that's an existing, separate issue.) Its sole purpose is to run rclone-1.29_amd64 (https://github.com/ncw/rclone/releases/tag/v1.29) to maintain a local copy of my Amazon Cloud Drive photos. I did an initial sync and then set up a cron job to do the same sync every five minutes.

To further confirm my hypothesis that something specific to the Jail is the culprit, I first started the Jail, manually called dhclient to grab an IP address, and then kicked off rclone. The software downloaded a largish (2-3GB) video and exited successfully. In parallel, I pinged www.google.com and noted an increase in latency from <10ms to ~100ms once rclone was running. After rclone exited, the high ping times persisted. I attempted to shut down the Jail using the UI, but something crashed and the entire NAS rebooted. While it's entirely possible that rclone is buggy, I believe it ultimately puts the network stack in some bad, unrecoverable state.
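The before/during/after comparison above can be made repeatable by reducing each ping run to its average round-trip time. This is a minimal sketch, not part of the original report: the helper name avg_rtt is made up, and it assumes the summary line that FreeBSD and Linux ping print at the end of a run.

```shell
#!/bin/sh
# avg_rtt: read ping output on stdin and print the average round-trip
# time in ms. Assumes a summary line such as
#   round-trip min/avg/max/stddev = 9.1/10.2/12.8/1.1 ms   (FreeBSD)
#   rtt min/avg/max/mdev = 9.1/10.2/12.8/1.1 ms            (Linux)
avg_rtt() {
    awk -F'=' '/min\/avg\/max/ {
        split($2, parts, "/")   # parts[1] is min, parts[2] is avg
        print parts[2]
    }'
}

# Example (run once before, once during, and once after the rclone sync):
#   ping -c 20 www.google.com | avg_rtt
```

Comparing the three numbers gives a concrete measure of whether latency recovers after rclone exits.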

My Linux instincts lead me to poke around /var/log/*, but I failed to find any core dumps or relevant logging, so I'm left unsure how to diagnose further. As I am a *BSD novice and am not super keen on triggering a reboot every time I attempt to repro, I'd love some guidance on next steps.
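For context on where to look: on FreeBSD, kernel crash dumps are not written under /var/log. When a dump device is configured, savecore(8) saves them at boot to a crash directory, /var/crash by default, with a small info.N summary file per dump. The sketch below lists those summaries. The function name list_crash_info is made up, and the /data/crash path in the example is an assumption about where FreeNAS keeps its dumps that should be verified on the box.

```shell
#!/bin/sh
# list_crash_info DIR: print the savecore(8) summary files (info.N) found
# in DIR, or a note if there are none. DIR defaults to /var/crash.
list_crash_info() {
    dir="${1:-/var/crash}"
    found=0
    for f in "$dir"/info.*; do
        [ -e "$f" ] || continue   # the glob did not match anything
        found=1
        echo "$f"
    done
    [ "$found" -eq 1 ] || echo "no crash info files in $dir"
}

# Example:
#   list_crash_info              # FreeBSD default directory
#   list_crash_info /data/crash  # assumed FreeNAS location, unverified
```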

History

#1 Updated by Kris Moore over 4 years ago

  • Status changed from Unscreened to Closed: Not To Be Fixed

The culprit here is VIMAGE support for jails. (You can google that; it has a long and sordid history of causing kernel panics at shutdown.)

At this time, fixing it isn't on the radar, since it's pretty complex and requires a fair amount of FreeBSD kernel hacking.

#2 Updated by Stephen Lee over 4 years ago

Kris Moore wrote:

The culprit here is VIMAGE support for jails. (You can google that, it has a long and sordid history of causing kernel panics at shutdown)

At the time this isn't on the radar to fix, since it's pretty complex and requires a fair amount of FreeBSD kernel hacking.

First, thanks for the super fast reply!

Second, some clarification questions: are you referring to FreeNAS 9.10's lack of VIMAGE support for jails or a buggy implementation of VIMAGE support for jails? Is whatever's the answer to my prior question the reason for the network performance degradation and/or just the kernel panic upon shutdown behavior?

#3 Updated by Stephen Lee over 4 years ago

Stephen Lee wrote:

Kris Moore wrote:

The culprit here is VIMAGE support for jails. (You can google that, it has a long and sordid history of causing kernel panics at shutdown)

At the time this isn't on the radar to fix, since it's pretty complex and requires a fair amount of FreeBSD kernel hacking.

First, thanks for the super fast reply!

Second, some clarification questions: are you referring to FreeNAS 9.10's lack of VIMAGE support for jails or a buggy implementation of VIMAGE support for jails? Is whatever's the answer to my prior question the reason for the network performance degradation and/or just the kernel panic upon shutdown behavior?

Sorry, one last question: if I want to continue using rclone, assuming it's not compatible with the FreeNAS 9.10 jails support, what do you recommend? Should I virtualize using bhyve? If using bhyve, should I avoid *BSD?

#4 Updated by Kris Moore over 4 years ago

No problem!

VIMAGE is a FreeBSD jails/networking feature which we have enabled in FreeNAS currently. It allows us to do multi-cast (among other things) inside a jail.

However, its implementation is still dodgy at best, and it is the source of many kernel panics (often at shutdown).
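Whether a given kernel was built with VIMAGE can be checked from the shell. A minimal sketch, assuming the kern.features.vimage sysctl that FreeBSD kernels built with the VIMAGE option expose; the function name vimage_status and its messages are made up for illustration.

```shell
#!/bin/sh
# vimage_status: report whether the running kernel advertises VIMAGE.
# A kernel without the option has no kern.features.vimage node, so the
# sysctl call fails and the "not reported" branch is taken.
vimage_status() {
    if [ "$(sysctl -n kern.features.vimage 2>/dev/null)" = "1" ]; then
        echo "VIMAGE is compiled into this kernel"
    else
        echo "VIMAGE not reported by this kernel"
    fi
}

# Example (on the FreeNAS or FreeBSD host):
#   vimage_status
```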

As for rclone, if you can run it in a jail, then sure, you can do that. Just know that shutting down / fiddling with the jail while in use is somewhat risky.

As for the ping issues, I'm unsure if that's related to the jail at all. What were the ping times like from the host directly? Was the entire system slower, or just the jail in which rclone was set up?
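One way to answer the host-versus-jail question is to run the same ping from both places and compare the summary lines. This is a sketch under assumptions: the function name ping_host_and_jail is made up, the jail ID comes from jls, and jexec(8) is used to run ping inside the jail.

```shell
#!/bin/sh
# ping_host_and_jail JID TARGET: ping TARGET ten times from the host and
# then from inside jail JID, printing only the final summary line of each.
ping_host_and_jail() {
    jid="$1"
    target="$2"
    echo "host: $(ping -c 10 "$target" | tail -n 1)"
    echo "jail: $(jexec "$jid" ping -c 10 "$target" | tail -n 1)"
}

# Example (as root on the host; find the jail ID with `jls`):
#   ping_host_and_jail 1 www.google.com
```

If only the jail line shows high latency, the problem is more likely specific to the jail's network path than to the whole system.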

You could use bhyve for this purpose also and run BSD or Linux or whatnot inside it, but a jail would probably be the easiest way to go, and require fewer system resources.

#5 Updated by Josh Paetzel over 4 years ago

If you get a chance to run system -> advanced -> save debug when FreeNAS is happy versus when it is sad, I'd be happy to look at them both and see if the cause of the sadness could be divined.

Better networking through tuning is my motto. Or maybe it's better living through chemicals. I can never remember.
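The happy-versus-sad comparison suggested above could be started locally before handing the files over. A rough sketch, assuming both debug files are gzip-compressed tarballs (as the .tgz attachment in this thread suggests); the function name diff_debug and the file names in the example are hypothetical.

```shell
#!/bin/sh
# diff_debug GOOD.tgz BAD.tgz: unpack both debug archives into a scratch
# directory and list the files that differ. Returns the status of diff:
# 0 if identical, 1 if anything differs, 2 or more on error.
diff_debug() {
    good="$1"
    bad="$2"
    work=$(mktemp -d) || return 2
    mkdir "$work/good" "$work/bad"
    status=0
    {
        tar -xzf "$good" -C "$work/good" &&
        tar -xzf "$bad" -C "$work/bad" &&
        (cd "$work" && diff -rq good bad)
    } || status=$?
    rm -rf "$work"   # always clean up the scratch directory
    return "$status"
}

# Example (file names are placeholders):
#   diff_debug debug-happy.tgz debug-sad.tgz
```

Most files in two dumps taken at different times will differ somewhat, so the list is only a starting point for deciding which ones to read side by side.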

#6 Updated by Stephen Lee over 4 years ago

Josh Paetzel wrote:

If you get a chance to run system -> advanced -> save debug when FreeNAS is happy versus when it is sad, I'd be happy to look at them both and see if the cause of the sadness could be divined.

Better networking through tuning is my motto. Or maybe it's better living through chemicals. I can never remember.

@Josh, just to be clear: when you refer to "sad", do you mean running "system -> advanced -> save debug" when the networking performance has degraded, not when I've attempted to shut down the jail?

If it's the former, then I'll try to gather the diag info. If it's the latter, I think the UI becomes unusable, so I don't think I can run that. Either way, I appreciate the extra attention!

#7 Updated by Josh Paetzel over 4 years ago

The former.

#8 Updated by Stephen Lee over 4 years ago

Josh Paetzel wrote:

The former.

@Josh, while I haven't performed the reproduction, I did produce the debug dump in "good" state, and after poking around in it, I see there are some details of the prior crashes. I feel a little weird about posting it in this public forum, so if at all possible, is there somewhere I can post it privately? If the dump doesn't provide enough detail on the "bad" state, then I'll proceed with reproducing the issue in the jail and then send another debug dump.

#9 Updated by Josh Paetzel over 4 years ago

Just set the ticket to private.

#10 Updated by Stephen Lee over 4 years ago

  • File debug-propjoe-20160607221605.tgz added
  • Private changed from No to Yes

Josh Paetzel wrote:

Just set the ticket to private.

Well, that was easy! Let me know if you need another debug dump.

#11 Updated by Stephen Lee over 4 years ago

Stephen Lee wrote:

Josh Paetzel wrote:

Just set the ticket to private.

Well, that was easy! Let me know if you need another debug dump.

Another data point: I ran rclone outside of a Jail on the same FreeNAS box, and it didn't seem to impact my local network.

#12 Updated by Stephen Lee over 4 years ago

Ignoring the "rclone" variable for a moment...

As a newbie to *BSD, the earlier comment about VIMAGE's bugginess upon shutdown piqued my curiosity. The following may be related to this issue, but if it's not, then I'm happy to create a new ticket if there's not one already. When I create a vlan interface in a Jail, I get a kernel panic when I attempt to shut down the Jail.

I created the vlan inside the Jail by doing:
ifconfig vlan0 create
ifconfig vlan0 vlan 2 vlandev epairXb
ifconfig vlan0 inet 192.168.2.YYY netmask 255.255.255.0
ifconfig vlan0 up
# ping -S 192.168.2.YYY 192.168.2.1 (to verify that my vlan0 interface works)
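For anyone repeating this on several machines, the steps above can be sketched as a single function with the placeholders turned into parameters. The function name create_jail_vlan is made up, and the values in the example are illustrative, not the ones from my setup.

```shell
#!/bin/sh
# create_jail_vlan EPAIR TAG ADDR: run inside the jail. Creates vlan0 on
# top of the jail's epair interface, tags it with TAG, assigns ADDR with a
# /24 netmask, and brings it up. Stops at the first ifconfig that fails.
create_jail_vlan() {
    epair="$1"
    tag="$2"
    addr="$3"
    ifconfig vlan0 create &&
    ifconfig vlan0 vlan "$tag" vlandev "$epair" &&
    ifconfig vlan0 inet "$addr" netmask 255.255.255.0 &&
    ifconfig vlan0 up
}

# Example (illustrative values):
#   create_jail_vlan epair0b 2 192.168.2.50
```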

I've tried this on three different instances:

  • FreeNAS 9.10 on real hardware
  • FreeNAS 9.10 in a VirtualBox VM (can't verify the ping part due)
  • FreeBSD 10.3-RELEASE on an old Intel Mac Pro: I recompiled the generic kernel according to http://wiki.polymorf.fr/index.php/Howto:FreeBSD_jail_vnet, installed "warden," and generally tried to mimic how FreeNAS does jails.

The FreeBSD 10.3-RELEASE instance is the only instance that DOES NOT crash.

#13 Updated by Dru Lavigne about 3 years ago

  • File deleted (debug-propjoe-20160607221605.tgz)

#14 Updated by Dru Lavigne about 3 years ago

  • Target version set to N/A
  • Private changed from Yes to No
