Bug #27514

Break potential recursion involving getnewvnode and zfs_rmnode

Added by Chris Moore over 1 year ago. Updated 9 months ago.

Status: Done
Priority: Expected
Assignee: Benno Rice
Category: OS
Target version:
Seen in:
Severity: Medium
Reason for Closing:
Reason for Blocked:
Needs QA: No
Needs Doc: No
Needs Merging: No
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

I was using the system at the time of the reboot, which happened at 0309 local time. Once it booted back up, I received an email saying, "System booted at Sat Dec 30 09:13:22 2017 was not shut down properly". The reboot was unexpected and I don't know what caused it.
I was watching a movie using the Plex plugin when this happened, but I don't think that was the cause, since I have been using Plex extensively for years. The system had been running perfectly for about 15 days since I upgraded from version 11 to 11.1.
I looked at /var/log/messages, and the log has nothing from midnight (local) until 0313, when the entries for the reboot start.
Is there any other log that might explain why this reboot occurred? I keep close watch on the system and had not noticed any unusual activity.


Related issues

Related to FreeNAS - Bug #27775: Unauthorized system reboot (Closed, 2018-01-11)
Related to FreeNAS - Bug #30780: Unauthorized system reboot (Closed)
Related to FreeNAS - Bug #38923: #27514 fix causes panic on dataset quota overflow (Closed)
Has duplicate FreeNAS - Bug #27671: Random reboot in the middle of the night, every couple days (Closed: Duplicate, 2018-01-07)
Has duplicate FreeNAS - Bug #28574: Unauthorized Reboots (Closed, 2018-02-17)
Has duplicate FreeNAS - Bug #28411: Unauthorized system reboot (Closed, 2018-02-09)
Has duplicate FreeNAS - Bug #28028: Unauthorized system reboot (Closed, 2018-01-28)

History

#1 Updated by Chris Moore over 1 year ago

  • File debug-Emily-NAS-20171230111636.txz added
  • Private changed from No to Yes

#2 Updated by Chris Moore over 1 year ago

  • File Emily_NAS_log_30_Dec_2017.txt added

#3 Updated by Dru Lavigne over 1 year ago

  • Status changed from Unscreened to 15
  • Assignee changed from Release Council to Alexander Motin

Chris: was this a one-off thing? It is possible that the additional load of running the nightly cron scripts while watching a movie triggered the reboot.

Sasha: do you see anything in the logs that might indicate a bug?

#4 Updated by Chris Moore over 1 year ago

This is the first time it has happened. I recently upgraded to 11.1 and am concerned that there may be a previously undocumented fault.
I have been using the same hardware configuration for about 19 months, and there have been times when I was resilvering a drive, running an rsync, or running a scrub while watching a movie, and it has usually not been a problem. There have been plenty of times when the processor load fluctuated between 80 and 90% for over an hour, but the system has never rebooted unexpectedly.
I am not sure how much of the hardware configuration is included in the files I uploaded, but it is a Supermicro board that has always been solid. I checked all the Sensor Readings in the IPMI Server Health console, and they were all good. The last negative entry in the log was on October 30, 2016, when the CPU went to “Overheat” because of a fan failure and I had to replace the CPU cooler.
I guess I just wanted to err on the side of caution because this is a new release of the OS.

#5 Updated by Dru Lavigne over 1 year ago

  • Status changed from 15 to Unscreened
  • Seen in changed from Unspecified to 11.1

Sasha: do you see any likely culprits here?

#6 Updated by Alexander Motin over 1 year ago

  • Assignee changed from Alexander Motin to Benno Rice

It seems to be a kernel stack overflow, caused by requests bouncing between ZFS and NULLFS.

Chris, have you modified anything about the file system mappings inside the jail?

Benno, could you take a look at this? Maybe bring in one of the FreeBSD committers who works more with VFS.

#7 Updated by Alexander Motin over 1 year ago

  • Has duplicate Bug #27671: Random reboot in the middle of the night, every couple days added

#8 Updated by Dru Lavigne over 1 year ago

  • Status changed from Unscreened to 15
  • Target version set to 11.2-BETA1

#9 Updated by Chris Moore over 1 year ago

The only jail I am running is a standard Plex plugin. I have not changed it except to add the links to where my data lives on the pool.

#11 Updated by Dru Lavigne over 1 year ago

  • Status changed from 15 to Investigation

#12 Updated by Timur Bakeyev about 1 year ago

  • Related to Bug #27775: Unauthorized system reboot added

#13 Updated by Dru Lavigne about 1 year ago

  • Status changed from Investigation to In Progress

#14 Updated by Alexander Motin about 1 year ago

  • Related to Bug #28028: Unauthorized system reboot added

#15 Updated by Ben Gadd about 1 year ago

  • Due date set to 03/09/2018

#17 Updated by Dru Lavigne about 1 year ago

  • Target version changed from 11.2-BETA1 to 11.2-RC2

#18 Updated by Benno Rice about 1 year ago

  • Has duplicate Bug #28574: Unauthorized Reboots added

#19 Updated by Benno Rice about 1 year ago

  • Has duplicate Bug #28411: Unauthorized system reboot added

#21 Updated by Benno Rice about 1 year ago

  • Related to deleted (Bug #28028: Unauthorized system reboot)

#22 Updated by Benno Rice about 1 year ago

  • Has duplicate Bug #28028: Unauthorized system reboot added

#23 Updated by Benno Rice about 1 year ago

  • Subject changed from Unauthorized system reboot to zfs_rmnode can recurse via zfs_zget and getnewvnode

Also, a correction to comment #10: we're not looping on the same vnode. What we have is a long enough list of similar vnodes that the recursion blows out the stack.
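The correction above describes a chain rather than a loop. Releasing one vnode can require releasing another, and when each follow-on release is handled inside the previous one, stack depth grows with the length of the chain. The toy model below is a hypothetical illustration, not the FreeBSD/FreeNAS code or the merged patch. `Node`, `make_chain`, `free_recursive` and `free_deferred` are invented names. Queuing follow-on work instead of recursing into it is shown only as one general way to break this kind of recursion.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a vnode; freeing it may force another node to be freed."""
    name: str
    releases: Optional["Node"] = None  # node that must be freed as a side effect


def make_chain(length: int) -> Optional[Node]:
    """Build node0 -> node1 -> ... where freeing node i requires freeing node i+1."""
    head: Optional[Node] = None
    for i in reversed(range(length)):
        head = Node(f"node{i}", releases=head)
    return head


def free_recursive(node: Node, depth: int = 1, stats: Optional[dict] = None) -> dict:
    """Free `node`, then free whatever it releases from inside this same call.

    Every link in the chain adds one nested call, so peak depth equals chain length.
    """
    if stats is None:
        stats = {"freed": 0, "max_depth": 0}
    stats["freed"] += 1
    stats["max_depth"] = max(stats["max_depth"], depth)
    if node.releases is not None:
        free_recursive(node.releases, depth + 1, stats)
    return stats


def free_deferred(node: Node) -> dict:
    """Free `node`, queuing follow-on frees instead of recursing into them.

    Only one free is in progress at a time, so peak depth stays at 1 for any chain length.
    """
    stats = {"freed": 0, "max_depth": 0}
    pending = deque([node])
    while pending:
        current = pending.popleft()
        stats["freed"] += 1
        stats["max_depth"] = 1  # no nesting: every free runs at the same depth
        if current.releases is not None:
            pending.append(current.releases)
    return stats


chain = make_chain(200)
print(free_recursive(chain))  # {'freed': 200, 'max_depth': 200}
print(free_deferred(chain))   # {'freed': 200, 'max_depth': 1}
```

Both versions free the same nodes. With a much longer chain and Python's default recursion limit, `free_recursive` would raise `RecursionError`, which is the toy analogue of the kernel stack overflow. `free_deferred` keeps a constant depth regardless of chain length.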

#24 Updated by Alexander Motin about 1 year ago

  • Related to Bug #30780: Unauthorized system reboot added

#25 Updated by Nick Wolff about 1 year ago

  • Severity set to Low Medium

Marking this as Low Medium because it appears to be hard to trigger, and although it causes a reboot, the system otherwise comes back up cleanly.

#26 Updated by Alexander Motin about 1 year ago

  • Priority changed from No priority to Expected
  • Severity changed from Low Medium to Medium

It may be hard to trigger manually, but we have more than half a dozen tickets reporting it. Not nice.

#28 Updated by Benno Rice 11 months ago

Patch merged to FreeNAS head: https://github.com/freenas/os/pull/119

#29 Updated by Dru Lavigne 11 months ago

  • File deleted (debug-Emily-NAS-20171230111636.txz)

#30 Updated by Dru Lavigne 11 months ago

  • File deleted (Emily_NAS_log_30_Dec_2017.txt)

#31 Updated by Dru Lavigne 11 months ago

  • Subject changed from zfs_rmnode can recurse via zfs_zget and getnewvnode to Break potential recursion involving getnewvnode and zfs_rmnode
  • Status changed from In Progress to Ready for Testing
  • Target version changed from 11.2-RC2 to 11.2-BETA1
  • Needs Doc changed from Yes to No
  • Needs Merging changed from Yes to No

#32 Updated by Dru Lavigne 11 months ago

  • Private changed from Yes to No

#33 Updated by Eric Blau 10 months ago

Is there any way this fix could be included in the next 11.1 update? I just upgraded (finally!) from 9.10-U6 to 11.1-U5 today and have hit this issue twice already when streaming video from my Plex jail. I've downgraded to 9.10-U6 in the meantime and it is rock solid. Having my NAS randomly reboot while streaming is not fun. Thanks.

#34 Updated by Dru Lavigne 10 months ago

Eric: we're investigating how intrusive it would be to backport this to the 11.1 branch and should have an answer next week.

#35 Updated by Dru Lavigne 10 months ago

Eric: we've decided not to backport to 11.1.x for now as this mostly affects jails and we want users to transition to iocage in the 11.2 branch (which already has this fix).

#36 Updated by Nick Wolff 10 months ago

  • Status changed from Ready for Testing to Passed Testing

No known reproduction, so marking as Passed Testing.

#37 Updated by Dru Lavigne 10 months ago

  • Status changed from Passed Testing to Done
  • Needs QA changed from Yes to No

#38 Updated by Alexander Motin 9 months ago

  • Related to Bug #38923: #27514 fix causes panic on dataset quota overflow added

#39 Updated by Alexander Motin 9 months ago

It seems the fix is not perfect. I've created #38923 to track that.
