Project

General

Profile

Bug #58176

Fix race condition in swap pager

Added by Martin Renters 12 months ago. Updated 9 months ago.

Status:
Done
Priority:
No priority
Assignee:
Ryan Moeller
Category:
OS
Target version:
Seen in:
Severity:
Low Medium
Reason for Closing:
Reason for Blocked:
Needs QA:
No
Needs Doc:
No
Needs Merging:
No
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

FreeNAS 11.2 system is set up to server SMB share and iSCSI volume to a Windows 10 Pro box. I copied 1.4TB of data (mostly RAW files) from the SMB share to the iSCSI volume using the Windows File Explorer and the FreeNAS system rebooted several times in the process. I enabled the Debug Kernel and captured the panic on the iKVM console before the system rebooted.

System is a Supermicro SSG-5029P-E1CTR12L with Xeon 4110, 64GB of Kingston KSM26RD816HAI memory and 8 WD Red 4TB drives. Autotune was also enabled on the system.

iKVM_capture.jpg (370 KB) iKVM_capture.jpg Martin Renters, 11/16/2018 09:10 AM
iKVM_capture-4.jpg (369 KB) iKVM_capture-4.jpg Martin Renters, 11/18/2018 09:41 AM
40866
41283

History

#1 Updated by Martin Renters 12 months ago

Seems to be related to some interaction with what the GUI is doing. When I log into GUI, the dashboard comes up and then the system paniced again.

#2 Updated by Dru Lavigne 12 months ago

  • Private changed from No to Yes
  • Reason for Blocked set to Need additional information from Author

Martin: please attach a debug (System -> Advanced -> Save debug) to this ticket.

#3 Updated by Martin Renters 12 months ago

41283

I upgraded the machine to 11.2-RC2 and repeated the testing and got the same panic again. Attached is a new screenshot as well as the debug information. In my testing it appears to be reproducible when most of the memory is wired and I got into the new UI. Here is the output from top just before the crash.

last pid:  4905;  load averages:  2.21,  1.67,  1.01                                                                                     up 0+00:12:10  12:37:25
44 processes:  1 running, 43 sleeping
CPU:  0.0% user,  0.0% nice,  7.5% system,  0.4% interrupt, 92.0% idle
Mem: 56M Active, 90M Inact, 5480K Laundry, 61G Wired, 1031M Free
ARC: 57G Total, 4332M MFU, 52G MRU, 331K Anon, 703M Header, 57M Other
     54G Compressed, 55G Uncompressed, 1.02:1 Ratio
Swap: 8192M Total, 594M Used, 7597M Free, 7% Inuse, 5632K In

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 3948 root          1  27    0   168M 33884K zio->i  3   2:00   9.69% smbd
  239 root         32  52    0   303M 74020K uwait   0   0:29   7.83% python3.6
 3719 root         18  20    0   163M 28248K umtxn   0   0:02   0.86% uwsgi-3.6
 4650 root          1  20    0  7944K  2696K CPU14  14   0:00   0.12% top
 2187 root          1 -52   r0  3528K  3592K nanslp 11   0:00   0.07% watchdogd
 2110 nobody        1  20    0  6932K  2528K select  4   0:00   0.01% mdnsd
 3050 root          1  20    0   119M  4848K kqread  4   0:02   0.01% uwsgi-3.6
 3003 nobody        1  20    0  6932K  2540K select 15   0:00   0.01% mdnsd
 2678 root          1  20    0 12484K 12592K select 10   0:01   0.01% ntpd
 3907 root          1  20    0 12920K  4660K select  7   0:00   0.00% sshd
 3190 root         12  20    0   143M 11808K nanslp 10   0:05   0.00% collectd
 3830 root          1  52    0 67784K  4568K ttyin  12   0:01   0.00% python3.6
  347 root          2  21    0 67296K  3820K usem    2   0:01   0.00% python3.6
 2881 root          1  52    0  8452K     0K nanslp 14   0:01   0.00% <smartd>
  348 root          2  20    0 57324K  3820K piperd 15   0:01   0.00% python3.6

#4 Updated by Dru Lavigne 12 months ago

  • Assignee changed from Release Council to Alexander Motin
  • Seen in changed from 11.2-RC1 to 11.2-RC2
  • Reason for Blocked deleted (Need additional information from Author)

#5 Updated by Alexander Motin 12 months ago

  • Subject changed from System panics in sleepq_add to System panics in sysctl when copyout() with sleeping prohibited
  • Status changed from Unscreened to Screened
  • Severity changed from New to Low Medium

It seems like system is crashing when middleware tries to access some sysctl, but memory it tries to read information to from kernel is not mapped, that causes swap attempt in place where kernel does not allow it. I suppose my yesterday changes to reduce swap usage should reduce chances of this in new build and 11.2 release, though proper fix needs deeper code look.

#6 Updated by Alexander Motin 12 months ago

  • Assignee changed from Alexander Motin to Ryan Moeller

Actually, Ryan, please take a look on this.

#7 Updated by Ryan Moeller 12 months ago

  • Status changed from Screened to Not Started

#8 Updated by Ryan Moeller 10 months ago

  • Status changed from Not Started to In Progress

There is a lock order reversal between tmpfs and zfs logged shortly before the panic. I'm not sure if that is related:

Nov 18 12:46:17 vault lock order reversal:
Nov 18 12:46:17 vault 1st 0xfffff80189526068 tmpfs (tmpfs) @ /freenas-11.2-releng/freenas/_BE/os/sys/kern/vfs_mount.c:901
Nov 18 12:46:17 vault 2nd 0xfffff801f5129068 zfs (zfs) @ /freenas-11.2-releng/freenas/_BE/os/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1851
Nov 18 12:46:17 vault stack backtrace:
Nov 18 12:46:17 vault #0 0xffffffff80b48de0 at witness_debugger+0x70
Nov 18 12:46:17 vault #1 0xffffffff80b48c64 at witness_checkorder+0xe34
Nov 18 12:46:17 vault #2 0xffffffff80ab9e91 at lockmgr_lock_fast_path+0x1b1
Nov 18 12:46:17 vault #3 0xffffffff811bbdf1 at VOP_LOCK1_APV+0xe1
Nov 18 12:46:17 vault #4 0xffffffff80bc2f86 at _vn_lock+0x66
Nov 18 12:46:17 vault #5 0xffffffff804a120f at zfs_root+0xcf
Nov 18 12:46:17 vault #6 0xffffffff80ba9c7b at vfs_domount+0x9db
Nov 18 12:46:17 vault #7 0xffffffff80ba8ce4 at vfs_donmount+0x7f4
Nov 18 12:46:17 vault #8 0xffffffff80ba84c2 at sys_nmount+0x72
Nov 18 12:46:17 vault #9 0xffffffff80feedbb at amd64_syscall+0x78b
Nov 18 12:46:17 vault #10 0xffffffff80fca42d at fast_syscall_common+0x101

I found an MFC commit in 11-stable affecting the locking in swap_pager_getpages() (where the panic in this issue occurred) to handle a race condition between fork/vm_object_split() and faults:
https://github.com/freenas/os/commit/0556a47cc533046623b230de57af8e395f703425#diff-63e1e077363a720ca84fc529204202eb
There are multiple in-progress forks and page faults captured in the textdump, so this might be related to the race condition.

Martin, are you able to test a snapshot build such as https://download.freenas.org/11.3/MASTER/201901020921/x64/FreeNAS-11.3-MASTER-201901020921-443a93d.iso (which includes the above commit) to see if this is still an issue?

#9 Updated by Martin Renters 10 months ago

I've upgraded that server to 11.2-RELEASE and haven't seen the issue anymore so it is possible that Alex's changes either fixed it or masked it by making it more unlikely to happen. My machine is still being used for testing purposes, so I can install test software on it if you'd still like me to do so.

#10 Updated by Ryan Moeller 10 months ago

Thanks for the update! I will have to see if there is a way to roll back those changes for testing purposes and get back to you.

#11 Updated by Martin Renters 9 months ago

I have my own FreeNAS build environment set up now and I could compile a custom build that has Alexander's commit reverted to see if the lock traversal fixes the issue by itself. I believe Alexander's commit is this one: https://github.com/freenas/os/pull/152/commits/69091a24ed4b3255bc2326669299b019a54ad40b

Has your lock traversal fix been committed to the master branch already?

#12 Updated by Ryan Moeller 9 months ago

Martin Renters wrote:

I have my own FreeNAS build environment set up now and I could compile a custom build
Has your lock traversal fix been committed to the master branch already?

Ok, that would be extremely helpful! The freenas build master branch should use the freenas/11-stable branch of the os repo, which has the fix for the race condition.

#13 Updated by Martin Renters 9 months ago

I built my own release from the master branch and tested a copy from an SMB share to an iSCSI share on the same server. 1.5TB copied without any issues. I then backed out Alexander's one line ARC size patch and started the same copy again. The web interface seemed more sluggish as the ARC size approached 57GB (64GB of RAM) and then the copy and web interface seemed to hang although two ssh top sessions kept running. The good news is that the system didn't panic anymore. Given what I've seen, I don't think it is a good idea to back out Alexander's patch as the lock traversal fix doesn't seem to be sufficient to resolve the problem. I can try it again tomorrow with a debug kernel and see if it produces any more messages.

#14 Updated by Ryan Moeller 9 months ago

Thanks for testing that, Martin. I don't intend to revert the other fix, that was only to make it easier to test the fix for the locking. I'll pull that into 11.2 so we can wrap up this ticket.

#15 Updated by Martin Renters 9 months ago

Ryan - 11.2-RELEASE has Alexander's patch and I haven't had any panics with that. I only saw the panics in 11.2-RC1 and 11.2-RC2. Having said that, there must still be a deadlock somewhere when the system gets low on memory (e.g. without Alexander's patch), even with the latest lock reversal fix as evidenced by the partial lockup I saw yesterday when I compiled without that patch.

#17 Updated by Alexander Motin 9 months ago

  • Status changed from In Progress to Passed Testing
  • Needs QA changed from Yes to No
  • Needs Merging changed from Yes to No

#19 Updated by Alexander Motin 9 months ago

11.2-stable PR: https://github.com/freenas/os/pull/178
11-stable got this patch via regular update.

#20 Updated by Dru Lavigne 9 months ago

  • File deleted (debug-vault-20181118124729.tgz)

#21 Updated by Dru Lavigne 9 months ago

  • Status changed from Passed Testing to Done
  • Target version changed from Backlog to 11.2-U2

#22 Updated by Dru Lavigne 9 months ago

  • Subject changed from System panics in sysctl when copyout() with sleeping prohibited to Fix race condition in swap pager
  • Private changed from Yes to No
  • Needs Doc changed from Yes to No

Also available in: Atom PDF