Bug #31065

Bring in upstream fix for NFSv4 session leak

Added by Mark Guzman over 1 year ago. Updated 7 months ago.

Status:
Closed
Priority:
No priority
Assignee:
Alexander Motin
Category:
OS
Target version:
Seen in:
Severity:
Medium
Reason for Closing:
Reason for Blocked:
Needs QA:
Yes
Needs Doc:
No
Needs Merging:
No
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

After about a week, my FreeNAS 11.1u2 install grinds to a halt for NFS clients. This has been happening since 11.1.
Attempting to kill or restart the NFS daemon results in the console or SSH session hanging.

I am now running against 11.1u4, and it lasted only a few hours before locking up.

IMG_7732.jpg (2.14 MB) Mark Guzman, 04/26/2018 05:40 AM
IMG_7775.jpg (1.9 MB) Mark Guzman, 06/07/2018 11:37 AM
freenas_procstat_20180616.txt (61.6 KB) Mark Guzman, 06/16/2018 08:04 AM
textdump-20180616.tar.3.gz (115 KB) Mark Guzman, 06/16/2018 08:04 AM
textdump-20180617.tar.4.gz (117 KB) Mark Guzman, 06/17/2018 02:37 PM
procstat-20180617.txt (67.9 KB) Mark Guzman, 06/17/2018 02:37 PM
procstat_20180628_3079.txt (61 KB) Mark Guzman, 06/28/2018 08:50 PM
textdump-20180628.tar.0.gz (139 KB) Mark Guzman, 06/28/2018 08:50 PM
procstat_20180704_3042.txt (54.9 KB) Mark Guzman, 07/05/2018 07:03 AM
textdump-20180704.tar.1.gz (112 KB) Mark Guzman, 07/05/2018 07:03 AM
procstat_20180706_2992.txt (62.7 KB) Mark Guzman, 07/06/2018 08:13 AM
textdump-20180706.tar.2.gz (130 KB) Mark Guzman, 07/06/2018 08:13 AM
swap_pager_getswapspace_failed.pdf (87.8 KB) Mark Guzman, 07/13/2018 11:42 PM
textdump-20180718.tar.3.gz (135 KB) Mark Guzman, 07/19/2018 07:03 AM
procstat_20180718_3396.txt (61.9 KB) Mark Guzman, 07/19/2018 07:03 AM
procstat_20180722_3236.txt (62.6 KB) Mark Guzman, 07/23/2018 02:29 PM
textdump-20180722.tar.4.gz (131 KB) Mark Guzman, 07/23/2018 02:29 PM
textdump-20180725.tar.0.gz (136 KB) Mark Guzman, 07/25/2018 09:18 PM
fndebug-20180726.tgz (193 KB) Mark Guzman, 07/25/2018 09:18 PM
procstat_20180731_2866.txt (61.2 KB) Mark Guzman, 07/31/2018 07:35 PM
textdump-20180731.tar.1.gz (127 KB) Mark Guzman, 07/31/2018 07:35 PM
procstat_20180628_3078_2.txt (313 Bytes) Mark Guzman, 07/31/2018 09:20 PM
procstat_20180628_3079.txt (61 KB) Mark Guzman, 07/31/2018 09:20 PM
procstat_20180628_3078.txt (313 Bytes) Mark Guzman, 07/31/2018 09:20 PM
textdump-20180628.tar.0.gz (139 KB) Mark Guzman, 07/31/2018 09:20 PM
procstat_20180628_3079_2.txt (61 KB) Mark Guzman, 07/31/2018 09:20 PM
textdump-20180617.tar.4.gz (117 KB) Mark Guzman, 07/31/2018 09:20 PM
procstat-20180617.txt (67.9 KB) Mark Guzman, 07/31/2018 09:20 PM
fndebug-postboot-20180617.tgz (184 KB) Mark Guzman, 07/31/2018 09:20 PM
freenas_procstat_20180616.txt (61.6 KB) Mark Guzman, 07/31/2018 09:20 PM
textdump-20180616.tar.3.gz (115 KB) Mark Guzman, 07/31/2018 09:20 PM
IMG_7875.jpg (2.01 MB) Mark Guzman, 08/05/2018 09:49 AM
logs-20180807.tar.xz (59.4 MB) Mark Guzman, 08/06/2018 10:55 PM
Swapping.jpg (123 KB) Mark Guzman, 08/06/2018 11:04 PM
final_procstat_20180807.txt (97.8 KB) Mark Guzman, 08/07/2018 12:36 PM
procstat_20180807_2871.txt (61.2 KB) Mark Guzman, 08/07/2018 09:50 PM
textdump-20180807.tar.2.gz (137 KB) Mark Guzman, 08/07/2018 09:50 PM
procstat_20180813_2869.txt (62.6 KB) Mark Guzman, 08/13/2018 07:17 PM
procstat_20180812_2871.txt (67.4 KB) Mark Guzman, 08/13/2018 07:17 PM
textdump-20180813.tar.3.gz (124 KB) Mark Guzman, 08/13/2018 07:17 PM

Related issues

Has duplicate FreeNAS - Bug #31011: nfsd hangs on zfs locks (Closed, 2018-03-30)

History

#1 Updated by Mark Guzman over 1 year ago

This has been "generating debug file" for a few minutes. I'm going to give it 10 more minutes before punting.

#2 Updated by Mark Guzman over 1 year ago

Ok, I'm going to give running freenas-debug a shot before I reboot this thing.

#3 Updated by Mark Guzman over 1 year ago

Attempting to access the path where I was writing the output from `freenas-debug` resulted in the console hanging. I've rebooted, so we'll see if we managed to capture anything at all.

#4 Updated by Mark Guzman over 1 year ago

No joy. I have procstat -k output from nfsd and that's it.
It's mostly:

 2755 102839 nfsd                nfsd: service       mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig svc_run_internal svc_thread_start fork_exit fork_trampoline
 2755 102840 nfsd                nfsd: service       mi_switch sleepq_wait sleeplk __lockmgr_args lockmgr_lock_fast_path VOP_LOCK1_APV _vn_lock zfs_fhtovp nfsvno_fhtovp nfsd_fhtovp nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
 2755 102841 nfsd                nfsd: service       mi_switch sleepq_wait sleeplk __lockmgr_args lockmgr_lock_fast_path VOP_LOCK1_APV _vn_lock zfs_fhtovp nfsvno_fhtovp nfsd_fhtovp nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
 2755 102851 nfsd                nfsd: service       mi_switch sleepq_timedwait _sleep nfsv4_sequencelookup nfsrv_cbcallargs nfsrv_docallback nfsrv_delegconflict nfsrv_opencheck nfsvno_open nfsrvd_open nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
 2755 102858 nfsd                nfsd: service       mi_switch sleepq_wait sleeplk __lockmgr_args lockmgr_lock_fast_path VOP_LOCK1_APV _vn_lock zfs_lookup zfs_freebsd_lookup VOP_CACHEDLOOKUP_APV vfs_cache_lookup VOP_LOOKUP_APV nfsrvd_readdirplus nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit

#5 Updated by Mark Guzman over 1 year ago

I can reliably get this to happen within 15 minutes now.

#6 Updated by Mark Guzman over 1 year ago

It looks like my freenas-debug emails made it out. I'll need to wait for my other servers' I/O to catch up before I can upload them. I've reverted to 11.1u2, since that lasted a week before hanging versus the 15-20 minutes for 11.1u4.

#7 Updated by Mark Guzman over 1 year ago

  • File fndebug-20180331-1500.tgz added
  • File fndebug-20180331-1505.tgz added
  • File fndebug-20180331-1508.tgz added

Managed to get some debug info. I have one from a few minutes after the machine had booted, one from when my hosts started reporting no response from the freenas nfs server, and another from right before I rebooted the machine.

#8 Updated by Dru Lavigne over 1 year ago

  • Has duplicate Bug #31011: nfsd hangs on zfs locks added

#9 Updated by Dru Lavigne over 1 year ago

  • Assignee changed from Release Council to Alexander Motin
  • Target version set to 11.2-RC2
  • Private changed from No to Yes
  • Seen in changed from Unspecified to 11.1-U4

#10 Updated by Mark Guzman over 1 year ago

Is there any other debugging information I can provide when this next occurs? I'm running 11.1u2 again. My assumption is that it's been the locking interplay between nfs and zfs. That seems to be supported by the hang I ran into when accessing my zfs volume locally.

#11 Updated by Mark Guzman over 1 year ago

I ran into this earlier this morning my time (11.1u2). I tried to run freenas-debug -A -e <myemail> but it ended up hanging and ultimately the watchdog rebooted the server.

Is there anything I can run to capture more detail or useful information?

I believe it can be triggered by load. I've noticed that if I mass-vMotion in esxi (compute-only vMotion), then I seem to edge towards the event. The esxi connections are nfs3, and I believe there's a spike in utilization as the cpu work transitions between esxi hosts.

#12 Updated by Mark Guzman over 1 year ago

Dead again. Running `freenas-debug -A -e [my email]` resulted in the system rebooting. Since it's the weekend, I'll boot into 11.1u4 again to try and capture more information.

#13 Updated by Alexander Motin over 1 year ago

  • Assignee changed from Alexander Motin to Benno Rice

#14 Updated by Mark Guzman over 1 year ago

11.1u4 managed to last a week and just failed. Attempting to run freenas-debug resulted in a kernel panic and eventually the watchdog rebooted the machine.

#15 Updated by Nick Wolff over 1 year ago

  • Severity set to Medium

Marking as medium until we look into this more.

#16 Updated by Mark Guzman over 1 year ago

Just happened again. Running `freenas-debug -A -e <my email>` again resulted in a panic when I attempted to get any info. Roughly the same locks were in use when I checked the nfs daemon.

#17 Updated by Mark Guzman over 1 year ago

Nick Wolff wrote:

Marking as medium until we look into this more.

Medium seems a bit light for a behavior that requires manual intervention to resolve and renders the system effectively useless. From what I've seen the locks are held and not released and even local filesystem access will eventually deadlock.

#18 Updated by Benno Rice over 1 year ago

I'd like to try and reproduce this locally. Could you share with me what your clients are running, what their mount options are, etc? Is there a specific behaviour pattern that triggers it?

#19 Updated by Mark Guzman over 1 year ago

It looks like it's load related. I have three esxi servers using nfs v3 mounts for vm storage, three physical linux boxes using nfs v4 mounts for homedir access, and a few linux and freebsd vms also using nfs v4 for homedir access.
When this happens I need to restart all of the VMs; if I do that quickly, I have been able to trigger the issue again within an hour or two when running 11.1u4.

It happens to all nfs clients and eventually local console access also blocks when attempting to access the zfs filesystem that's been shared.

Is there any diagnostic information I can provide? I've been mucking about trying to come up with a dtrace script to monitor these locks; my assumption is that there's a leak.
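
For what it's worth, this is roughly the shape of the one-liner I've been playing with (a sketch using the DTrace lockstat provider; the exact probe names are an assumption on my part and may need adjusting on this kernel):

# count lock-contention block events, keyed by process and kernel stack
dtrace -n 'lockstat:::adaptive-block, lockstat:::sx-block { @[execname, stack()] = count(); }'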

#20 Updated by Mark Guzman over 1 year ago

Dead again. I had an ssh session open to tmux but was not able to run any commands. The system has rebooted into 11.1u4

#21 Updated by Mark Guzman over 1 year ago

Just one note as to workflow. I had cloned a 30gb VM on the esxi side prior to the system hanging.

#22 Updated by Mark Guzman over 1 year ago


Down again. As usual freenas-debug resulted in a kernel panic and reboot. I took a picture of the screen at the time.
No cloning or major disk activity was happening at the time afaik.

#23 Updated by Mark Guzman over 1 year ago

Down again. Same deal.

#24 Updated by Mark Guzman over 1 year ago

And once again.

#25 Updated by Mark Guzman over 1 year ago

I'm switching back to 11.1u2 for the rest of the week; 11.1u4 is just too unstable.

#26 Updated by Mark Guzman over 1 year ago

down again.

#27 Updated by Mark Guzman over 1 year ago

down again. This time I was running some dtrace lock count scripts leading up to it.

#28 Updated by Benno Rice over 1 year ago

Was there anything interesting in the dtrace output?

#29 Updated by Mark Guzman over 1 year ago

I haven't had a chance to go through it yet. I've been running lockstat by name, process etc. Later tonight I'll see if anything jumps out at me. The process calls still look like there's some zfs/nfs lock issue.

#30 Updated by Benno Rice over 1 year ago

I can see at least one nfsd thread sleeping in VOP_LOCK in the debugs you've uploaded so far. That's not hugely significant on its own though, unless it's been sitting there for ages.

What we really need is the kernel panic output from when the condition's occurring. Can you see if there's anything in /data/crash? If not, next time the condition occurs can you force a panic with something like:

# sysctl debug.kdb.panic=1

And then generate a debug dump after it reboots?

Thanks.

#31 Updated by Dru Lavigne over 1 year ago

  • Related to Bug #27859: nfsrv_checksequence - trap 9: general protection fault while in kernel mode added

#32 Updated by Mark Guzman over 1 year ago

  • File freenas-debugging.tar.xz added

It happened again. I triggered a panic and found that I had a few other panic reports. It looks like the times I ran freenas-debug and it rebooted on a spinlock panic, it managed to write out some dumps.

#33 Updated by Mark Guzman over 1 year ago

I just had a look at the last dump's ddb. That is a lot of nfsv4lck. Looks like a bunch of geli:w as well.

#34 Updated by Benno Rice over 1 year ago

The geli workers are for swap which you're not touching as far as I can see so we can rule that out.

The nfsv4lck stuff is a bit more concerning. Unfortunately the ddb capture buffer wasn't big enough to get the stacks of the nfsd threads so I can't tell exactly what's going on.

Doing a bit of code examination, one possibility is that something has sent a NFSSVC_SUSPENDNFSD RPC. For this to become problematic mountd must have crashed and not been restarted properly. I can see from your debug output that mountd was running in both of the last two text dumps so that seems unlikely. Having the nfsd stacks would also prove or disprove this, and if it isn't this would also help point to where the actual issue is.

Could you run this on your system:

sysctl debug.ddb.capture.bufsize=4194304

And then do the same panic routine as last time when it locks up again? That should hopefully get us the full output. If it doesn't we'll have to up the buffer size again but that's 4x as large as before so hopefully it'll fit.
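
If you want that setting to survive reboots, something like this should work (a sketch; I believe the supported FreeNAS route is a sysctl-type Tunable in the UI rather than editing /etc/sysctl.conf by hand):

# apply immediately on the running system
sysctl debug.ddb.capture.bufsize=4194304
# then add a Tunable (type: sysctl) with the same variable and value so it is reapplied at boot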

Thanks!

#35 Updated by Mark Guzman over 1 year ago

I'll add that to the config. I'll see if I can stress it over the weekend.

#36 Updated by Alexander Motin over 1 year ago

  • Related to deleted (Bug #27859: nfsrv_checksequence - trap 9: general protection fault while in kernel mode)

#37 Updated by Mark Guzman over 1 year ago

  • File textdump-20180604.tar.0.gz added

New crash with a much larger debug dump.

#38 Updated by Benno Rice over 1 year ago

Excellent, we've got the stacks we need.

$ grep nfsd: ddb.txt |wc -l
     256
$ grep nfsd: ddb.txt |grep nfsv4lck |wc -l
     199

Well, that doesn't seem very happy, and it's fairly in line with what we've seen in the past.

Here's what all the nfsv4lck threads look like:

Tracing command nfsd pid 2753 tid 103038 td 0xfffff801f0799000
sched_switch() at sched_switch+0x3d0/frame 0xfffffe1251194590
mi_switch() at mi_switch+0xe5/frame 0xfffffe12511945c0
sleepq_wait() at sleepq_wait+0x3a/frame 0xfffffe12511945f0
_sleep() at _sleep+0x237/frame 0xfffffe1251194670
nfsmsleep() at nfsmsleep+0x65/frame 0xfffffe1251194690
nfsv4_lock() at nfsv4_lock+0x9e/frame 0xfffffe12511946d0
nfsrvd_dorpc() at nfsrvd_dorpc+0x300/frame 0xfffffe12511948b0
nfssvc_program() at nfssvc_program+0x5c0/frame 0xfffffe1251194a60
svc_run_internal() at svc_run_internal+0xcc9/frame 0xfffffe1251194ba0
svc_thread_start() at svc_thread_start+0xb/frame 0xfffffe1251194bb0
fork_exit() at fork_exit+0x85/frame 0xfffffe1251194bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe1251194bf0
--- trap 0xc, rip = 0x800871c9a, rsp = 0x7fffffffe518, rbp = 0x7fffffffe940 ---

That implies that we're stuck around here in sys/fs/nfsserver/nfs_nfsdsocket.c:

    NFSLOCKV4ROOTMUTEX();
    if (nfsrv_stablefirst.nsf_flags & NFSNSF_NEEDLOCK)
        igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1, NULL,
            NFSV4ROOTLOCKMUTEXPTR, NULL);
    else
        igotlock = nfsv4_lock(&nfsv4rootfs_lock, 0, NULL,
            NFSV4ROOTLOCKMUTEXPTR, NULL);
    NFSUNLOCKV4ROOTMUTEX();

So something's sitting on the nfsv4 root filesystem lock. What else is going on with the nfsds?

$ grep nfsd: ddb.txt |grep -v nfsv4lck |grep rpcsvc |wc -l
      55

That takes care of most of the rest, and those are all behaving fairly normally waiting for stuff to happen. Let's look at the last two:

102886                   D       zfs      0xfffff80228db75f0 nfsd: service
102900                   D       nfsclseq 0xfffff80fb098c620 nfsd: service

The stacks for these two are:

Tracing command nfsd pid 2753 tid 102886 td 0xfffff801f0732000
sched_switch() at sched_switch+0x3d0/frame 0xfffffe1250e9a340
mi_switch() at mi_switch+0xe5/frame 0xfffffe1250e9a370
sleepq_wait() at sleepq_wait+0x3a/frame 0xfffffe1250e9a3a0
sleeplk() at sleeplk+0x1cd/frame 0xfffffe1250e9a400
__lockmgr_args() at __lockmgr_args+0x4f2/frame 0xfffffe1250e9a490
lockmgr_lock_fast_path() at lockmgr_lock_fast_path+0x149/frame 0xfffffe1250e9a4c0
VOP_LOCK1_APV() at VOP_LOCK1_APV+0x89/frame 0xfffffe1250e9a4f0
_vn_lock() at _vn_lock+0x40/frame 0xfffffe1250e9a550
zfs_fhtovp() at zfs_fhtovp+0x494/frame 0xfffffe1250e9a610
nfsvno_fhtovp() at nfsvno_fhtovp+0x75/frame 0xfffffe1250e9a660
nfsd_fhtovp() at nfsd_fhtovp+0xa7/frame 0xfffffe1250e9a6c0
nfsrvd_dorpc() at nfsrvd_dorpc+0x94c/frame 0xfffffe1250e9a8b0
nfssvc_program() at nfssvc_program+0x5c0/frame 0xfffffe1250e9aa60
svc_run_internal() at svc_run_internal+0xcc9/frame 0xfffffe1250e9aba0
svc_thread_start() at svc_thread_start+0xb/frame 0xfffffe1250e9abb0
fork_exit() at fork_exit+0x85/frame 0xfffffe1250e9abf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe1250e9abf0
--- trap 0xc, rip = 0x800871c9a, rsp = 0x7fffffffe518, rbp = 0x7fffffffe940 ---
Tracing command nfsd pid 2753 tid 102900 td 0xfffff801f0734000
sched_switch() at sched_switch+0x3d0/frame 0xfffffe1250edff80
mi_switch() at mi_switch+0xe5/frame 0xfffffe1250edffb0
sleepq_timedwait() at sleepq_timedwait+0x42/frame 0xfffffe1250edfff0
_sleep() at _sleep+0x217/frame 0xfffffe1250ee0070
nfsv4_sequencelookup() at nfsv4_sequencelookup+0x101/frame 0xfffffe1250ee00e0
nfsrv_cbcallargs() at nfsrv_cbcallargs+0xdd/frame 0xfffffe1250ee0140
nfsrv_docallback() at nfsrv_docallback+0x240/frame 0xfffffe1250ee02c0
nfsrv_delegconflict() at nfsrv_delegconflict+0x18e/frame 0xfffffe1250ee0340
nfsrv_checkremove() at nfsrv_checkremove+0x12f/frame 0xfffffe1250ee03c0
nfsvno_removesub() at nfsvno_removesub+0x37/frame 0xfffffe1250ee0410
nfsrvd_remove() at nfsrvd_remove+0x1b6/frame 0xfffffe1250ee06d0
nfsrvd_dorpc() at nfsrvd_dorpc+0xfff/frame 0xfffffe1250ee08b0
nfssvc_program() at nfssvc_program+0x5c0/frame 0xfffffe1250ee0a60
svc_run_internal() at svc_run_internal+0xcc9/frame 0xfffffe1250ee0ba0
svc_thread_start() at svc_thread_start+0xb/frame 0xfffffe1250ee0bb0
fork_exit() at fork_exit+0x85/frame 0xfffffe1250ee0bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe1250ee0bf0
--- trap 0xc, rip = 0x800871c9a, rsp = 0x7fffffffe518, rbp = 0x7fffffffe940 ---

Now both of these are in a path where they may have the lock but we don't know how long they've been sitting on it. I'm going to do some more digging in to what could be going on here.

#39 Updated by Mark Guzman over 1 year ago

Nice! Let me know if there's any way I can help dig further.

#40 Updated by Mark Guzman over 1 year ago

  • File textdump-20180605.tar.1.gz added

Had another one yesterday.

#41 Updated by Mark Guzman over 1 year ago

The new dump shows a lot more nfsd threads holding zfs locks, so hopefully that narrows down the codepaths a bit more.

#42 Updated by Mark Guzman over 1 year ago

  • File textdump-20180607.tar.2.gz added

Whatever I'm doing it seems to be happening quickly this week.

#43 Updated by Benno Rice over 1 year ago

Ok, so this is what we expect:

$ grep nfsd: ddb-20180605.txt |wc -l
     256
$ grep nfsd: ddb-20180607.txt |wc -l
     256

But this is very interesting:

$ grep nfsd: ddb-20180607.txt |grep nfsv4lck |wc -l
     255

That's all bar one. Which one isn't in this state?

$ grep nfsd: ddb-20180607.txt |grep -v nfsv4lck
102844                   D       nfsclseq 0xfffff8013c454620 nfsd: service

Hello.

Tracing command nfsd pid 2769 tid 102844 td 0xfffff80206d5b000
sched_switch() at sched_switch+0x3d0/frame 0xfffffe12509bfd50
mi_switch() at mi_switch+0xe5/frame 0xfffffe12509bfd80
sleepq_timedwait() at sleepq_timedwait+0x42/frame 0xfffffe12509bfdc0
_sleep() at _sleep+0x217/frame 0xfffffe12509bfe40
nfsv4_sequencelookup() at nfsv4_sequencelookup+0x101/frame 0xfffffe12509bfeb0
nfsrv_cbcallargs() at nfsrv_cbcallargs+0xdd/frame 0xfffffe12509bff10
nfsrv_docallback() at nfsrv_docallback+0x240/frame 0xfffffe12509c0090
nfsrv_delegconflict() at nfsrv_delegconflict+0x18e/frame 0xfffffe12509c0110
nfsrv_opencheck() at nfsrv_opencheck+0x4e1/frame 0xfffffe12509c01c0
nfsvno_open() at nfsvno_open+0x38e/frame 0xfffffe12509c0260
nfsrvd_open() at nfsrvd_open+0xba6/frame 0xfffffe12509c06c0
nfsrvd_dorpc() at nfsrvd_dorpc+0xa84/frame 0xfffffe12509c08b0
nfssvc_program() at nfssvc_program+0x5c0/frame 0xfffffe12509c0a60
svc_run_internal() at svc_run_internal+0xcc9/frame 0xfffffe12509c0ba0
svc_thread_start() at svc_thread_start+0xb/frame 0xfffffe12509c0bb0
fork_exit() at fork_exit+0x85/frame 0xfffffe12509c0bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe12509c0bf0
--- trap 0xc, rip = 0x800871c9a, rsp = 0x7fffffffe518, rbp = 0x7fffffffe940 ---

Next time this happens can you take a close look at what I suspect will be a single nfsd in the nfsclseq state? A bunch of `procstat -kk` runs over a period of time would be handy. What I'm looking for is whether it's sitting in this state for an extended period, which would imply that it's stuck in some way, presumably holding the nfsv4 root filesystem mutex. While we're waiting for that confirmation I'll see if I can work out how it'd get into this state. Rick Macklem recently committed some stuff to do with leakage of sessions which might be worth checking out.
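
Something along these lines would do it (just a sketch; adjust the interval, count, and output path as needed):

# grab nfsd kernel stacks every 30 seconds for ~10 minutes
for i in $(seq 1 20); do
    date >> /var/tmp/nfsd-procstat.log
    procstat -kk $(pgrep nfsd) >> /var/tmp/nfsd-procstat.log
    sleep 30
done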

#44 Updated by Benno Rice over 1 year ago

Does the description in this commit sound similar to what's going on when the problem hits?

https://svnweb.freebsd.org/base?view=revision&revision=334739

#45 Updated by Mark Guzman over 1 year ago

That may well be it. It sounds very much like the issue and I do have linux clients using nfs 4

#46 Updated by Benno Rice over 1 year ago

I'm pulling this fix into our local FreeBSD repo:

https://github.com/freenas/os/pull/124

I'll let you know when it'll be in a nightly if you're willing to give that a go?

#47 Updated by Benno Rice over 1 year ago

  • Subject changed from nfsd hangs on zfs locks to nfsd hangs in nfsv4_sequencelookup

#48 Updated by Mark Guzman over 1 year ago


Just happened again. I tried to ssh in and it just hung. Attempting to write to the fs to run `procstat -kk {nfsdpid} > file` resulted in the console session hanging. I took a picture with my phone. I checked both nfsds. The nfsrv_setclient seems suspect given the issue you linked.
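
For next time, a sketch of grabbing that output without touching the hung pool (assuming /var/tmp is still writable on the boot/memory disk, or that outbound ssh still works; user@otherhost is a placeholder):

# write somewhere other than the locked-up pool
procstat -kk $(pgrep nfsd) > /var/tmp/nfsd-procstat.txt
# or stream it straight off the box
procstat -kk $(pgrep nfsd) | ssh user@otherhost 'cat > nfsd-procstat.txt'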

#49 Updated by Mark Guzman over 1 year ago

I'm willing to try it. Right now this is pretty much rolling the dice as to when it'll hang.

#50 Updated by Benno Rice over 1 year ago

Actually I see a call to nfsrvd_exchangeid in there which directly points at the case that that commit fixes. I'll let you know when the nightly's out.

#51 Updated by Mark Guzman over 1 year ago

Fantastic! Thanks. This is making my day, I'll be very happy to return to the days of stability.

#52 Updated by Dru Lavigne over 1 year ago

  • Subject changed from nfsd hangs in nfsv4_sequencelookup to Bring in upstream fix for NFSv4 session leak
  • Status changed from Unscreened to In Progress
  • Target version changed from 11.2-RC2 to 11.2-BETA1
  • Needs Doc changed from Yes to No

#53 Updated by Dru Lavigne over 1 year ago

  • File deleted (fndebug-20180331-1505.tgz)

#54 Updated by Dru Lavigne over 1 year ago

  • File deleted (fndebug-20180331-1500.tgz)

#55 Updated by Dru Lavigne over 1 year ago

  • File deleted (fndebug-20180331-1508.tgz)

#56 Updated by Dru Lavigne over 1 year ago

  • File deleted (freenas-debugging.tar.xz)

#57 Updated by Dru Lavigne over 1 year ago

  • File deleted (textdump-20180604.tar.0.gz)

#58 Updated by Dru Lavigne over 1 year ago

  • File deleted (textdump-20180605.tar.1.gz)

#59 Updated by Dru Lavigne over 1 year ago

  • File deleted (textdump-20180607.tar.2.gz)

#60 Updated by Dru Lavigne over 1 year ago

  • Private changed from Yes to No

#61 Updated by Dru Lavigne over 1 year ago

  • Status changed from In Progress to Ready for Testing
  • Needs Merging changed from Yes to No

#62 Updated by Mark Guzman over 1 year ago

I've been running the latest nightly for a few hours and it appears to have happened again. Had the change made it into the nightlies?
I'll attach procstat and kernel dump info once it completes.

#63 Updated by Mark Guzman over 1 year ago

Latest dump and procstat -kk output. It's worth noting that I was able to write to the filesystem which is rare when this happens.

#64 Updated by Mark Guzman over 1 year ago

Dead again. Crash dump shortly.

#66 Updated by Dru Lavigne over 1 year ago

  • Status changed from Ready for Testing to In Progress
  • Assignee changed from Benno Rice to Alexander Motin

Sasha: is https://github.com/freenas/os/pull/124 ready to be pulled into master?

#67 Updated by Alexander Motin over 1 year ago

  • Status changed from In Progress to Screened
  • Target version changed from 11.2-BETA1 to 11.2-RC2

Yes, as far as I can see the patch should already be integrated into the nightlies. In the last debug I see version "854e3946a(freenas/11-stable): Fri Jun 15 01:36:22", which according to git log includes both of the related NFS patches. So it seems the problem is not fixed.

Looking at the last debug, I actually see many NFS threads waiting on ZFS ZIL commit, so the server does not look dead, more like busy. Though 3 threads are indeed waiting on nfsclseq, same as before, so maybe there is still some problem.

In the debug before the last, the NFS server is much more blocked, and without ZFS involved this time; one thread is waiting on "nfsclseq", same as before.

#68 Updated by Mark Guzman over 1 year ago

Both dumps were triggered after the rest of the network had been waiting for ~20 minutes. I think the zfs stuff gets caught up in the nfs issue, as I've seen the console hang on writing to zfs. My guess is that when that happens, one of these hung nfs processes owns a zfs lock when it starts deadlocking. I have seen a few of my early procstats with zfs locks. In recent dumps you may see iscsi at play as well, since I'm trying to keep some core set of things functional.

#69 Updated by Benno Rice over 1 year ago

Yeah, looking at the first dump it looks very similar to the previous ones. The second one is a bit different as there are more threads in nfsclseq. The stacks for those threads look very similar to the threads we've seen stuck in nfsclseq previously.

One question: Are you using NFSv4.1 with ESXi?

#70 Updated by Mark Guzman over 1 year ago

It's only NFSv3 from esxi. I'd encountered reliability issues early on when using NFSv4 connections. I found that the datastore would suddenly disappear (I'm guessing this is related to the bug that was fixed).

The only NFSv4 clients are linux and freebsd boxes.

Example output from `mount` on one of the linux boxes (kernel 4.17.2):

freenas:/mnt/ark/homes on /mnt/ark-homes type nfs4 (rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=2,retrans=2,sec=sys,clientaddr=10.0.0.100,local_lock=none,addr=10.0.0.105,_netdev)

The majority of the non-esxi nfs traffic will be homedirs and docker mounts.

#71 Updated by Mark Guzman over 1 year ago

procstat and dump from today's hang.

#73 Updated by Alexander Motin over 1 year ago

  • Assignee changed from Alexander Motin to John Hixson

#75 Updated by Mark Guzman over 1 year ago

I've moved my vm images over to iscsi so we'll see if pure linux/freebsd access in my environment can trigger it.

#76 Updated by John Hixson over 1 year ago

Mark Guzman wrote:

I've moved my vm images over to iscsi so we'll see if pure linux/freebsd access in my environment can trigger it.

Hi Mark, I know you've done a lot of bouncing around with this, but since I'm now the owner of this ticket, I'd like to ask if you could upgrade to 11.2-BETA and see if you encounter the same behavior. I also see there are no debugs in this ticket; would it be possible to get one so I can at least see the state of your system when things have gone wrong?

#77 Updated by John Hixson over 1 year ago

  • Category changed from OS to Services

#78 Updated by Dru Lavigne over 1 year ago

  • Status changed from Screened to Blocked
  • Reason for Blocked set to Need additional information from Author

#79 Updated by Mark Guzman over 1 year ago

Hi John, the best I can do when this happens is create a dump. Attempting to run a full freenas-debug -A has always resulted in an eventual panic with no message or debug output being written/saved.
I'm currently running a nightly from last week and the situation didn't improve. I had managed early on to run subsets of freenas-debug without issue. I believe those were deleted from the ticket when we thought there was a solution. I believe I have copies of all of the debug information I've sent along so far, if that helps. Heck, there may be locks lingering right now, since I booted with the vsphere cluster still using nfs and gradually migrated to the new iscsi volume. Let me know what you'd like me to do. I can update, but I don't see much new in the nightly changelog beyond ui work.

#80 Updated by Mark Guzman over 1 year ago

Aaand it's dead. I can't log in to run anything remotely or via a local terminal. I eventually had to reboot. There were many messages on the main console about a swap issue.
It's worth noting that I had just spun up a vm that was stored on nfs in order to recover some old data.

I'm switching to the latest nightly.

#81 Updated by Dru Lavigne over 1 year ago

  • Status changed from Blocked to Unscreened
  • Reason for Blocked deleted (Need additional information from Author)

#82 Updated by Mark Guzman over 1 year ago

There was no crash dump from the last hang. Let me know if there's other info that you'd find useful.

#83 Updated by Dru Lavigne over 1 year ago

  • Target version changed from 11.2-RC2 to 11.2-BETA3

#84 Updated by Mark Guzman over 1 year ago

I have this hung live right now with a shell in. I'll trigger a dump shortly after gathering info. I'll give this 10 minutes to see if there's anything else you'd like.

#85 Updated by John Hixson over 1 year ago

  • Status changed from Unscreened to Screened

There are a few more patches available from HEAD regarding NFS/VMware issues. I'll review them and merge them if they look like they'll help. I'll keep you posted.

#86 Updated by Mark Guzman over 1 year ago

Here's the dump data from that last hang.

#89 Updated by John Hixson over 1 year ago

  • Assignee changed from John Hixson to Timur Bakeyev

#91 Updated by Timur Bakeyev over 1 year ago

Hi, Mark!

Since the old debug logs were deleted, can you please generate a new one for us? No need to wait for the system to lock up; just do it on the FreeNAS version you are using now.

In general, from the textdump I got the impression that you are running a lot of stuff on your box besides it being an NFS server for ESXi and the Linux and FreeBSD boxes. Obviously, the more stuff you run, the more chances that something will lock up and freeze your box...

Well, let's look at the debug log again for a start.

#92 Updated by Mark Guzman over 1 year ago

Hi Timur, I just ran `freenas-debug -A -e <my address>` and it led to a panic. I can re-upload the old ones. I'm happy to help hunt down the nfs deadlock that I'm apparently repeatedly running into, even if it's just narrowing down the workload. Currently the ESXi cluster is using iscsi instead of nfs, so we're only seeing nfs mounts from linux and freebsd boxes.

#93 Updated by Mark Guzman over 1 year ago

To be clear about the original issue, eventually nfsd stops responding and can't be killed. Running freenas-debug results in a panic once some variant of that state is achieved. This has randomly impacted zfs when the nfsd has held zfs locks. So it's possible for it to hang the entire system, but generally it just means nfs stops doing its job.

#94 Updated by Mark Guzman over 1 year ago

dump and post-reboot debug

#95 Updated by Mark Guzman over 1 year ago

I have it dead right now. I'm going to give it ~10 minutes before I trigger a dump to reboot the system. Let me know if there's anything I can provide.

#96 Updated by Timur Bakeyev over 1 year ago

Seems I missed that 10-minute window...

The box doesn't respond to anything? If you have access to the console, does Ctrl-T give you anything?

I'm not sure which kernel you are running now; can you switch to the debug one?

In general, although all the symptoms are around NFS, it could be that this is only a symptom of a locked-up disk/ZFS. The next question is how to distinguish that...

You said you are using NFSv4 for the Linux/FreeBSD clients only. Can you, by any chance, switch them to NFSv3 instead, so you won't have any NFSv4 clients? (For ESXi you already use NFSv3, right?)

#97 Updated by Timur Bakeyev over 1 year ago

As a general request - keep sending procstat -kkk and textdumps.

#98 Updated by Timur Bakeyev over 1 year ago

I'm trying to understand your disk setup. What is the USB drive used for?

Are the rest of the disks connected through the SAS expander? You don't experience timeouts accessing them? No messages on the screen?

In general, we are still missing a debug log, with all the /var/log copies in it, etc.

Did I understand you correctly that, on a perfectly fine running system, the box freezes and hangs almost 100% of the time when you run the debug log program?

#99 Updated by Mark Guzman over 1 year ago

The USB drive is the boot drive. The Dell R710 has an internal USB connection for boot media, so FreeNAS goes on that. Everything else is via the internal SAS backplane or the Compellent shelf.

In general, we are still missing a debug log, with all the /var/log copies in it, etc.

Depending on how bad the hang is I can probably copy those to /data/crash. Attempting to write to the volume shared via nfs is an iffy proposition given the locking issues. I've tried writing to the nfs-shared volume and have hung the box or console sessions.

I'm happy to switch to whatever kernel may help. I'm currently running a nightly build as of Saturday. I updated preemptively to see if I could have a reasonable uptime for the week; unfortunately, that plan did not pan out.

You said you are using NFSv4 for the Linux/FreeBSD clients only. Can you, by any chance, switch them to NFSv3 instead, so you won't have any NFSv4 clients? (For ESXi you already use NFSv3, right?)

This will take me a bit. I'll see what I can do. ESXi is using iSCSI exclusively for VM storage at this point.

In general, although all the symptoms are around NFS, it could be that this is only a symptom of a locked-up disk/ZFS. The next question is how to distinguish that...

I have seen combination hangs, where the filesystem that's been shared is locked, but the procstat and dump output showed the zfs lock being held by nfsd.

As a general request - keep sending procstat -kkk and textdumps.

Will do. Let me know if there's anything else I can run or should do.

Did I understand you correctly that, on a perfectly fine running system, the box freezes and hangs almost 100% of the time when you run the debug log program?

I have a window of time before running freenas-debug will cause a panic. I can run a subset of the commands, but running freenas-debug -A will reliably cause a panic once we've started leaking locks somewhere.

#100 Updated by Timur Bakeyev over 1 year ago

Mark Guzman wrote:

In general, we are still missing a debug log, with all the /var/log copies in it, etc.

Depending on how bad the hang is I can probably copy those to /data/crash. Attempting to write to the volume shared via nfs is an iffy proposition given the locking issues. I've tried writing to the nfs-shared volume and have hung the box or console sessions.

In the worst case you can scp them to another box. Reading should be less harmful from the locking perspective...

I'm happy to switch to whatever kernel may help. I'm currently running a nightly build as of Saturday. I updated preemptively to see if I could have a reasonable uptime for the week; unfortunately, that plan did not pan out.

Any version of FN allows you to switch to a debug build of the same kernel, with some extra helper routines in it. That may slow the kernel down a bit, but it shouldn't be too significant.

You can make such a switch via System -> Advanced -> Enable debug kernel.

You said you are using NFSv4 for the Linux/FreeBSD clients only. Can you, by any chance, switch them to NFSv3 instead, so you won't have any NFSv4 clients? (For ESXi you already use NFSv3, right?)

This will take me a bit. I'll see what I can do. ESXi is using iSCSI exclusively for VM storage at this point.

iSCSI for ESXi is good, at least in this case. My idea was to exclude NFSv4 from the picture completely, if possible, and see if that improves the situation. So, if your regular OS clients can use NFSv3 in the meantime, that would help verify that we are really facing the NFSv4 root mutex lock.
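
For reference, a mount along these lines on the Linux side should force NFSv3 (a sketch based on the mount output you posted earlier; adjust the options to your needs):

# note the "nfs" type and vers=3 instead of the nfs4 type / vers=4.1
mount -t nfs -o vers=3,proto=tcp,hard,rsize=131072,wsize=131072 freenas:/mnt/ark/homes /mnt/ark-homes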

In general, although all the symptoms are around NFS, it could be that this is only a symptom of a locked-up disk/ZFS. The next question is how to distinguish that...

I have seen combination hangs, where the filesystem that's been shared is locked, but the procstat and dump output showed the zfs lock being held by nfsd.

Maybe I missed those particular logs; I didn't see ZFS being locked by NFS, and that would be difficult to achieve as far as I know...

As a general request - keep sending procstat -kkk and textdumps.

Will do. Let me know if there's anything else I can run or should do.

Did I understand you correctly that, on a perfectly fine running system, the box freezes and hangs almost 100% of the time when you run the debug log program?

I have a window of time before running freenas-debug will cause a panic. I can run a subset of the commands, but running freenas-debug -A will reliably cause a panic once we've started leaking locks somewhere.

This is a shell script, so in theory you can run it from a remote machine inside a script(1) session:

# script fn.log
# ssh FN.host
# sh -Tx /usr/local/bin/freenas-debug -A

and see which test it hangs on... You can periodically press Ctrl-T when the script doesn't produce output on the console to see what it is doing and whether it is still alive.

That's pretty harsh if you know it will hang your machine, but it would help to identify which test is actually causing it.

#102 Updated by Mark Guzman over 1 year ago


I just switched to the latest nightly and tried to use the debug kernel. It would panic on boot.

#103 Updated by Timur Bakeyev over 1 year ago

That's not good, of course, but the nightlies are known to be fragile and may not work at some points in time.

Can you try 11.2b2 instead? That should be more stable and still recent enough.

#104 Updated by Mark Guzman over 1 year ago

Will do next time I have a reboot or window.

#105 Updated by Timur Bakeyev over 1 year ago

Mark Guzman wrote:

I just switched to the latest nightly and tried to use the debug kernel. It would panic on boot.

Which version exactly did you use? 11.1-nightly or 11.2-nightly? And for which date?

#106 Updated by Mark Guzman over 1 year ago

FreeNAS-11.2-MASTER-201808050839

(Build Date: Aug 5, 2018 11:49)

#107 Updated by Timur Bakeyev over 1 year ago

Hi, Mark!

I'm still waiting for the content of your /var/log directory. The fact that at least your last crash was connected with running out of swap space makes me think that you may be experiencing lockups due to heavy swapping on your system. The logs can help clarify whether this idea is correct or not.

#108 Updated by Mark Guzman over 1 year ago

Ok, I'll pack that up and send it over later today.

#109 Updated by Mark Guzman over 1 year ago


Here are the logs. I added swap to my grafana charts for the system, and it looks like I've only started seeing the swap behavior recently.
If you have any keywords, I have a longer log history in my ELK setup.

#110 Updated by Mark Guzman over 1 year ago

Based on monitoring the usage with htop, I'd hazard a guess that we're seeing idle processes being swapped out in favor of filesystem cache space.
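
Here's what I've been watching to check that theory (standard FreeBSD tools; the ARC sysctl name is from memory, so treat it as an assumption):

# swap usage and the largest resident processes
swapinfo -h
top -b -o res 15
# current ZFS ARC size in bytes
sysctl kstat.zfs.misc.arcstats.size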

#111 Updated by Mark Guzman over 1 year ago

Soooo. freenas-debug died when trying to run a procstat.

#112 Updated by Mark Guzman over 1 year ago

dump from the freenas-debug hang

#113 Updated by Mark Guzman about 1 year ago

I've had a hang every day for the past 3 days. I couldn't get any type of data from one day. When the system hung on Aug-11 I switched to the latest nightly. Since then I have seen daily hangs.

#114 Updated by Timur Bakeyev about 1 year ago

Mark, did you manage to get debug kernel working?

#115 Updated by Mark Guzman about 1 year ago

No I didn't but I wasn't trying too hard. I was offsite running a conference for a charity. This weekend I'll try again.

#116 Updated by Mark Guzman about 1 year ago

Died again. I tried running procstat -akk and it led to a panic in an _mtx_lock_indefinite_check.
Interestingly, nginx was dead when I went to check the ui and possibly switch to the debug kernel.
I also checked to see if I could switch to the debug kernel from the boot loader but it was not listed.

#117 Updated by Dru Lavigne about 1 year ago

  • Target version changed from 11.2-BETA3 to 11.3

#118 Updated by Mark Guzman about 1 year ago

Another one, this time I've rebooted into a debug kernel (hopefully). I generated a subset of the freenas-debug output.

#119 Updated by Dru Lavigne about 1 year ago

  • Assignee changed from Timur Bakeyev to John Hixson

#120 Updated by Mark Guzman about 1 year ago

This happened again a day ago. The crash dump was not written out, so I only have procstat output, which looks like all the others.

#121 Updated by Dru Lavigne about 1 year ago

  • Assignee changed from John Hixson to William Grzybowski

#122 Updated by William Grzybowski about 1 year ago

  • Category changed from Services to OS
  • Status changed from Screened to Unscreened
  • Assignee changed from William Grzybowski to Alexander Motin

Alexander, over for your team.

#123 Updated by Alexander Motin about 1 year ago

  • Assignee changed from Alexander Motin to Sean Fagan

#124 Updated by Sean Fagan about 1 year ago

Still not quite caught up on this. It appears to be a hang in the NFS code, in nfsv4_sequencelookup?

#125 Updated by Sean Fagan about 1 year ago

The swap usage looks like a process is dying pretty regularly.

#126 Updated by Sean Fagan 10 months ago

  • Status changed from Unscreened to Screened

About the messages about the session not being found: I asked Rick Macklem about this error message, and he says "it's not serious" (for the server). He also added, "What it probably indicates is a broken NFSv4.1 client. (I'll bet on ESXi, since I've never seen a problem like this for Linux.)"

For the rest of it, this is still a very wide-ranging ticket, covering multiple issues, and the discussions above aren't at all clear what the problem really is. A panic in netisr_init is not going to be due to NFS.

The various procstats do show nfs-related waits, and I'm not sure what it's waiting for. But I need to ensure that's the problem, and not the vague "Bring in upstream fix for NFSv4 session leak.". (That change is already in FreeNAS; also note that there is a pull-request related to getting our NFS code more in sync with FreeBSD, but I don't know if that'd fix this problem.)

#127 Updated by Dru Lavigne 8 months ago

  • Assignee changed from Sean Fagan to Alexander Motin

#128 Updated by Kris Moore 7 months ago

  • Status changed from Screened to Closed
