Project

General

Profile

Bug #7962

Unlocking drives and restarting services hangs

Added by Aaron C de Bruyn over 5 years ago. Updated about 3 years ago.

Status:
Closed: Cannot reproduce
Priority:
Nice to have
Assignee:
Josh Paetzel
Category:
OS
Target version:
Seen in:
Severity:
New
Reason for Closing:
Reason for Blocked:
Needs QA:
Yes
Needs Doc:
Yes
Needs Merging:
Yes
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

Installed the latest ISO last night, made sure I had all the updates just a few minutes ago, then created my encrypted pools and various datasets.

I had to power the box down as part of moving it into a rack. When I brought it back up and attempted to unlock the first pool, it hanged at restarting Samba (I think). It sat there for 15 minutes before I manually refreshed the GUI and found some of my services still stopped. I rebooted and tried to unlock the pool again, same issue.

Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Device gptid/a82f320e-b209-11e4-9e0d-0cc47a42f122.eli created.
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Encryption: AES-XTS 128
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI:     Crypto: hardware
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Device gptid/a88b01c0-b209-11e4-9e0d-0cc47a42f122.eli created.
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Encryption: AES-XTS 128
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI:     Crypto: hardware
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Device gptid/a8d7ea99-b209-11e4-9e0d-0cc47a42f122.eli created.
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Encryption: AES-XTS 128
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI:     Crypto: hardware
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Device gptid/a9258c5e-b209-11e4-9e0d-0cc47a42f122.eli created.
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Encryption: AES-XTS 128
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI:     Crypto: hardware
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Device gptid/a97f98bc-b209-11e4-9e0d-0cc47a42f122.eli created.
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Encryption: AES-XTS 128
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI:     Crypto: hardware
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Device gptid/a9e578e2-b209-11e4-9e0d-0cc47a42f122.eli created.
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Encryption: AES-XTS 128
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI:     Crypto: hardware
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Device gptid/aa4ee408-b209-11e4-9e0d-0cc47a42f122.eli created.
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Encryption: AES-XTS 128
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI:     Crypto: hardware
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Device gptid/ab512fda-b209-11e4-9e0d-0cc47a42f122.eli created.
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI: Encryption: AES-XTS 128
Feb 11 09:06:01 usolmsdnas01 GEOM_ELI:     Crypto: hardware
Feb 11 09:06:06 usolmsdnas01 notifier: Stopping collectd.
Feb 11 09:06:08 usolmsdnas01 notifier: Waiting for PIDS: 2890.
Feb 11 09:06:08 usolmsdnas01 notifier: Starting collectd.
Feb 11 09:06:08 usolmsdnas01 notifier: netatalk not running?
Feb 11 09:06:09 usolmsdnas01 notifier: Will not 'start' netatalk because netatalk_enable is NO.
Feb 11 09:06:11 usolmsdnas01 generate_smb4_conf.py: [common.pipesubr:58] Popen()ing: /sbin/sysctl -n 'kern.maxfilesperproc'
Feb 11 09:06:11 usolmsdnas01 generate_smb4_conf.py: [common.pipesubr:58] Popen()ing: /usr/local/bin/pdbedit -d 0 -i smbpasswd:/tmp/tmpu2ZMxW -s /usr/local/etc/smb4.conf -e tdbsam:/var/etc/private/passdb.tdb
Feb 11 09:06:11 usolmsdnas01 generate_smb4_conf.py: [common.pipesubr:58] Popen()ing: /usr/local/bin/pdbedit -L
Feb 11 09:06:12 usolmsdnas01 notifier: Stopping winbindd.
Feb 11 09:06:12 usolmsdnas01 winbindd[2571]:   STATUS=daemon 'winbindd' finished starting up and ready to serve connectionsGot sig[15] terminate (is_parent=1)
Feb 11 09:06:12 usolmsdnas01 winbindd[2575]:   STATUS=daemon 'winbindd' finished starting up and ready to serve connectionsGot sig[15] terminate (is_parent=0)
Feb 11 09:06:12 usolmsdnas01 notifier: Waiting for PIDS: 2571.
Feb 11 09:06:12 usolmsdnas01 notifier: Stopping smbd.
Feb 11 09:06:13 usolmsdnas01 notifier: Waiting for PIDS: 2568, 2568.
Feb 11 09:06:13 usolmsdnas01 notifier: Stopping nmbd.
Feb 11 09:06:13 usolmsdnas01 nmbd[2565]: [2015/02/11 09:06:13.320277,  0] ../source3/nmbd/nmbd.c:57(terminate)
Feb 11 09:06:13 usolmsdnas01 nmbd[2565]:   Got SIGTERM: going down...
Feb 11 09:06:13 usolmsdnas01 notifier: Waiting for PIDS: 2565.
Feb 11 09:06:13 usolmsdnas01 notifier: Performing sanity check on Samba configuration: OK
Feb 11 09:06:13 usolmsdnas01 notifier: winbindd not running? (check /var/run/samba/winbindd.pid).
Feb 11 09:06:13 usolmsdnas01 notifier: smbd not running? (check /var/run/samba/smbd.pid).
Feb 11 09:06:13 usolmsdnas01 notifier: nmbd not running? (check /var/run/samba/nmbd.pid).
Feb 11 09:06:13 usolmsdnas01 notifier: Performing sanity check on Samba configuration: OK
Feb 11 09:06:13 usolmsdnas01 notifier: Starting nmbd.
Feb 11 09:06:13 usolmsdnas01 notifier: Starting smbd.
Feb 11 09:06:13 usolmsdnas01 nmbd[4260]: [2015/02/11 09:06:13.519486,  0] ../lib/util/become_daemon.c:136(daemon_ready)
Feb 11 09:06:13 usolmsdnas01 notifier: Starting winbindd.
Feb 11 09:06:13 usolmsdnas01 smbd[4264]: [2015/02/11 09:06:13.561125,  0] ../lib/util/become_daemon.c:136(daemon_ready)
Feb 11 09:06:13 usolmsdnas01 winbindd[4268]: [2015/02/11 09:06:13.584293,  0] ../source3/winbindd/winbindd_cache.c:3196(initialize_winbindd_cache)
Feb 11 09:06:13 usolmsdnas01 winbindd[4268]:   initialize_winbindd_cache: clearing cache and re-creating with version number 2
Feb 11 09:06:13 usolmsdnas01 winbindd[4268]: [2015/02/11 09:06:13.586105,  0] ../lib/util/become_daemon.c:136(daemon_ready)
Feb 11 09:06:13 usolmsdnas01 notifier: proftpd not running?
Feb 11 09:06:13 usolmsdnas01 notifier: Will not 'start' proftpd because proftpd_enable is NO.
Feb 11 09:06:18 usolmsdnas01 notifier: Stopping ctld.
Feb 11 09:06:18 usolmsdnas01 ctld[1933]: exiting on signal
Feb 11 09:06:18 usolmsdnas01 notifier: Waiting for PIDS: 1933.
Feb 11 09:06:18 usolmsdnas01 notifier: ctld not running? (check /var/run/ctld.pid).
Feb 11 09:06:18 usolmsdnas01 notifier: Starting ctld.
Feb 11 09:06:18 usolmsdnas01 notifier: ctld: portal-group "pg1" not assigned to any target
Feb 11 09:06:18 usolmsdnas01 ctld[4445]: portal-group "pg1" not assigned to any target
Feb 11 09:06:18 usolmsdnas01 ctld[4445]: portal-group "pg2" not assigned to any target
Feb 11 09:06:18 usolmsdnas01 notifier: ctld: portal-group "pg2" not assigned to any target
Feb 11 09:06:18 usolmsdnas01 notifier: Stopping lockd.
Feb 11 09:06:18 usolmsdnas01 notifier: Waiting for PIDS: 2038.
Feb 11 09:06:19 usolmsdnas01 notifier: Stopping statd.
Feb 11 09:06:19 usolmsdnas01 notifier: Waiting for PIDS: 2027.
Feb 11 09:06:19 usolmsdnas01 notifier: Stopping nfsd.
Feb 11 09:06:36 usolmsdnas01 nmbd[4260]:   STATUS=daemon 'nmbd' finished starting up and ready to serve connections*****
Feb 11 09:06:36 usolmsdnas01 nmbd[4260]:   
Feb 11 09:06:36 usolmsdnas01 nmbd[4260]:   Samba name server USOLMSDNAS01 is now a local master browser for workgroup SMILES on subnet 10.142.40.240
Feb 11 09:06:36 usolmsdnas01 nmbd[4260]:   
Feb 11 09:06:36 usolmsdnas01 nmbd[4260]:   *****

The workaround is to wait for the drives to unlock, and the restart process to hang, then refresh the web interface, go into services, and manually restart or start services by hand.

History

#1 Updated by Aaron C de Bruyn over 5 years ago

Sorry--that was not the correct screenshot... ;)

#3 Updated by Aaron C de Bruyn over 5 years ago

After more playing, it looks like it might be related to NFS. From the GUI, I am able to stop and start all the services I use, except NFS. It sits there trying to stop for at least 2 minutes.

From the CLI if I run /etc/rc.d/nfsd stop it says "stopping nfsd. Waiting for PIDS: 2016" and hangs.

load: 0.10  cmd: pwait 9002 [kqread] 35.65r 0.00u 0.00s 0% 1564k

Running kill -9 2016 doesn't kill the process.

#4 Updated by Aaron C de Bruyn over 5 years ago

[root@usolmsdnas01] ~# procstat -kk 2016
  PID    TID COMM             TDNAME           KSTACK                       
 2016 100558 nfsd             -                mi_switch+0x194 sleepq_timedwait+0x42 _sleep+0x1c9 clnt_reconnect_call+0x79e clnt_call_private+0xe8 gssd_acquire_cred_1+0x47 gss_acquire_cred+0xa4 rpc_gss_acquire_svc_cred+0xa3 rpc_gss_set_svc_name+0xa7 nfsrvd_nfsd+0xd9 nfssvc_nfsd+0x7b sys_nfssvc+0x90 amd64_syscall+0x5ea Xfast_syscall+0xf7 
[root@usolmsdnas01] ~# 

#5 Updated by Aaron C de Bruyn over 5 years ago

It also hangs after rebooting, immediately logging in to the GUI, and attempting to stop it without trying to unlock the drives first.

#6 Updated by Jordan Hubbard over 5 years ago

  • Category set to 92
  • Assignee set to Josh Paetzel
  • Target version set to Unspecified

#7 Updated by Josh Paetzel over 5 years ago

  • Status changed from Unscreened to Screened

Can you find the part in your logfile where nfsd is being stopped? It looks like nfsd is getting stuck on gssd, which is a symptom of gssd getting killed underneath nfsd,

Do you have NFSv4 enabled?

#8 Updated by Aaron C de Bruyn over 5 years ago

Yes, NFSv4 is enabled.

I don't have logs going back far enough at the moment. I can try and schedule a reboot this weekend.

#9 Updated by Aaron C de Bruyn over 5 years ago

I rebooted the box and was able to get it to hang. I figured I'd try installing updates to see if it still occurred and I was unable to duplicate it. Looks resolved to me.

#10 Updated by Jordan Hubbard over 5 years ago

  • Status changed from Screened to Closed: Cannot reproduce

#11 Avatar?id=14398&size=24x24 Updated by Kris Moore about 3 years ago

  • Target version changed from Unspecified to N/A

Also available in: Atom PDF