Bug #8958

CIFS Service seems to crash when it comes under load

Added by Michael Stavridis over 5 years ago. Updated about 3 years ago.

Status: Closed: User Config Issue
Priority: Nice to have
Assignee: John Hixson
Category: OS
Target version:
Seen in:
Severity: New
Reason for Closing:
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

Every service except CIFS has worked flawlessly.

I see this error on the console when the CIFS service crashes and restarts:

Mar 30 14:49:38 mlfs05 smbd44075: [2015/03/30 14:49:38.037052, 0] ../source3/smbd/notify.c:336(change_notify_remove_request)
Mar 30 14:49:38 mlfs05 smbd44075: PANIC: assert failed at ../source3/smbd/notify.c(336): fsp->notify != NULL
Mar 30 14:49:38 mlfs05 smbd44075: [2015/03/30 14:49:38.037130, 0] ../source3/lib/util.c:785(smb_panic_s3)
Mar 30 14:49:38 mlfs05 smbd44075: PANIC (pid 44075): assert failed: fsp->notify != NULL
Mar 30 14:49:38 mlfs05 smbd44075: [2015/03/30 14:49:38.037386, 0] ../source3/lib/util.c:896(log_stack_trace)
Mar 30 14:49:38 mlfs05 smbd44075: BACKTRACE: 6 stack frames:
Mar 30 14:49:38 mlfs05 smbd44075: #0 0x802ca809c <smb_panic_s3+108> at /usr/local/lib/libsmbconf.so.0
Mar 30 14:49:38 mlfs05 smbd44075: #1 0x801469bc5 <smb_panic+37> at /usr/local/lib/libsamba-util.so.0
Mar 30 14:49:38 mlfs05 smbd44075: #2 0x801845d6a <change_notify_add_request+1226> at /usr/local/lib/samba/libsmbd_base.so
Mar 30 14:49:38 mlfs05 smbd44075: #3 0x80184614a <smbd_notify_cancel_by_smbreq+154> at /usr/local/lib/samba/libsmbd_base.so
Mar 30 14:49:38 mlfs05 smbd44075: #4 0x801828b19 <smbd_smb2_request_process_notify+1673> at /usr/local/lib/samba/libsmbd_base.so
Mar 30 14:49:38 mlfs05 smbd44075: #5 0x8042f2862 <talloc_set_name_const+882> at /usr/local/lib/libtalloc.so.2
Mar 30 14:49:38 mlfs05 smbd44075: [2015/03/30 14:49:38.037638, 0] ../source3/lib/util.c:797(smb_panic_s3)
Mar 30 14:49:38 mlfs05 smbd44075: smb_panic(): calling panic action [/usr/local/libexec/samba/samba-backtrace]
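
A panic excerpt like the one above can be reduced to the failed assertion and the list of stack frames before it is attached to a ticket. The following is a minimal sketch, assuming the log lines keep the layout shown here; the function name `summarize_panic` and the dictionary keys are hypothetical and not part of Samba or FreeNAS.

```python
import re

# Frame lines look like:
#   "... #2 0x801845d6a <change_notify_add_request+1226> at /usr/local/lib/samba/libsmbd_base.so"
FRAME_RE = re.compile(r"#(\d+)\s+(0x[0-9a-fA-F]+)\s+<([^+>]+)\+(\d+)>\s+at\s+(\S+)")
# Assertion lines look like:
#   "... PANIC: assert failed at ../source3/smbd/notify.c(336): fsp->notify != NULL"
ASSERT_RE = re.compile(r"PANIC: assert failed at (\S+)\((\d+)\): (.+)$")


def summarize_panic(log_text):
    """Return (assertion, frames) extracted from an smbd panic log excerpt."""
    assertion = None
    frames = []
    for line in log_text.splitlines():
        match = ASSERT_RE.search(line)
        if match and assertion is None:
            assertion = {
                "file": match.group(1),
                "line": int(match.group(2)),
                "expr": match.group(3).strip(),
            }
            continue
        match = FRAME_RE.search(line)
        if match:
            frames.append({
                "index": int(match.group(1)),
                "address": match.group(2),
                "symbol": match.group(3),
                "offset": int(match.group(4)),
                "library": match.group(5),
            })
    return assertion, frames
```

Feeding it the excerpt above should yield the `fsp->notify != NULL` assertion from `notify.c` and six frames, which is a compact way to compare one crash against the next.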

Here are my settings for Samba4:

[global]
server min protocol = SMB2
server max protocol = SMB2_10
interfaces = 127.0.0.1 192.168.100.14
bind interfaces only = yes
encrypt passwords = yes
dns proxy = no
strict locking = no
oplocks = yes
deadtime = 15
max log size = 51200
max open files = 5661281
load printers = no
printing = bsd
printcap name = /dev/null
disable spoolss = yes
getwd cache = yes
guest account = nobody
map to guest = Bad User
obey pam restrictions = yes
directory name cache size = 0
kernel change notify = no
panic action = /usr/local/libexec/samba/samba-backtrace
nsupdate command = /usr/local/bin/samba-nsupdate -g
server string = FreeNAS Server
ea support = yes
store dos attributes = yes
unix extensions = no
acl allow execute always = true
acl check permissions = true
dos filemode = yes
domain logons = no
idmap config *: backend = tdb
idmap config *: range = 90000001-100000000
server role = member server
netbios name = MLFS05
workgroup = ****
realm = ****.LOCAL
security = ADS
client use spnego = yes
cache directory = /var/tmp/.cache/.samba
local master = no
domain master = no
preferred master = no
winbind cache time = 7200
winbind offline logon = yes
winbind enum users = yes
winbind enum groups = yes
winbind nested groups = yes
winbind use default domain = no
winbind refresh tickets = yes
idmap config ****: backend = rid
idmap config *****: range = 10000-90000000
allow trusted domains = no
client ldap sasl wrapping = plain
template shell = /bin/sh
template homedir = /home/%D/%U
pid directory = /var/run/samba
smb passwd file = /var/etc/private/smbpasswd
private dir = /var/etc/private
create mask = 0666
directory mask = 0777
client ntlmv2 auth = yes
dos charset = CP437
unix charset = UTF-8
log level = 1
socket options = SO_RCVBUF=131072 SO_SNDBUF=131072 TCP_NODELAY IPTOS_LOWDELAY
min receivefile size = 16384
use sendfile = no
kernel change notify = no
oplocks = no
getwd cache = yes
dead time = 0
max log size = 5000
max mux = 10000
max open files = 30000
hide dot files = no
veto files = /.DS_Store/.AppleDB/.TemporaryItems/.AppleDouble/.bin/.AppleDesktop/Network Trash Folder/.Spotlight/.Trash

Here are sample settings of an individual share:

[I-Drive]
path = /mnt/Storage_A/I-Drive
printable = no
veto files = /.snapshot/.windows/.mac/.zfs/
writeable = yes
browseable = yes
recycle:repository = .recycle/%U
recycle:keeptree = yes
recycle:versions = yes
recycle:touch = yes
recycle:directory_mode = 0777
recycle:subdir_mode = 0700
vfs objects = zfsacl aio_pthread streams_depot streams_xattr
hide dot files = yes
guest ok = no
nfs4:mode = special
nfs4:acedup = merge
nfs4:chown = true
zfsacl:acesort = dontcare
vfs objects = shadow_copy2, zfsacl
shadow: format = auto-%Y%m%d.%H%M-3m
shadow: sort = desc
shadow: snapdir = .zfs/snapshot
store dos attributes = no
ea support = no
map archive = no
map hidden = no
map system = no
map readonly = no

The crash occurs at random times, roughly once a day, and generally three times in a row. Please let me know if you need more information.
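
Several parameters in the settings above are set more than once (for example `oplocks` and `vfs objects`). The following is a minimal sketch for spotting such repeats in a pasted configuration; it is not a claim about the cause of the crash. It relies on the smb.conf rule that parameter names ignore case and whitespace, so `deadtime` and `dead time` are treated as the same parameter. The function names are hypothetical.

```python
from collections import defaultdict


def normalize(name):
    # smb.conf parameter names are case-insensitive and ignore
    # leading, trailing and internal whitespace.
    return "".join(name.split()).lower()


def find_repeated_params(conf_text):
    """Map (section, parameter) to its list of values, for parameters set more than once."""
    section = None
    seen = defaultdict(list)
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if "=" not in line:
            continue  # not a "name = value" line
        name, value = line.split("=", 1)
        seen[(section, normalize(name))].append(value.strip())
    return {key: values for key, values in seen.items() if len(values) > 1}
```

Each entry in the result lists the values in file order, which makes it easy to see which value was written last and to confirm with `testparm` which one Samba actually uses.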

History

#1 Updated by John Hixson over 5 years ago

  • Status changed from Unscreened to 15
  • Priority changed from Critical to Nice to have
  • Target version set to Unspecified

Can you go to system->advanced->"save debug" and attach the output? Also, can you give more details about when it crashes? What type of load? How many users? Is there anything else that may be of importance here? We do not see this problem in general, so more details would be nice. Also, is your system up to date? Have you applied all updates?

#2 Updated by Michael Stavridis over 5 years ago

I will get this debug info for you. Load is about 45%; it runs at 10-30% most of the time. We have 70+ users. I believe we are on the most up-to-date 9.3 STABLE version.

#3 Updated by Michael Stavridis over 5 years ago

  • File debug-mlfs05-20150330175005.tgz added

Uploaded debug

#4 Updated by John Hixson over 5 years ago

Ok, can you also please look under /var/db/system/cores/ for a core file (or files) from smbd, nmbd and winbindd? Most likely an smbd core file will be there; can you please attach it as well?

#5 Updated by Michael Stavridis over 5 years ago

Of course, give me a moment to collect them.

#6 Updated by John Hixson over 5 years ago

While I am collecting data, can you also attach your /usr/local/etc/smb4.conf? If you can update your system, you should do that as well. There have been numerous bugfixes since the version you are running.

#7 Updated by Michael Stavridis over 5 years ago

  • File requested_files.7z added

Here are the requested files

#8 Updated by Michael Stavridis over 5 years ago

  • File smb4.conf added

smb4.conf attached

#9 Updated by Michael Stavridis over 5 years ago

Files uploaded.

#10 Updated by John Hixson over 5 years ago

  • Status changed from 15 to Investigation

Okay. I have to build an environment where I can debug this. It will be a bit.

#11 Updated by Michael Stavridis over 5 years ago

Thank you!

#12 Updated by John Hixson over 5 years ago

I'm still working on this. It took me a long time to figure out what was removed from the environment and why, and how and where to put it back in.

#13 Updated by Michael Stavridis over 5 years ago

Were you able to replicate this in your test/dev environment?

#14 Updated by Michael Stavridis over 5 years ago

Just received this:

mlfs05.medialabinc.local kernel log messages:

pid 44075 (smbd), uid 0: exited on signal 6 (core dumped)
pid 50567 (smbd), uid 0: exited on signal 6 (core dumped)
pid 51127 (smbd), uid 0: exited on signal 6 (core dumped)
sonewconn: pcb 0xfffffe009a91d780: Listen queue overflow: 8 already in queue awaiting acceptance (6 occurrences)
sonewconn: pcb 0xfffffe009a91d780: Listen queue overflow: 8 already in queue awaiting acceptance (2 occurrences)

-- End of security output --
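
Security-output mails like the one above wrap long lines, which makes the crash and overflow counts hard to read at a glance. The following is a minimal sketch that undoes the wrapping and tallies both kinds of message, assuming the message wording shown here; the function name `summarize_kernel_log` and the result layout are hypothetical.

```python
import re

CRASH_RE = re.compile(
    r"pid (\d+) \((\w+)\), uid (\d+): exited on signal (\d+)( \(core dumped\))?"
)
OVERFLOW_RE = re.compile(
    r"Listen queue overflow: (\d+) already in queue awaiting acceptance"
    r"(?: \((\d+) occurrences\))?"
)


def summarize_kernel_log(text):
    """Return (crashes, overflow_count) from a kernel log excerpt."""
    # Collapse all whitespace so messages split across mail-wrapped lines match again.
    flat = " ".join(text.split())
    crashes = [
        {"pid": int(pid), "process": name, "signal": int(sig), "core": bool(core)}
        for pid, name, _uid, sig, core in CRASH_RE.findall(flat)
    ]
    # A message without an "(N occurrences)" suffix counts once.
    overflows = sum(int(n) if n else 1 for _queued, n in OVERFLOW_RE.findall(flat))
    return crashes, overflows
```

Tracking these two numbers per day gives a simple way to check whether the listen queue overflows line up with the smbd crashes.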

#15 Updated by John Hixson over 5 years ago

Michael Stavridis wrote:

Were you able to replicate this in your test/dev environment?

No. I'm still working on it, however.

#16 Updated by Michael Stavridis over 5 years ago

Okay, thank you for the quick updates.

#17 Updated by John Hixson over 5 years ago

Michael Stavridis wrote:

Okay, thank you for the quick updates.

I am still working on this. There were some regressions in our ability to debug these things, and I have been slowly but surely finding and fixing them. I don't know if I will have it all ready by today, but certainly by tomorrow. I'll keep you posted.

#18 Updated by Michael Stavridis over 5 years ago

Would the kern.ipc.somaxconn sysctl setting have any effect on a system serving 70+ users? I notice the default is 128.

#19 Updated by Michael Stavridis over 5 years ago

Any update?

#20 Updated by John Hixson over 5 years ago

Michael Stavridis wrote:

Any update?

Yes. I've almost got things working well enough to debug this. However, I'm going to need you to update to the nightly train and run it for a few days until samba crashes again, if it does. You will be able to boot into your previous environment afterwards. I'll post to this ticket when I'm certain that it's ready.

#21 Updated by Michael Stavridis over 5 years ago

Great to hear, I will schedule an upgrade for this unit. For my change log, I need to record the reason for the update/patch. What was causing samba to crash, and how does the patch/upgrade fix it?

#22 Updated by John Hixson over 5 years ago

Michael Stavridis wrote:

Great to hear, I will schedule an upgrade for this unit. For my change log, I need to record the reason for the update/patch. What was causing samba to crash, and how does the patch/upgrade fix it?

This isn't a fix. It's a "patch" that will give me the ability to debug this issue further. The core file you gave me is pretty meaningless, unfortunately. With the updates I'm working on, when samba crashes again, it will generate a core file that I can work with.

#23 Updated by John Hixson over 5 years ago

I'm still trying to get it sorted out. The nightlies are built at 11pm PDT. I should have things working by then. You'll need to update after that.

#24 Updated by Michael Stavridis over 5 years ago

Okay, sounds good. I will let you know once I have the unit upgraded and, if it crashes, post the debug.

#25 Updated by John Hixson over 5 years ago

Okay. It took me a while, but everything needed to debug this is now in our master branch.

#26 Updated by Michael Stavridis over 5 years ago

Which nightly should I install? The most recent?

#27 Updated by John Hixson over 5 years ago

Yes. You don't have to download an image though. You can just switch to the nightlies under system->update.

#28 Updated by Michael Stavridis over 5 years ago

Okay, thank you. This will be done tomorrow night.

#29 Updated by John Hixson over 5 years ago

  • Status changed from Investigation to 15

Hi Michael,

Any news on this?

#30 Updated by David Velazquez over 5 years ago

Hi,

Sorry to interrupt, but I was having issues like this one last week (using the latest 9.3-RELEASE):

pid 40516 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40532 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40548 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40564 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40580 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40596 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40612 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40628 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40644 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40660 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40676 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40692 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40708 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40727 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40743 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40759 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40775 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40791 (smbd), uid 0: exited on signal 6 (core dumped)
pid 40807 (smbd), uid 0: exited on signal 6 (core dumped)
pid 41526 (smbd), uid 0: exited on signal 6 (core dumped)
pid 41542 (smbd), uid 0: exited on signal 6 (core dumped)
pid 41558 (smbd), uid 0: exited on signal 6 (core dumped)
pid 41574 (smbd), uid 0: exited on signal 6 (core dumped)
pid 41590 (smbd), uid 0: exited on signal 6 (core dumped)
pid 41606 (smbd), uid 0: exited on signal 6 (core dumped)
pid 41622 (smbd), uid 0: exited on signal 6 (core dumped)
pid 41638 (smbd), uid 0: exited on signal 6 (core dumped)
pid 41654 (smbd), uid 0: exited on signal 6 (core dumped)
pid 41670 (smbd), uid 0: exited on signal 6 (core dumped)
pid 41686 (smbd), uid 0: exited on signal 6 (core dumped)
pid 41702 (smbd), uid 0: exited on signal 6 (core dumped)
pid 41718 (smbd), uid 0: exited on signal 6 (core dumped)
pid 41734 (smbd), uid 0: exited on signal 6 (core dumped)

Unfortunately I don't have the crash logs anymore. I found some severe issues (Windows Explorer froze) while browsing a CIFS-mounted folder with a long name and accented characters (áéíóúaüö, etc.). After renaming the folder, the hangs were resolved. When I saw the crashes in the log the next day, I thought the problem could be related to the folder name issue...

Sorry for not giving many details (the logs are no longer available), but maybe this could be a hint...

#31 Updated by John Hixson over 5 years ago

Michael,

I'm still wondering if you've been able to update and have any new crash dumps?

#32 Updated by John Hixson over 5 years ago

Any news here?

#33 Updated by John Hixson over 5 years ago

Any updates?

#34 Updated by Michael Stavridis over 5 years ago

No crashing, but usage has not spiked over 40%.

#35 Updated by John Hixson over 5 years ago

  • Status changed from 15 to Investigation

#36 Updated by John Hixson over 5 years ago

  • Status changed from Investigation to 15

Michael,

How's it looking? Has it crashed again?

#37 Updated by Michael Stavridis over 5 years ago

  • File debug-mlfs05-20150424114605.tgz added

The Samba service crashed when coming under load, and the web GUI was barely functioning during the crash. I had to reconnect the system to AD to get the shares back online. Here is our debug.

#38 Updated by Michael Stavridis over 5 years ago

To note: the crash occurred after enabling dedup. I have currently disabled all dedup features.

#39 Updated by John Hixson over 5 years ago

  • Status changed from 15 to Investigation

Ah. Dedup. This is the root and the cause of all evil that ever existed. Please let me know what continues to happen. I do suspect with no dedup that things will all of a sudden work for you though ;-)

#40 Updated by John Hixson over 5 years ago

  • Status changed from Investigation to 15

Michael,

Have you had any more issues since turning dedup off?

#41 Updated by Michael Stavridis over 5 years ago

Everything is good. Will this dedup issue be sorted out when FreeNAS 10 is released? Dedup is a powerful tool, and I would like to implement it. The system this runs on has more than enough RAM to implement it (192GB of RAM).

#42 Updated by John Hixson over 5 years ago

Michael Stavridis wrote:

Everything is good. Will this dedup issue be sorted out when FreeNAS 10 is released? Dedup is a powerful tool, and I would like to implement it. The system this runs on has more than enough RAM to implement it (192GB of RAM).

How much storage are/were you deduplicating? Have you looked at our hardware recommendations?

http://olddoc.freenas.org/index.php/Hardware_Recommendations

#43 Updated by John Hixson over 5 years ago

I don't know of any problems with dedup other than the insane amount of RAM necessary for it to work correctly. I'm assuming you don't meet the requirements here, hence the issues when you have it enabled.
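
The RAM question above can be put in rough numbers. The following is a minimal sketch, assuming the commonly quoted FreeNAS rule of thumb of about 5 GB of RAM per TB of deduplicated storage. That ratio is an assumption here, not a figure from this ticket, and real requirements depend on block size and on the data itself. The function names are hypothetical.

```python
def ram_needed_gb(dedup_storage_tb, gb_ram_per_tb=5.0):
    """Estimate the RAM (GB) suggested for a given amount of deduplicated storage (TB)."""
    if dedup_storage_tb < 0:
        raise ValueError("dedup_storage_tb must not be negative")
    return dedup_storage_tb * gb_ram_per_tb


def max_dedup_storage_tb(ram_gb, gb_ram_per_tb=5.0):
    """Estimate how much deduplicated storage (TB) a given amount of RAM (GB) covers."""
    if gb_ram_per_tb <= 0:
        raise ValueError("gb_ram_per_tb must be positive")
    return ram_gb / gb_ram_per_tb
```

Under that assumed ratio, `max_dedup_storage_tb(192)` comes out at roughly 38 TB, so whether 192GB is "more than enough" depends on how much storage was actually being deduplicated, which is the question asked in #42.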

#44 Updated by John Hixson over 5 years ago

  • Status changed from 15 to Closed: User Config Issue

#45 Updated by Kris Moore about 3 years ago

  • Target version changed from Unspecified to N/A

#46 Updated by Dru Lavigne almost 3 years ago

  • File deleted (debug-mlfs05-20150330175005.tgz)

#47 Updated by Dru Lavigne almost 3 years ago

  • File deleted (requested_files.7z)

#48 Updated by Dru Lavigne almost 3 years ago

  • File deleted (debug-mlfs05-20150424114605.tgz)

#49 Updated by Dru Lavigne over 2 years ago

  • File deleted (smb4.conf)
