Bug #24771

SMB Service occasionally stopping in 11.0 RELEASE

Added by David Ohlin about 3 years ago. Updated about 3 years ago.

Status:
Closed
Priority:
No priority
Assignee:
Timur Bakeyev
Category:
OS
Target version:
N/A
Seen in:
11.0
Severity:
New
Reason for Closing:
Reason for Blocked:
Needs QA:
Yes
Needs Doc:
Yes
Needs Merging:
Yes
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

I was just transferring some files tonight when Windows suddenly told me a specific file could no longer be found on the FreeNAS share. Thinking that was odd, I opened up the FreeNAS web console and found the SMB service stopped - it had been running fine two minutes prior.

This is the 2nd time this has happened in the past 2-3 days. The first time, I wasn't even able to reach the FreeNAS web console as the system was entirely locked up, and I had to force-reboot my server.

History

#1 Updated by David Ohlin about 3 years ago

  • File debug-freenas-20170621222807.txz added

#2 Updated by David Ohlin about 3 years ago

  • Seen in changed from Unspecified to 11.0

I'm happy to help where I can here, but I may need advice on how to collect any further logs or other requested information - while I know enough to get by with Unix-type things, I'm far, far from an expert :) Thanks!

#3 Updated by David Ohlin about 3 years ago

So, on further research I've found two threads, https://forums.freenas.org/index.php?threads/corral-crashing-when-copying-file-across-datasets.52803/ and https://forums.freenas.org/index.php?threads/system-crash-on-smb-file-copy.54971/, that seem to describe very similar issues. I hadn't put two and two together that I was indeed copying from one dataset to another, not from my PC to a dataset. I had also assumed that the crash I got a couple of days ago was the exact same thing, since the SMB share was completely inaccessible - however, now I'm wondering if I'm seeing something similar to the above.

Finally, I had forgotten that I currently have a scrub running as well, so what I'm going to do is wait until that completes and then give it a couple of days to see if it happens again. Knock on wood - it seems like roughly every couple of days I've been having some sort of issue since upgrading to 11, but I'm hoping I can get them all worked out! I do love FreeNAS! I'll keep this thread updated with what I find, especially if in 4-5 days I haven't had any further issues. Thanks!

#4 Updated by Timur Bakeyev about 3 years ago

  • Status changed from Unscreened to Screened

#5 Updated by Timur Bakeyev about 3 years ago

I see that Samba is crashing for you, which is why it stops running at some point. But, based on the log file:

[2017/06/21 21:44:21.835393,  0] ../source3/smbd/service.c:102(set_current_service)
  chdir (/mnt/Volume1/HDD_Backups) failed, reason: No such file or directory
[2017/06/21 21:44:21.835635,  0] ../source3/smbd/service.c:102(set_current_service)
  chdir (/mnt/Volume1/HDD_Backups) failed, reason: No such file or directory
[2017/06/21 21:44:21.836658,  0] ../source3/smbd/service.c:102(set_current_service)
  chdir (/mnt/Volume1/HDD_Backups) failed, reason: No such file or directory
[2017/06/21 21:44:21.836934,  0] ../source3/smbd/service.c:102(set_current_service)
  chdir (/mnt/Volume1/HDD_Backups) failed, reason: No such file or directory
[2017/06/21 21:44:22.795653,  0] ../source3/smbd/service.c:102(set_current_service)
  chdir (/mnt/Volume1/HDD_Backups) failed, reason: No such file or directory
[2017/06/21 22:00:00.516439,  0] ../source3/smbd/service.c:102(set_current_service)
  chdir (/mnt/Volume1/HDD_Backups) failed, reason: No such file or directory
[2017/06/21 22:00:00.524216,  0] ../source3/smbd/smbXsrv_tcon.c:978(smbXsrv_tcon_disconnect)
  smbXsrv_tcon_disconnect(0x93dfc763, 'HDD Backups'): set_current_service() failed: NT_STATUS_INTERNAL_ERROR
[2017/06/21 22:00:00.526523,  0] ../source3/smbd/smbXsrv_tcon.c:1027(smbXsrv_tcon_disconnect_all)
  smbXsrv_tcon_disconnect_all: count[1] errors[1] first[NT_STATUS_INTERNAL_ERROR]
[2017/06/21 22:00:00.530360,  0] ../source3/smbd/smbXsrv_session.c:1725(smbXsrv_session_logoff)
  smbXsrv_session_logoff(0x3d596480): smb2srv_tcon_disconnect_all() failed: NT_STATUS_INTERNAL_ERROR
[2017/06/21 22:00:00.533075,  0] ../source3/smbd/smbXsrv_session.c:1775(smbXsrv_session_logoff_all)
  smbXsrv_session_logoff_all: count[1] errors[1] first[NT_STATUS_INTERNAL_ERROR]
[2017/06/21 22:00:00.533106,  0] ../source3/smbd/server_exit.c:159(exit_server_common)
  Server exit (normal exit)
[2017/06/21 22:00:00.533136,  0] ../source3/smbd/server_exit.c:162(exit_server_common)
  exit_server_common: smbXsrv_session_logoff_all() failed (NT_STATUS_INTERNAL_ERROR) - triggering cleanup
[2017/06/21 22:00:00.533909,  0] ../source3/lib/util.c:791(smb_panic_s3)
  PANIC (pid 97557): smbXsrv_session_logoff_all failed
[2017/06/21 22:00:00.559514,  0] ../source3/lib/util.c:902(log_stack_trace)
  BACKTRACE: 25 stack frames:

So, I'd assume that the primary reason for the crash is the sudden disappearance of the share, which is possibly the result of the controller+pool issue. Unfortunately, I don't see direct failures of the controller at that exact time, so it's just a guess...
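
One way to confirm that guess would be a crude watcher like the one below (just a sketch - the path is the one from the log above), left running on the box during a large copy. If the directory itself vanishes, smbd's chdir() would fail exactly as logged:

while sleep 1; do
  # probe the share's directory once a second; log the moment it becomes unreachable
  ls -ld /mnt/Volume1/HDD_Backups > /dev/null 2>&1 || \
    echo "$(date): /mnt/Volume1/HDD_Backups unreachable"
done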

Still, /var/log/messages is full of:

Jun 20 11:38:14 freenas (da11:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 50 44 5b 00 00 49 00
Jun 20 11:38:14 freenas (da11:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Jun 20 11:38:14 freenas (da11:umass-sim0:0:0:0): Retrying command
Jun 19 07:24:23 freenas GEOM: da12: the primary GPT table is corrupt or invalid.
Jun 19 07:24:23 freenas GEOM: da12: using the secondary instead -- recovery strongly advised.
Jun 19 07:24:24 freenas notifier: 32+0 records in
Jun 19 07:24:24 freenas notifier: 32+0 records out
Jun 19 07:24:24 freenas notifier: 33554432 bytes transferred in 1.625209 secs (20646226 bytes/sec)
Jun 19 07:24:24 freenas notifier: dd: /dev/da12: short write on character device
Jun 19 07:24:24 freenas notifier: dd: /dev/da12: end of device
Jun 19 07:24:24 freenas notifier: 33+0 records in
Jun 19 07:24:24 freenas notifier: 32+1 records out
Jun 19 07:24:24 freenas notifier: 34037760 bytes transferred in 0.096261 secs (353599219 bytes/sec)
Jun 19 07:24:28 freenas ZFS: vdev state changed, pool_guid=10245566888999577235 vdev_guid=9350066323026245090
Jun 19 07:24:28 freenas notifier: cannot add to 'Volume1': one or more vdevs refer to the same device
Jun 19 07:24:33 freenas GEOM_ELI: Device da0p1.eli created.
Jun 19 07:52:23 freenas collectd[8859]: utils_vl_lookup: The user object callback failed with status 2.
Jun 19 09:28:19 freenas mps0: Reinitializing controller,
Jun 19 09:28:19 freenas mps0: Warning: io_cmds_active is out of sync - resynching to 0
Jun 19 09:28:19 freenas mps0: Warning: io_cmds_active is out of sync - resynching to 0
Jun 19 09:28:19 freenas mps0: Warning: io_cmds_active is out of sync - resynching to 0
Jun 19 09:28:19 freenas mps0: Warning: io_cmds_active is out of sync - resynching to 0
Jun 19 09:28:19 freenas mps0: Warning: io_cmds_active is out of sync - resynching to 0
Jun 19 09:28:19 freenas mps0: Warning: io_cmds_active is out of sync - resynching to 0
Jun 19 09:28:19 freenas mps0: Warning: io_cmds_active is out of sync - resynching to 0
Jun 19 09:28:19 freenas mps0: Warning: io_cmds_active is out of sync - resynching to 0
Jun 19 09:28:19 freenas mps0: Warning: io_cmds_active is out of sync - resynching to 0
....

Jun 19 09:28:19 freenas mps0: Warning: io_cmds_active is out of sync - resynching to 0
Jun 19 09:28:19 freenas mps0: Warning: io_cmds_active is out of sync - resynching to 0
Jun 19 09:28:19 freenas mps0: Warning: io_cmds_active is out of sync - resynching to 0
Jun 19 09:28:19 freenas mps0: Firmware: 20.00.07.00, Driver: 21.01.00.00-fbsd
Jun 19 09:28:19 freenas mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Jun 19 09:28:19 freenas mps0: mps_reinit finished sc 0xfffffe00011ed000 post 4 free 3
Jun 19 09:28:19 freenas mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x47
Jun 19 09:28:19 freenas mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x804b
Jun 19 09:28:19 freenas mps0: Sleeping 3 seconds after SATA ID error to wait for spinup
Jun 19 09:28:19 freenas mps0: SAS Address for SATA device = 96833924acb3988d
Jun 19 09:28:19 freenas mps0: SAS Address for SATA device = 8073802690b8dc93
Jun 19 09:28:19 freenas mps0: SAS Address for SATA device = 6d687b2690b8dc95
Jun 19 09:28:19 freenas mps0: SAS Address for SATA device = 96833826a8b0988d
Jun 19 09:28:19 freenas mps0: SAS Address for SATA device = dd68752490b8dbb8
Jun 19 09:28:19 freenas mps0: SAS Address for SATA device = 6f885b2690b8dcaa
Jun 19 09:28:19 freenas mps0: SAS Address for SATA device = d6653730c69f937f
Jun 19 09:28:19 freenas mps0: SAS Address for SATA device = 8469812690b8dc97
Jun 19 09:28:19 freenas mps0: SAS Address for SATA device = dd646c2490b8dca8
Jun 19 09:28:19 freenas mps0: SAS Address for SATA device = 403b572f918f3843
Jun 19 09:28:19 freenas mps0: SAS Address for SATA device = 727c5b2490b8dbb8
Jun 19 09:28:19 freenas mps0: SAS Address for SATA device = 7389832490b8dcaa
Jun 19 09:28:19 freenas mps0: SAS Address from SATA device = 96833924acb3988d
Jun 19 09:28:19 freenas mps0: SAS Address from SATA device = 8073802690b8dc93
Jun 19 09:28:19 freenas mps0: SAS Address from SATA device = 6d687b2690b8dc95
Jun 19 09:28:19 freenas mps0: SAS Address from SATA device = 96833826a8b0988d
Jun 19 09:28:19 freenas mps0: SAS Address from SATA device = dd68752490b8dbb8
Jun 19 09:28:19 freenas mps0: SAS Address from SATA device = 6f885b2690b8dcaa
Jun 19 09:28:19 freenas mps0: SAS Address from SATA device = d6653730c69f937f
Jun 19 09:28:19 freenas mps0: SAS Address from SATA device = 8469812690b8dc97
Jun 19 09:28:19 freenas mps0: SAS Address from SATA device = dd646c2490b8dca8
Jun 19 09:28:19 freenas mps0: SAS Address from SATA device = 403b572f918f3843
Jun 19 09:28:19 freenas mps0: SAS Address from SATA device = 727c5b2490b8dbb8
Jun 19 09:28:19 freenas mps0: SAS Address from SATA device = 7389832490b8dcaa
Jun 19 09:28:19 freenas mps0: SAS Address from SATA device = 7389832490b8dcaa
Jun 19 09:28:19 freenas ctl_datamove: tag 0x29759fc on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x29759fd on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x29759fe on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x29759ff on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a01 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a02 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a03 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a04 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a05 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a07 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a06 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a08 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a09 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a0a on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a0b on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a0c on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a0d on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a0e on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a10 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a18 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a21 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a2a on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a2e on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a2f on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a31 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a35 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a37 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x29757b9 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x297594b on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a33 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x2975a00 on (0:3:0) aborted
Jun 19 09:28:19 freenas ctl_datamove: tag 0x29759f0 on (0:3:0) aborted
Jun 19 09:28:42 freenas mps0: Reinitializing controller,
....
Jun 19 09:28:48 freenas mps0: Warning: io_cmds_active is out of sync - resynching to 0
Jun 19 09:28:48 freenas mps0: Warning: io_cmds_active is out of sync - resynching to 0
Jun 19 09:28:49 freenas mps0: Firmware: 20.00.07.00, Driver: 21.01.00.00-fbsd
Jun 19 09:28:49 freenas mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Jun 19 09:28:49 freenas mps0: mps_reinit finished sc 0xfffffe00011ed000 post 4 free 3
Jun 19 09:28:49 freenas mps0: Unfreezing SIM queue
Jun 19 09:28:50 freenas mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x47
Jun 19 09:28:50 freenas mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x804b
Jun 19 09:28:50 freenas mps0: Sleeping 3 seconds after SATA ID error to wait for spinup
Jun 19 09:28:53 freenas mps0: SAS Address for SATA device = 96833924acb3988d
Jun 19 09:28:53 freenas mps0: SAS Address for SATA device = 8073802690b8dc93
Jun 19 09:28:53 freenas mps0: SAS Address for SATA device = 6d687b2690b8dc95

Something happens to the controller and it freezes from time to time. I guess at the moment we need to collect more logs to see where it fails.
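
Something along these lines would be a reasonable first pass (a sketch, assuming the stock FreeNAS 11 log locations /var/log/messages and /var/log/samba4/log.smbd):

zpool status -v Volume1        # pool/vdev health and per-device read/write/cksum error counters
camcontrol devlist             # what the HBA currently sees; a missing disk hints at a reset
grep -E 'mps0|CAM status' /var/log/messages | tail -n 50    # recent controller resets and CAM errors
tail -n 100 /var/log/samba4/log.smbd                        # smbd panics and backtraces around the failure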

#6 Updated by David Ohlin about 3 years ago

Yes - I've seen several of those spam messages (WAY more than I got in 9.10, though I was also using different hardware then, so I know that could account for some of the difference). The 'da11 CAM CCB status' pieces, and anything else relating to 'da11', can actually be ignored, as I've since removed that drive - it was my 16GB SanDisk Cruzer Micro USB 2.0 boot drive, and given that I'm using slightly older hardware, I figured there is just something up with the USB controller in my server and its compatibility with FreeNAS. So I moved my boot drive over to a pair of RAIDed SAS drives. They're behind a hardware RAID controller, which I know isn't ideal, but since it's just the boot drive and I have regular config backups, I'd rather do that than get spammed with incessant issue logs.

As for my pool drives, they are all in a Lenovo SA120 behind an LSI HBA, so I would think there should be no issue there. I'm giving it a couple of days and crossing my fingers that, for some reason, certain hardware and FreeNAS just don't like lots of big files being copied from one dataset to another at certain times ¯\_(ツ)_/¯ Will keep you posted - just let me know if you need me to collect any further logs or anything, and instructions on how to do it if so. Thanks!

EDIT: I also didn't realize until just now that this server has Broadcom gigabit Ethernet ports (5709s, it appears...). I'm going to go ahead and order a quad-port gigabit Intel Ethernet PCI-E card, just because then that hardware is one less thing I'll have to worry about potentially introducing odd issues. Thanks!

#7 Updated by David Ohlin about 3 years ago

So I think we're probably OK to close this at this point - while I haven't tried copying any huge files from one dataset to another since this crash occurred, the SMB service hasn't stopped so far. I can't be sure whether there's an issue there or it's something with my hardware and FreeNAS compatibility, but it's extremely rare for me to need to copy from one dataset directly to another.

I have also restarted my FreeNAS box a couple of times while troubleshooting other log messages I'm receiving, but I think it's behaving fairly stably at this point (knock on wood!). The only message I'm consistently getting spammed with through the console (displayed on screen if I attach a monitor to my FreeNAS box) is already covered in depth in https://bugs.freenas.org/issues/23857, and it looks like a fix is hopefully on the way in 11.0-U1. Thanks for your help!

#8 Updated by Timur Bakeyev about 3 years ago

  • Status changed from Screened to Closed

Based on your request I'm closing this ticket. If you have more information, feel free to re-open it.

Still, in my opinion the problem is in the hardware. Recently I had a similar issue where an NFS directory disappeared during a file transfer. That turned out to be caused by failing HDDs in a RAID6 array and the resulting disappearance of the file system.

You may find the information in the following post applicable to your situation. As I understand it, your pool is connected to the HBA via a SAS expander, so if one of the disks there resets, the whole pool gets reset and the OS may lose the pool and dataset for some period of time. Long-lasting operations simply expose the problem; in other cases resets may pass unnoticed. And the additional load when copying from dataset to dataset may stress the whole setup and expose the problem.

https://forums.freebsd.org/threads/58989/#post-337970
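
As a quick sanity check of that theory (a sketch, using the same assumed log locations as above), the controller resets and ZFS vdev events can be lined up against the smbd panic time from the earlier log excerpt (21:44-22:00 on Jun 21):

grep 'Reinitializing controller' /var/log/messages    # HBA resets
grep 'vdev state changed' /var/log/messages           # ZFS noticing a device bounce
grep 'PANIC' /var/log/samba4/log.smbd                 # the smbd crash itself
# A reset or vdev event shortly before the chdir failures would point at the expander/HBA path.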

#9 Updated by Dru Lavigne about 3 years ago

  • File deleted (debug-freenas-20170621222807.txz)

#10 Updated by Dru Lavigne about 3 years ago

  • Target version set to N/A
  • Private changed from Yes to No
