Bug #43481

SMB 0x8007003B error

Added by Martin Richardt about 2 years ago. Updated about 2 years ago.

Status: Closed
Priority: No priority
Assignee: Alexander Motin
Category: OS
Target version:
Severity: Low
Reason for Closing: User Configuration Error
Reason for Blocked:
Needs QA: No
Needs Doc: No
Needs Merging: No
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

FreeNAS gets stuck when copying large files and resets the connection after approximately 4 GB. The LAN connection isn't the problem: I have transferred an 11 GB file via wget in the FreeNAS shell without problems. The client is a Windows 10 PC. I tried it with both SMBv1 and SMBv2, and the result was always the same error.

freenas_swapinfo.JPG (19.9 KB) Martin Richardt, 08/30/2018 02:33 AM
win_error.JPG (30.9 KB) Martin Richardt, 08/30/2018 10:02 AM
progress.JPG (30.9 KB) Martin Richardt, 08/30/2018 10:02 AM
gstat_tb4_02.JPG (31.2 KB) Martin Richardt, 08/30/2018 10:02 AM
gstat_tb4_01.JPG (31.8 KB) Martin Richardt, 08/30/2018 10:02 AM
win_error.JPG (36.6 KB) Martin Richardt, 08/30/2018 10:44 AM
wget_perf.JPG (89.3 KB) Martin Richardt, 08/30/2018 10:45 AM

Related issues

Copied to FreeNAS - Bug #43580: Wait longer for middlewared to start (Done)

History

#1 Updated by Martin Richardt about 2 years ago

  • File debug-freenas-20180829151608.txz added
  • Private changed from No to Yes

#2 Updated by Dru Lavigne about 2 years ago

  • Category changed from Services to OS
  • Assignee changed from Release Council to Alexander Motin
  • Seen in changed from Unspecified to Master - FreeNAS Nightlies

#4 Updated by Alexander Motin about 2 years ago

  • Assignee changed from Alexander Motin to William Grzybowski

Martin, could you specify what you mean by "Freenas gets stuck"? Does it respond to ping? Can you still log in via the WebUI? Does it respond via other protocols, or does the problem affect only SMB?

What looks suspicious to me, and what I guess may cause those troubles, is the lack of mounted swap on your system, even though all disks have their respective swap partitions. William, could you try to guess what happened to swap there? The only other oddity I see is three ZFS pools without redundancy. That is bad, but it should not cause such results.

#5 Updated by Martin Richardt about 2 years ago

Alexander Motin wrote:

Martin, could you specify what you mean by "Freenas gets stuck"? Does it respond to ping? Can you still log in via the WebUI? Does it respond via other protocols, or does the problem affect only SMB?

The copy process freezes, and after a few minutes the connection gets reset. The WebUI seems to run fine the whole time, so pinging shouldn't be a problem.

#6 Updated by Alexander Motin about 2 years ago

If the WebUI is alive, then my guess about swap may be wrong. Could you try to capture a debug while the system is in that state, not a few minutes after a reboot?

#7 Updated by Martin Richardt about 2 years ago

  • File debug.tgz added

Here you are. I restarted FreeNAS and tried to copy the file twice. The debug file is from the second try, which broke off after ~8 GB.

#8 Updated by William Grzybowski about 2 years ago

  • Copied to Bug #43580: Wait longer for middlewared to start added

#9 Updated by William Grzybowski about 2 years ago

  • Assignee changed from William Grzybowski to Alexander Motin

Swap is not being configured because the system proceeds with the boot before the middlewared process has started properly (likely due to slow boot media).

This will be resolved in the related ticket; passing this one back to Alexander as discussed in chat.

#10 Updated by Alexander Motin about 2 years ago

  • Status changed from Unscreened to Blocked
  • Reason for Blocked set to Waiting for feedback

Martin, after discussing this case with a colleague, we came to the idea that the problem with absent swap (and maybe some others) may be caused by the extremely slow USB stick used for booting FreeNAS. Considering that it does not even report its product name, I guess it is one of the cheapest possible products from China. We'd recommend that you try replacing it with something better, or at least different.

Meanwhile, you may try to increase the middlewared start timeout by manually applying the following change:

--- /conf/base/etc/local/rc.d/middlewared.orig    2018-08-29 13:33:03.588624100 -0700
+++ /conf/base/etc/local/rc.d/middlewared    2018-08-29 13:33:20.651268720 -0700
@@ -30,7 +30,7 @@
     else
         env PATH=$PATH:/usr/local/sbin:/usr/local/bin LC_ALL=en_US.UTF-8 ${command} -f -P ${pidfile} -r /usr/local/bin/middlewared ${overlay_dirs_arg} --log-handler=file
     fi
-    LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/midclt -t 120 waitready
+    LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/midclt -t 200 waitready
 }

 middlewared_stop() {

and then rebooting. Hopefully you will get swap devices after that (check the output of the `swapinfo` command); if so, retry your tests.
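The `swapinfo` check can also be scripted. The following is only a minimal sketch, assuming the usual `swapinfo` layout of a `Device` header row, one row per swap device, and an optional `Total` row. The function name and the sample device names are hypothetical and not taken from the reporter's system.

```python
from typing import List


def active_swap_devices(swapinfo_output: str) -> List[str]:
    """Return the device names listed in `swapinfo` output.

    The header row ("Device ...") and the summary row ("Total ...") are skipped.
    An empty result means no swap device is configured.
    """
    devices = []
    for line in swapinfo_output.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("Device", "Total"):
            continue
        devices.append(fields[0])
    return devices


# Hypothetical sample output, for illustration only.
SAMPLE_WITH_SWAP = """\
Device          1K-blocks     Used    Avail Capacity
/dev/ada0p1.eli   2097152        0  2097152     0%
/dev/ada1p1.eli   2097152        0  2097152     0%
Total             4194304        0  4194304     0%
"""

# What the output looks like when only the header is printed (no swap devices).
SAMPLE_NO_SWAP = "Device          1K-blocks     Used    Avail Capacity\n"

if __name__ == "__main__":
    print(active_swap_devices(SAMPLE_WITH_SWAP))
    print(active_swap_devices(SAMPLE_NO_SWAP))
```

On a live system the text would come from running `swapinfo` itself, for example by pasting its output into a string or capturing it with `subprocess.run`.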

#11 Updated by Martin Richardt about 2 years ago

I tested the proposal without any success; it's the same as before. Then I saved the FreeNAS config and used a Lexar USB 3.0 stick for a new installation. After uploading the config in the WebUI and restarting FreeNAS, I tested again, still without success. I have added the debug file and a screenshot of the swapinfo result.

#12 Updated by Alexander Motin about 2 years ago

OK, the swap is back, but you say it didn't help.

Looking at the new debug, I see one of the smbd processes, while trying to write to a file, blocked waiting for a ZFS transaction commit, while the commit thread is waiting for some I/O to complete. It may be nothing, just a snapshot of normal operation, but I'd recommend looking at what is going on with the disk(s) when it happens, and whether there are any stuck or heavily delayed requests (look at the queue depth and execution times in the `gstat -I 1s -po` output). If requests were stuck forever you would see some kind of error message, but that would not happen if there are just enormous delays. Since you have three disks there, it would be interesting to check the correlation between the problem and the disk used.

Actually, I haven't seen you mention it: does it happen during writes, reads, or both?
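One way to correlate the problem with a particular disk is to note the queue depth (`L(q)`) and write time (`ms/w`) per disk from a few gstat refreshes and flag the outliers. The sketch below assumes the readings are copied by hand into a list; the thresholds, disk names, and values are hypothetical illustrations, not recommended limits or measurements from this system.

```python
from typing import Iterable, List, NamedTuple


class DiskSample(NamedTuple):
    """One per-disk reading copied by hand from a gstat refresh."""
    name: str
    queue_depth: int      # outstanding requests, the L(q) column
    ms_per_write: float   # average write time in milliseconds, the ms/w column


def suspicious_disks(samples: Iterable[DiskSample],
                     max_queue: int = 10,
                     max_ms: float = 100.0) -> List[str]:
    """Return sorted names of disks that exceeded either threshold in any sample."""
    flagged = {
        s.name for s in samples
        if s.queue_depth > max_queue or s.ms_per_write > max_ms
    }
    return sorted(flagged)


# Hypothetical readings, not taken from the reporter's system.
SAMPLES = [
    DiskSample("ada0", 1, 8.5),
    DiskSample("ada1", 24, 410.0),
    DiskSample("ada1", 3, 12.0),
    DiskSample("ada2", 0, 0.0),
]

if __name__ == "__main__":
    print(suspicious_disks(SAMPLES))
```

If the same disk keeps showing up while the others stay quiet, that points at the disk; if all disks are flagged equally, the cause is more likely above the disk level.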

#13 Updated by Martin Richardt about 2 years ago

Only writes cause problems. I retested the wget method, i.e. I opened a shell and downloaded an 11 GB file from a URL via wget. The download didn't stop, but it showed behavior similar to the SMB copy job: it slowed down after a few GB but didn't get interrupted. The disk is almost always at 100% busy, but the throughput is tiny! The problem is not disk-dependent; it occurs on all disks.

#14 Updated by Martin Richardt about 2 years ago

Some missing screenshots.

#15 Updated by Martin Richardt about 2 years ago

I took another approach and installed freenas-11.2.beta2 on a fresh USB3 stick. After auto-importing the pools and defining my user, I ran the test again. The result was as negative as before. The SMB drive got disconnected on the Windows side, maybe because FreeNAS stopped responding. On the FreeNAS side the drive was busy at 100% for some time after the disconnect. During this time (100% usage of the HDD) I couldn't reconnect the SMB share. Once the usage of the drive was back at 0%, the reconnect worked fine. There seems to be a serious problem with writing to the disk at a very deep level. The system doesn't use swap because it has 24 GB of main memory. The load is at most 25%, and gstat reported about 400 I/Os per second, which I think is normal. My system is a Dell T20 with a quad-core Xeon 1225v3 CPU and 24 GB of RAM. The system worked before, so a hardware problem is very unlikely, but I will do some tests. Do you need any further information?

#16 Updated by Dru Lavigne about 2 years ago

  • Status changed from Blocked to Unscreened
  • Reason for Blocked deleted (Waiting for feedback)

#17 Updated by Alexander Motin about 2 years ago

  • Status changed from Unscreened to Closed
  • Target version changed from Backlog to N/A
  • Severity changed from New to Low
  • Reason for Closing set to User Configuration Error
  • Needs QA changed from Yes to No
  • Needs Doc changed from Yes to No
  • Needs Merging changed from Yes to No

Now I think that the problem may be caused by pool fragmentation, which in turn is caused by very high space utilization. At least one of the pools, TB3, is at 91% space usage and 52% fragmentation, which can easily cause very low disk performance. As far as I can see, pool TB4, which you are using in the present tests, has somewhat lower fragmentation. The gstat output shows that the disk delivers its expected hundreds of IOPS; it can give more throughput only if the pool is not fragmented and all I/Os are sequential. So I'd say you need to be more careful never to exceed 80% pool space utilization, or, if performance is important, even 50%. I'd recommend adding more disks to your system, creating a single pool of multiple disks instead of multiple pools, and moving the data to purge the existing fragmentation.
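The utilization rule above can be checked periodically. Here is a minimal sketch that assumes the input is the text printed by `zpool list -H -o name,capacity,fragmentation` (one whitespace-separated row per pool). The function name is hypothetical; the TB3 figures are the ones quoted in this comment, while the TB4 row is made up for illustration.

```python
from typing import List, Optional, Tuple


def pools_over_limit(zpool_output: str,
                     max_capacity: int = 80) -> List[Tuple[str, int, Optional[int]]]:
    """Return (name, capacity %, fragmentation %) for pools above max_capacity.

    Fragmentation is None when zpool prints "-" for it.
    Rows that do not have exactly three fields are ignored.
    """
    flagged = []
    for line in zpool_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        name, cap, frag = fields
        capacity = int(cap.rstrip("%"))
        fragmentation = None if frag == "-" else int(frag.rstrip("%"))
        if capacity > max_capacity:
            flagged.append((name, capacity, fragmentation))
    return flagged


# TB3 values come from the comment above; the TB4 row is hypothetical.
SAMPLE = "TB3\t91%\t52%\nTB4\t60%\t20%\n"

if __name__ == "__main__":
    for name, capacity, fragmentation in pools_over_limit(SAMPLE):
        print(f"{name}: {capacity}% used, fragmentation {fragmentation}%")
```

Passing `max_capacity=50` applies the stricter limit suggested for cases where performance matters.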

#18 Updated by Dru Lavigne about 2 years ago

  • File deleted (debug-freenas-20180829151608.txz)

#19 Updated by Dru Lavigne about 2 years ago

  • File deleted (debug.tgz)

#20 Updated by Dru Lavigne about 2 years ago

  • File deleted (debug (1).tgz)

#21 Updated by Dru Lavigne about 2 years ago

  • File deleted (debug (3).tgz)

#22 Updated by Dru Lavigne about 2 years ago

  • Private changed from Yes to No
