Umbrella #28585

Fix Samba memory leak in vfswrap_getwd()

Added by Timur Bakeyev almost 2 years ago. Updated over 1 year ago.

Status: Done
Priority: Important
Assignee: Andrew Walker
Category: OS
Target version:
Due date:
Reason for Closing:
Reason for Blocked:
Needs Doc: No
Needs Automation: No

Description

There are numerous reports in which smbd processes start to use large amounts of memory, sometimes taking all system memory and crashing the system.

The biggest issue with this bug is that we can't reliably reproduce it in our environment, and there is no exact sequence of operations and/or conditions that triggers this memory leak.

The same environment can run without problems for weeks and then start to eat memory every few hours.
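Because the leak shows up only intermittently, a periodic record of smbd memory use can help pin down when the growth starts. Below is a minimal sketch, not something taken from this ticket: the function names `smbd_rss_kib` and `log_smbd_memory` and the 5-minute default interval are made up for illustration, and it assumes `ps -axo pid,rss,comm` prints a header row followed by PID, resident set size in kilobytes, and command name, as it does on FreeBSD and Linux.

```python
import subprocess
import time


def smbd_rss_kib(ps_output):
    """Return {pid: resident set size in KiB} for every smbd process.

    ps_output is the text printed by `ps -axo pid,rss,comm`: a header
    row followed by one "PID RSS COMMAND" row per process.
    """
    usage = {}
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # ignore blank or truncated rows
        pid, rss, command = fields
        if command.strip() == "smbd":
            usage[int(pid)] = int(rss)
    return usage


def log_smbd_memory(interval_seconds=300):
    """Print one timestamped line per smbd process until interrupted.

    The interval is an arbitrary illustrative default, not a
    recommendation from the ticket.
    """
    while True:
        ps_output = subprocess.run(
            ["ps", "-axo", "pid,rss,comm"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        ).stdout
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for pid, rss in sorted(smbd_rss_kib(ps_output).items()):
            print("{} pid={} rss_kib={}".format(stamp, pid, rss))
        time.sleep(interval_seconds)
```

Running `log_smbd_memory()` with its output redirected to a file leaves a record that can be lined up against client activity once memory starts to climb. The parsing is kept separate from the `ps` call so it can be checked against captured output.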

Screen Shot 2018-05-12 at 12.56.56 PM.png (71.2 KB): reporting tab for memory across past several weeks. Chris Heerschap, 05/12/2018 09:59 AM

Related issues

Related to FreeNAS - Bug #27200: Possible SMB Memory Leak (Closed, 2017-12-11)
Related to FreeNAS - Bug #28431: Possible Memory Leak in smbd (Closed, 2018-02-11)
Related to FreeNAS - Bug #28027: Samba using up all available ram (Closed, 2018-01-28)
Related to FreeNAS - Bug #28471: increase memory use without explanation (Closed, 2018-02-13)
Related to FreeNAS - Bug #28926: SMB Eating Memory (Closed, 2018-02-27)
Related to FreeNAS - Bug #33702: smb service stopping (Closed)

History

#1 Updated by Timur Bakeyev almost 2 years ago

  • Assignee changed from Release Council to Timur Bakeyev

#2 Updated by Timur Bakeyev almost 2 years ago

  • Related to Bug #27200: Possible SMB Memory Leak added

#3 Updated by Timur Bakeyev almost 2 years ago

  • Related to Bug #28431: Possible Memory Leak in smbd added

#4 Updated by Timur Bakeyev almost 2 years ago

  • Related to Bug #28027: Samba using up all available ram added

#5 Updated by Timur Bakeyev almost 2 years ago

  • Related to Bug #28471: increase memory use without explanation added

#6 Updated by Timur Bakeyev almost 2 years ago

  • Description updated (diff)

#7 Updated by Dru Lavigne almost 2 years ago

  • Category set to OS
  • Target version set to 11.2-RC2

#8 Updated by Timur Bakeyev over 1 year ago

Hi, guys!

Andrew Walker seems to have found at least one cause of the uncontrolled memory leak in the smbd process.

Can you try the following configuration changes under Services->SMB:
1) Uncheck Unix Extensions
2) Add auxiliary parameter wide links = yes
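
In smb.conf terms, the two steps above roughly correspond to the fragment below. This is a sketch only: it assumes the "Unix Extensions" checkbox maps to Samba's `unix extensions` parameter and that auxiliary parameters from the Services->SMB page end up in the `[global]` section.

```
[global]
    # Step 1: equivalent of unchecking "Unix Extensions" in the UI (assumed mapping)
    unix extensions = no
    # Step 2: the auxiliary parameter from the workaround
    wide links = yes
```

Both settings matter together: Samba turns `wide links` off by itself while `unix extensions` is enabled, unless `allow insecure wide links` is also set.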

#9 Updated by Lance Fogle over 1 year ago

Timur Bakeyev wrote:

> Hi, guys!
>
> Andrew Walker seems to have found at least one cause of the uncontrolled memory leak in the smbd process.
>
> Can you try the following configuration changes under Services->SMB:
> 1) Uncheck Unix Extensions
> 2) Add auxiliary parameter wide links = yes

Will you please elaborate on the need for enabling wide links? That is a security concern, so I am just curious why enabling it would alleviate the memory problem. Any more information you can share is appreciated.

#10 Updated by Timur Bakeyev over 1 year ago

Lance Fogle wrote:

> Timur Bakeyev wrote:
>
> > Hi, guys!
> >
> > Andrew Walker seems to have found at least one cause of the uncontrolled memory leak in the smbd process.
> >
> > Can you try the following configuration changes under Services->SMB:
> > 1) Uncheck Unix Extensions
> > 2) Add auxiliary parameter wide links = yes
>
> Will you please elaborate on the need for enabling wide links? That is a security concern, so I am just curious why enabling it would alleviate the memory problem. Any more information you can share is appreciated.

Well, you are right that simply enabling wide links may have security implications, but only in conjunction with enabled unix extensions, and even then you need another directive to enforce such usage; otherwise wide links get automatically disabled. Hence we also recommend disabling unix extensions, which on most installations don't see much use anyhow: this is an SMB 1.0 protocol feature and is supported by *nix-like clients only.

As for the need to enable wide links: at the moment it's a bit of voodoo. We see that with this setting enabled, memory doesn't leak as intensively, or doesn't leak at all. Within the code there are two quite different code paths that handle requests in the two cases and, apparently, the leak happens in the branch that is executed when wide links are disabled.

Keep in mind that this is a temporary workaround; we are trying our best to locate the leaking part of the code.

#11 Updated by Timur Bakeyev over 1 year ago

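PowerShell script by Andrew Walker that exposes the problem:

# UNC path of a directory on the affected share
$dir = "\\remote_address\Share\Dir"

# Loop forever: for each top-level entry, recursively sum the sizes of the files beneath it
do
{
  Get-ChildItem $dir |
    ForEach-Object { $f = $_ ;
        Get-ChildItem -Recurse $_.FullName |
           Measure-Object -Property Length -Sum |
             Select-Object @{Name="Name";Expression={$f}},Sum }
} while (1)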

#12 Updated by Andrew Walker over 1 year ago

#13 Updated by Andrew Walker over 1 year ago

  • Target version changed from 11.2-RC2 to 11.1-U5

#14 Updated by Dru Lavigne over 1 year ago

  • Subject changed from SMBD process memory usage grows until it takes whole system memory to Fix Samba memory leak in vfswrap_getwd()
  • Status changed from In Progress to Ready for Testing
  • Assignee changed from Timur Bakeyev to Andrew Walker
  • Needs Doc changed from Yes to No
  • Needs Automation changed from No to Yes

#15 Updated by Timur Bakeyev over 1 year ago

#16 Updated by Timur Bakeyev over 1 year ago

  • Priority changed from No priority to Important

#17 Updated by Chris Heerschap over 1 year ago

I believe I am seeing this as well. FreeNAS Mini running 11.1-U4 with 16GB of RAM, disk config is (4) 2TB WD RED in a dual mirror setup. This system is doing very little, no plugins, no jails, no VMs, only three datasets and the only one with any size is still only 150G, so the storage zpool is only at 4% utilization. Only services running are SSH, SMART, and SMBD. SMBD is sharing the one dataset to one or two clients on the network. Only other task of interest is a daily backup to Backblaze B2 via two cloud sync jobs, but the data on this host rarely changes.

I've been occasionally seeing messages in the daily security run output:

> swap_pager_getswapspace(13): failed
> swap_pager_getswapspace(10): failed
... many of these .. 
> swap_pager_getswapspace(8): failed
> swap_pager_getswapspace(5): failed
> swap_pager_getswapspace(5): failed
> pid 2375 (smbd), uid 0, was killed: out of swap space

Considering how little this host does, and that I have three other systems with lesser configs that don't run into the same problem (none of which run SMBD), I've got to figure this is related. I'm attaching a screenshot of the memory reporting tab for the past 4 weeks, showing several instances of swap usage growing.

I have just now turned off unix extensions and added wide links = yes.

If there's anything else I can contribute from this host, please let me know.

#18 Updated by Timur Bakeyev over 1 year ago

Hi, Chris!

The fix for this (particular) memleak is going to be included in 11.1-U5, which is scheduled for the second week of June. Meanwhile, if you are facing exactly this one, the workaround you applied should minimize the risk of further Samba memleaks.

If you feel adventurous, you may try running one of the Nightly trains until the new release, but there is always a risk that something is broken in the build from the particular date you install. So judge for yourself whether you want to take that risk.

#19 Updated by Chris Heerschap over 1 year ago

If the workaround will hold me over until June, that's fine. This system is remote and so I'm more risk averse with this one. If it were one of my local boxes, I'd certainly try that out. Thanks for the feedback!

#20 Updated by Timur Bakeyev over 1 year ago

  • Related to Bug #33702: smb service stopping added

#21 Updated by Bonnie Follweiler over 1 year ago

  • Status changed from Ready for Testing to Passed Testing

After discussing this with Andrew, and given that the code has gone into upstream Samba, the fix is considered correct.
Also taking into account the comment in this ticket: "The biggest issue with this bug is that we can't reliably reproduce it in our environment, and there is no exact sequence of operations and/or conditions that triggers this memory leak."
I am calling this one "Passed Testing", since if they find another source of the memory leak, it will be tracked in a new ticket.

#22 Updated by Dru Lavigne over 1 year ago

  • Status changed from Passed Testing to Done
  • Needs Automation changed from Yes to No
