Bug #25289

Samba panicking because of shadow copies

Added by John Smith about 3 years ago. Updated about 3 years ago.

Status: Closed: Cannot reproduce
Priority: No priority
Assignee: John Hixson
Category: OS
Target version:
Seen in:
Severity: New
Reason for Closing:
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

I'm not entirely sure this part is related or even the cause, but I believe it is:
A couple of months ago, /var filled up to 0 free space. Because of that, I could not log in to the webgui, because PHP requires writing session data to /var/tmp (this is a bug of its own and I'm reporting it in another ticket). After checking, it turned out /var/db/system/rrd/ was what was filling up /var. As suggested by some forum posts, and because I didn't have many options, I manually removed most of that folder (I can't remember if all of it, but most of it for sure). This solved the initial problem: I could log in to the webgui and tell it to move the RRD data to the system dataset.
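
For anyone retracing the "what is filling up /var" step, here is a minimal sketch of one way to rank the children of a directory by size. It is not what was run on this system; the function names `dir_sizes` and `largest` are made up for illustration, and the sizes are apparent file sizes (`st_size`), which can differ from the blocks actually allocated on ZFS.

```python
import os


def dir_sizes(root):
    """Return {child name: total apparent bytes} for each immediate child of root."""
    sizes = {}
    for entry in os.scandir(root):
        total = 0
        if entry.is_dir(follow_symlinks=False):
            # os.walk does not follow symlinked directories by default.
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.lstat(os.path.join(dirpath, name)).st_size
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
        else:
            try:
                total = entry.stat(follow_symlinks=False).st_size
            except OSError:
                total = 0
        sizes[entry.name] = total
    return sizes


def largest(root, count=5):
    """Return the `count` largest children of root as (name, bytes) pairs."""
    ranked = sorted(dir_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:count]
```

Calling `largest("/var/db/system")` would list the biggest subdirectories first, which is how a runaway directory such as the RRD one would stand out.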

This apparently (or maybe not, I'm not sure) created a new problem which this ticket is about:
Whenever a Windows client (I do not have other clients to test with) right-clicks a file, opens Properties, and checks "Previous Versions" (shadow copies, which are fetched from snapshots), Samba instantly panics and crashes. One symptom is that Samba keeps crashing repeatedly and very quickly, dozens of times (I estimate about 150-200 times), and then returns to normal functionality. The logs are spammed.
After some checking, I concluded that files created in 2016 (well before the RRD catastrophe) did not cause crashes. Taking the exact same files and changing the last-modified date did (I'm not sure whether new snapshots were taken between changing the last-modified date and the check, but I believe not).
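
The 150-200 figure above is an estimate. A rough sketch of how the crashes could be counted from /var/log/samba4/log.smbd follows. It assumes each crash writes a line containing `PANIC (pid <number>)`; that pattern is an assumption about the log format and should be checked against the real log before trusting the numbers. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: every smbd crash logs a line containing "PANIC (pid <number>)".
PANIC_RE = re.compile(r"PANIC \(pid (\d+)\)")


def count_panics(lines):
    """Return (total panic lines, number of distinct pids that panicked)."""
    pids = Counter()
    for line in lines:
        match = PANIC_RE.search(line)
        if match:
            pids[match.group(1)] += 1
    return sum(pids.values()), len(pids)


def count_panics_in_file(path):
    """Count panics in a log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return count_panics(handle)
```

Running `count_panics_in_file("/var/log/samba4/log.smbd")` before and after a single "Previous Versions" lookup would show how many crashes one lookup triggers.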

My initial thought was that it is trying to look for shadow copies in snapshots that were somehow corrupted or had disappeared, because of the mess with the RRD files earlier. I decided to completely wipe out all snapshots of all datasets and let the periodic snapshot mechanism take care of creating a new history from this point onward. This took some time, but was eventually accomplished.
Once all snapshots were removed, checking Previous Versions no longer crashed Samba and everything was lovely again. I believed that fixed the issue, but since we're both here, it didn't.
Now, after a few days' worth of snapshots, the issue happens again (I didn't test until now).

I have attached a few lines from /var/log/messages, and attached an example of the crash report from /var/log/samba4/log.smbd (different occurrences but same issue). I deliberately masked ("***") some things that I thought were private or may be private. I can provide a more complete log if the details are confirmed to not be private or uniquely identifying (or if provided privately to somebody).

I wasn't sure if this is a Services bug (because of Samba) or something bigger and more FreeNAS related (because of RRD), so I gambled on Services.

History

#1 Updated by an odos about 3 years ago

Please upload a debug file. System->Advanced->Save Debug

#2 Updated by John Smith about 3 years ago

an odos wrote:

Please upload a debug file. System->Advanced->Save Debug

I have already crawled through all of the data in the debug ball and posted everything I imagined could be relevant. If there's anything specific you think I should be posting in addition to what's already in, please let me know.

That said, with all due respect (and there's a lot!), the debug has far more personally identifying information than I'm willing to post on a public board.

#3 Updated by Dru Lavigne about 3 years ago

The ticket will be marked private until a dev has a chance to review the debug.

#4 Updated by Dru Lavigne about 3 years ago

  • Status changed from Unscreened to 15

#5 Updated by John Smith about 3 years ago

Dru Lavigne wrote:

The ticket will be marked private until a dev has a chance to review the debug.

I appreciate the concern. I hope this is not understood the wrong way, but the debug tarball includes (far) too much private and even sensitive information that is completely irrelevant and so I cannot provide it as it is. I will not be uploading it as it is, regardless of the "privacy" of this report.

Please let me know which information and/or files you need specifically that weren't already provided, and I'll make sure to provide them.

#6 Updated by Dru Lavigne about 3 years ago

  • Status changed from 15 to Unscreened
  • Assignee changed from Release Council to John Hixson

#7 Updated by an odos about 3 years ago

Nobady Ympurtante wrote:

an odos wrote:

Please upload a debug file. System->Advanced->Save Debug

I have already crawled through all of the data in the debug ball and posted everything I imagined could be relevant. If there's anything specific you think I should be posting in addition to what's already in, please let me know.

That said, with all due respect (and there's a lot!), the debug has far more personally identifying information than I'm willing to post on a public board.

Fair enough. To start with, attach a redacted version of the following

./ixdiagnose/fndebug/SMB/dump.txt
./ixdiagnose/fndebug/Hardware/dump.txt
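
As a rough illustration of what "redacted" could mean here, the sketch below masks a few common identifiers with `***`, the same mask used in the attached log excerpts. The patterns and the `redact` helper are assumptions made for this example, not part of any FreeNAS tooling, and they will miss anything they do not explicitly match, so the output still needs a manual read before uploading.

```python
import re

# Illustrative patterns only; extend them for whatever else is sensitive.
PATTERNS = [
    re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),  # MAC addresses
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),               # IPv4 addresses
    re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),          # e-mail addresses
]


def redact(text, extra_words=()):
    """Replace matches of PATTERNS and any extra_words with '***'."""
    for pattern in PATTERNS:
        text = pattern.sub("***", text)
    for word in extra_words:
        if word:  # an empty string would match everywhere
            text = re.sub(re.escape(word), "***", text, flags=re.IGNORECASE)
    return text
```

Hostnames, share names, and usernames have no fixed shape, so they would be passed in through `extra_words`, for example `redact(dump_text, extra_words=["nas01"])` where `nas01` is a placeholder hostname. Note that the IPv4 pattern also masks any four-part dotted number, such as some version strings.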

#8 Updated by John Hixson about 3 years ago

  • Status changed from Unscreened to Screened
  • Target version set to 11.0-U3

#9 Updated by John Smith about 3 years ago

  • File cifs_dump.txt added
  • File hardware_dump.txt added

#10 Updated by John Smith about 3 years ago

  • Private changed from No to Yes

There is no SMB/dump.txt. I assume you meant CIFS/dump.txt.

#11 Updated by John Smith about 3 years ago

After partially updating from 9.10.2-U5 to 9.10.2-U6 (`freenas-update update` but without reboot), and then restarting Samba, it seems like the problem is gone.

I assume that it is fixed because U6 includes changes to Samba that cleared caches of some sort that were used for the Previous Versions feature, rather than the update itself (which is merely a security update?) fixing it.

I would not close this report as resolved, though, as I believe this would happen all over again given the right circumstances. I also assume that updating Samba (and thus clearing some cache?) is a workaround.

#12 Updated by John Smith about 3 years ago

  • Private changed from Yes to No

#13 Updated by John Smith about 3 years ago

  • Seen in changed from 9.10.2-U5 to 9.10.2-U6

The problem has officially returned. Samba is again crashing when a client looks for Previous Versions of a file.

#14 Updated by John Smith about 3 years ago

Here's some more input after further testing.

Going back to 9.10.2-*U2* and importing the Config of U5 seems to solve the issue, so this is a regression.

I tried 9.10.2-U4 in an attempt to narrow the possibilities, but couldn't load the Config of U5 in order to test, because it's too new (how is it too new for U3 but not for U2?).

#15 Updated by John Hixson about 3 years ago

  • Status changed from Screened to 15

Nobady Ympurtante wrote:

Here's some more input after further testing.

Going back to 9.10.2-*U2* and importing the Config of U5 seems to solve the issue, so this is a regression.

I tried 9.10.2-U4 in an attempt to narrow the possibilities, but couldn't load the Config of U5 in order to test, because it's too new (how is it too new for U3 but not for U2?).

Is there any chance you can update to 11? It is highly unlikely there will be another 9.10 release and this problem is probably fixed in 11. If not, I will help debug this and see if there is a solution we can come up with (That does not require another release ;-))

#16 Updated by John Smith about 3 years ago

Is there any chance you can update to 11?

I could, but that's not an option until 11 becomes a viable upgrade. It currently isn't mainly because of the lack of a migration tool for jails and the incredible confusion around it (this is a production server).

If not, I will help debug this and see if there is a solution we can come up with

I would appreciate that.

FWIW there's a lot of confusion, misinformation and lack of clarity regarding the switch between 9.10 and 11, features, migration tools and such, which makes a lot of people stay back and 'wait and see', even though they would otherwise gladly jump on the train.

#17 Updated by John Hixson about 3 years ago

Nobady Ympurtante wrote:

Is there any chance you can update to 11?

I could, but that's not an option until 11 becomes a viable upgrade. It currently isn't mainly because of the lack of a migration tool for jails and the incredible confusion around it (this is a production server).

No migration is necessary, your 9.10 jails will work just as they are on 11. Should something not work, you can just boot back into your 9.10 boot environment.

If not, I will help debug this and see if there is a solution we can come up with

I would appreciate that.

FWIW there's a lot of confusion, misinformation and lack of clarity regarding the switch between 9.10 and 11, features, migration tools and such, which makes a lot of people stay back and 'wait and see', even though they would otherwise gladly jump on the train.

It's a pretty straightforward upgrade. We've had a lot of success with it. I'd be interested to know what confusion and misinformation you are hearing/seeing and where. I'd be happy to clear up anything. I can still help you try to identify the issue here, but I've had at least 2 other tickets with similar issues that got fixed by upgrading to 11. There will be no more releases to 9.10 except possibly security releases.

#18 Updated by John Smith about 3 years ago

No migration is necessary, your 9.10 jails will work just as they are on 11. Should something not work, you can just boot back into your 9.10 boot environment.

In that case, I will find some time during the day to give this a try, and report back here after the act.

I'd be interested to know what confusion and misinformation you are hearing/seeing and where.

From my personal experience, mainly regarding the jails environment. The official documentation is lacking and everywhere else you hear all kinds of opinions and experiences, and it's hard to tell what to trust so you must consider all of them.

The consensus, as I understand it (albeit probably wrong according to what you say), is to stay away from 11 because it's simply not ready yet for anything beyond a playground, and if you upgrade to 11 the jails part is "the trickiest" part, without much explanation. On the other hand, you clarified they should be working just as they are, so not only is it not the trickiest part, it's not an issue at all.

The general feeling is that 11 is essentially Corral but with a different name and somewhat more stable, and shouldn't be used for anything production just yet. I believe the lack of decent and clear explanations and clarifications (e.g. the jails part) feeds a good part of that.

#19 Updated by an odos about 3 years ago

Nobady Ympurtante wrote:

No migration is necessary, your 9.10 jails will work just as they are on 11. Should something not work, you can just boot back into your 9.10 boot environment.

In that case, I will find some time during the day to give this a try, and report back here after the act.

I'd be interested to know what confusion and misinformation you are hearing/seeing and where.

From my personal experience, mainly regarding the jails environment. The official documentation is lacking and everywhere else you hear all kinds of opinions and experiences, and it's hard to tell what to trust so you must consider all of them.

The consensus, as I understand it (albeit probably wrong according to what you say), is to stay away from 11 because it's simply not ready yet for anything beyond a playground, and if you upgrade to 11 the jails part is "the trickiest" part, without much explanation. On the other hand, you clarified they should be working just as they are, so not only is it not the trickiest part, it's not an issue at all.

The general feeling is that 11 is essentially Corral but with a different name and somewhat more stable, and shouldn't be used for anything production just yet. I believe the lack of decent and clear explanations and clarifications (e.g. the jails part) feeds a good part of that.

I'm using FN 11 in production on multiple servers in a business environment. It's stable. I had no problems with jails after the upgrade. It is nothing like Corral.

I assume the above statements regarding jails are an oblique reference to forum posts like this one: https://forums.freenas.org/index.php?threads/freenas-9-10-to-freenas-11-jails.54275/ I can't really comment on those problems because, as I mentioned, my jails continued to work normally.

#20 Updated by Kris Moore about 3 years ago

  • Status changed from 15 to Closed: Cannot reproduce
  • Target version changed from 11.0-U3 to N/A

Going to close this out. If somebody can reproduce it on 11.0-U2, please re-open.

#21 Updated by Dru Lavigne almost 3 years ago

  • File deleted (messages.txt)

#22 Updated by Dru Lavigne almost 3 years ago

  • File deleted (log smbd.txt)

#23 Updated by Dru Lavigne almost 3 years ago

  • File deleted (cifs_dump.txt)

#24 Updated by Dru Lavigne almost 3 years ago

  • File deleted (hardware_dump.txt)
