Bug #27123

Fix range locking in ZIL commit codepath

Added by Andrej Gortchivkin over 1 year ago. Updated about 1 year ago.

Status:
Done
Priority:
Critical
Assignee:
Alexander Motin
Category:
OS
Target version:
Seen in:
Severity:
Medium
Reason for Closing:
Reason for Blocked:
Needs QA:
No
Needs Doc:
No
Needs Merging:
No
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

Hi All,

I know this sounds pretty weird and somewhat ridiculous at first sight, but I noticed that Windows iSCSI traffic crashes my FreeNAS box when the underlying ZVOL uses a 64K volblocksize. This strange behavior has now happened in at least a few scenarios, so I decided to open a new ticket about it. Hopefully there is a reasonable explanation as well as a fix for it.

First of all I would like to point out that my FreeNAS box is virtualized via VMware ESX 6.5; however, the storage is configured explicitly for pass-through, so all ZFS-related activity is handled as if it were running on a physical box. For networking I'm using VMXNET adapters, all configured with jumbo frame support. Generally speaking my FreeNAS server runs rock steady, and its main use is to serve a few ESX hypervisors as well as a couple of Windows clusters, all of which use iSCSI storage. Stability was fine right up until I started rearranging my ZVOLs from 16K to 64K volblocksize for performance optimization reasons.

The transition took quite a lot of time, since I had to create brand-new ZVOLs and LUNs and then copy over all the data. On the ESX side everything went fine and seems to work absolutely normally afterwards, even under heavy load. So as a final step I went for the same 16K-to-64K volblocksize transition on the remaining few Windows LUNs, which are assigned to a SQL failover cluster (virtualized and hosted on top of my ESX servers). Once the new LUNs were provisioned and I started some data transfers, I noticed a loss of connectivity to the entire environment, which turned out to be caused by a ZFS panic. dmesg showed the following message:


panic: Solaris(panic): zfs: allocating allocated segment(offset=344925949440 size=512)

cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe080e7cb5a0
vpanic() at vpanic+0x186/frame 0xfffffe080e7cb620
panic() at panic+0x43/frame 0xfffffe080e7cb680
vcmn_err() at vcmn_err+0xc2/frame 0xfffffe080e7cb7c0
zfs_panic_recover() at zfs_panic_recover+0x5a/frame 0xfffffe080e7cb820
range_tree_add() at range_tree_add+0x20f/frame 0xfffffe080e7cb8e0
metaslab_free_dva() at metaslab_free_dva+0x284/frame 0xfffffe080e7cb940
metaslab_free() at metaslab_free+0x91/frame 0xfffffe080e7cb980
zio_dva_free() at zio_dva_free+0x1a/frame 0xfffffe080e7cb990
zio_execute() at zio_execute+0xac/frame 0xfffffe080e7cb9e0
taskqueue_run_locked() at taskqueue_run_locked+0x127/frame 0xfffffe080e7cba40
taskqueue_thread_loop() at taskqueue_thread_loop+0xc8/frame 0xfffffe080e7cba70
fork_exit() at fork_exit+0x85/frame 0xfffffe080e7cbab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe080e7cbab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic


Unfortunately there was no other indication anywhere about the crash, so I thought it might have been caused by something else entirely, unrelated to the data transfer. Therefore I retried the data copy, which resulted in another crash with a similar message:


panic: Solaris(panic): zfs: allocating allocated segment(offset=20785089536 size=512)

cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe080e5e45a0
vpanic() at vpanic+0x186/frame 0xfffffe080e5e4620
panic() at panic+0x43/frame 0xfffffe080e5e4680
vcmn_err() at vcmn_err+0xc2/frame 0xfffffe080e5e47c0
zfs_panic_recover() at zfs_panic_recover+0x5a/frame 0xfffffe080e5e4820
range_tree_add() at range_tree_add+0x20f/frame 0xfffffe080e5e48e0
metaslab_free_dva() at metaslab_free_dva+0x284/frame 0xfffffe080e5e4940
metaslab_free() at metaslab_free+0x91/frame 0xfffffe080e5e4980
zio_dva_free() at zio_dva_free+0x1a/frame 0xfffffe080e5e4990
zio_execute() at zio_execute+0xac/frame 0xfffffe080e5e49e0
taskqueue_run_locked() at taskqueue_run_locked+0x127/frame 0xfffffe080e5e4a40
taskqueue_thread_loop() at taskqueue_thread_loop+0xc8/frame 0xfffffe080e5e4a70
fork_exit() at fork_exit+0x85/frame 0xfffffe080e5e4ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe080e5e4ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic


At this point I got very suspicious, because I remembered that roughly two months earlier I had been doing some ZFS/ZVOL/iSCSI performance tests using an external physical Windows client and IOMeter, and a very similar FreeNAS crash happened exactly while I was testing a 64K volblocksize segment, although at the time there was no indication at all that the crash was related to this activity. Now I'm definitely seeing a pattern, although I'm completely in the dark about why this would happen only with very specific clients.

I can confirm that this seems to happen only with 64K volblocksize and only with Windows clients. I managed to do a test data transfer between 16K-based ZVOLs on the SQL cluster without any issues, and all my ESX boxes, which now also use 64K ZVOLs, are running just fine.
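
Since the problem tracks the zvol's volblocksize, it may help to enumerate the block size of every zvol when narrowing down which LUNs are affected. A minimal sketch, to be run on the FreeNAS host itself; the pool and zvol names below are examples only, not taken from this system:

```shell
# List volblocksize for all zvols under a pool (recursively).
# "tank" is an example pool name.
zfs get -r -t volume volblocksize tank

# Check a single zvol backing an iSCSI LUN ("tank/iscsi/sqlvol" is
# an example path).
zfs get volblocksize tank/iscsi/sqlvol
```

Note that volblocksize is fixed at zvol creation time, which is why the migration described above required creating new zvols and copying the data over.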

Unfortunately the /data/crash folder in FreeNAS is completely empty, so I'm not even sure where to look for additional clues.

Any help would be appreciated. Thanks in advance.


Related issues

Related to FreeNAS - Bug #27199: FreeNAS will crash, unable to use (Closed: Insufficient Info, 2017-12-11)

Associated revisions

Revision be7c3269 (diff)
Added by avg over 1 year ago

MFC r323918: MFV r323917: 8648 Fix range locking in ZIL commit codepath

This fixes a problem introduced in r315441, MFC of r308782.

Ticket: #27123

(cherry picked from commit 2133ba986c2f3b0b018580385318c837cccf0513)

History

#1 Updated by Dru Lavigne over 1 year ago

Andrej: please attach a debug (System -> Advanced -> Save Debug).

#2 Updated by Andrej Gortchivkin over 1 year ago

  • File debug-sns01storvm01-20171210123909.tgz added
  • Seen in changed from Unspecified to 11.0-U4

Debug info attached.

#3 Updated by Dru Lavigne over 1 year ago

  • Assignee changed from Release Council to Alexander Motin

#4 Updated by Dru Lavigne over 1 year ago

  • Target version set to 11.0-U1

#5 Updated by Dru Lavigne over 1 year ago

  • Target version changed from 11.0-U1 to 11.1-U1

#6 Updated by Alexander Motin over 1 year ago

  • Status changed from Unscreened to 15
  • Priority changed from No priority to Important

This looks either related to or a duplicate of #27199. Could you try updating to 11.1 to check whether the problem persists?

#7 Updated by Alexander Motin over 1 year ago

  • Related to Bug #27199: Freenas will crash, unable to use added

#8 Updated by Andrej Gortchivkin over 1 year ago

Thanks Alexander,

I'll certainly try out the newest version, although it would be nice if you could post some details about the related ticket #27199 here, since it seems to be private, or at least I can't access it (error 403), so I can't cross-check its details against my own situation.

At this moment I assume my 64K issue is a known problem, but if you can confirm this and post a public technical reference, that would be appreciated. Unfortunately I have only a single FreeNAS system to work with at this time, and I'm running my live VMs on it, so I don't want to provoke a crash of the storage or go for another 64K test blindly.

Do you by chance have a public link that outlines the technical details of this issue? Or if you just paste some of the details into this ticket, that would work for me as well.

Thanks in advance.

#9 Updated by Alexander Motin over 1 year ago

The other ticket reports a FreeNAS crash with the same backtrace after using iSCSI on top of a ZVOL with a 64K block size on top of a 10-wide RAIDZ. I'd say it is likely the same issue; the FreeNAS version there is just even older: 9.10.2-U6.

#10 Updated by Andrej Gortchivkin over 1 year ago

Thanks

Just out of curiosity, and given the information from the other ticket, can we say the issue is isolated to Windows iSCSI environments, or does FreeNAS get affected by other OSes as well? In my case I see this only with Windows in-guest iSCSI, although I haven't tested any other OSes besides my ESXi instances, which are running just fine on 64K zvols.

I'll give the latest FreeNAS a try and will report back accordingly, although right now I'm quite skeptical the upgrade will change anything. On my end the crash happened with 11.0-U4, which is not old at all, and I didn't see anything in the 11.1 release notes related to my problem, unless of course I'm missing something. Can you advise any particular troubleshooting steps I should take in case the fault happens again?

#11 Updated by Alexander Motin over 1 year ago

The other ticket does not mention the OS, but judging from the provided debug info, Windows is likely. I haven't heard about this problem with other OSes, but I also can't guess what could go wrong specifically with Windows.

Between FreeNAS 11.0 and 11.1 lie six months of ZFS development. I can't say whether it will help, but people worked hard.

#12 Updated by Andrej Gortchivkin over 1 year ago

I made some tests during the last few days and, surprisingly, the issue didn't come back at all. While I can't tell whether the problem is fully resolved in FreeNAS 11.1, I can certainly confirm that deploying SQL clusters on 64K-based zvols doesn't crash the storage system at the moment. I also made some tests by copying multiple GB of data between mounted iSCSI volumes (again, 64K-based zvols), and everything seems to work really stably.

I assume the issue must have been somewhere deep inside the ZFS layer itself; the only other thing that comes to mind is whether ZFS snapshots could have had anything to do with those crashes. Anyway, for the time being I'm considering this problem resolved, although I may return to this ticket if the same crashing behavior shows up again.

Alexander,

From your last comment I understand that FreeNAS minor updates do not necessarily contain the latest ZFS code changes (for example v11.0-U1 and v11.0-U4), while major releases such as v11.1 get everything included. That would explain perfectly why I didn't experience crashes with the latest FreeNAS version, but can you please confirm or deny my assumption?

Thanks

#13 Updated by Dru Lavigne over 1 year ago

  • Status changed from 15 to Closed: Not Applicable
  • Target version changed from 11.1-U1 to N/A
  • Private changed from Yes to No

Thanks for the update Andrej! Closing out.

#14 Updated by Dru Lavigne over 1 year ago

  • File deleted (debug-sns01storvm01-20171210123909.tgz)

#15 Updated by Alexander Motin over 1 year ago

Andrej Gortchivkin wrote:

From your last comment I understand that FreeNAS minor updates do not necessarily contain the latest ZFS code changes (for example v11.0-U1 and v11.0-U4), while major releases such as v11.1 get everything included.

Yes, that is the idea. Minor "Ux" updates receive only the changes we explicitly decided to include to address specific problems, while major X.x updates receive everything from upstream FreeBSD, all new FreeNAS features, etc.

#16 Updated by Andrej Gortchivkin over 1 year ago

Thanks for the clarification Alexander.
Much appreciated... :)

#17 Updated by Andrej Gortchivkin over 1 year ago

  • Seen in changed from 11.0-U4 to 11.1

I would like to re-open this ticket, as the issue has been experienced once again despite the FN upgrade to 11.1.

I just received another crash with a very similar panic message while an application was creating a new SQL database on a 64K zvol.


panic: Solaris(panic): zfs: allocating allocated segment(offset=94783717376 size=8192) of (offset=94783717376 size=8192)

cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe080e86d5a0
vpanic() at vpanic+0x186/frame 0xfffffe080e86d620
panic() at panic+0x43/frame 0xfffffe080e86d680
vcmn_err() at vcmn_err+0xcf/frame 0xfffffe080e86d7b0
zfs_panic_recover() at zfs_panic_recover+0x5a/frame 0xfffffe080e86d810
range_tree_add_impl() at range_tree_add_impl+0x1dd/frame 0xfffffe080e86d8e0
metaslab_free_dva() at metaslab_free_dva+0x314/frame 0xfffffe080e86d940
metaslab_free() at metaslab_free+0x91/frame 0xfffffe080e86d980
zio_dva_free() at zio_dva_free+0x1a/frame 0xfffffe080e86d990
zio_execute() at zio_execute+0xac/frame 0xfffffe080e86d9e0
taskqueue_run_locked() at taskqueue_run_locked+0x147/frame 0xfffffe080e86da40
taskqueue_thread_loop() at taskqueue_thread_loop+0xb8/frame 0xfffffe080e86da70
fork_exit() at fork_exit+0x85/frame 0xfffffe080e86dab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe080e86dab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

#18 Updated by Dru Lavigne over 1 year ago

  • Status changed from Closed: Not Applicable to 15
  • Target version deleted (N/A)

Andrej: please attach a debug (System -> Advanced -> Save Debug).

#19 Updated by Andrej Gortchivkin over 1 year ago

  • File debug-sns01storvm01-20180107150034.tgz added

Debugging info attached...

#20 Updated by Dru Lavigne over 1 year ago

  • Status changed from 15 to Unscreened
  • Private changed from No to Yes

#22 Updated by Alexander Motin over 1 year ago

  • Status changed from Unscreened to Screened

#23 Updated by Dru Lavigne over 1 year ago

  • Target version set to 11.1-U2

#24 Updated by Andrej Gortchivkin over 1 year ago

I just encountered the issue once again, but this time on FN 11.1-U1.
The crash happened during a SQL database restore operation on a 64K zvol.
This is the only message from the crash I could extract from the FN server; no ZFS panic messages this time.


pid 1976 (syslog-ng), uid 0: exited on signal 6 (core dumped)

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address = 0x20
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff803bbcb3
stack pointer = 0x0:0xfffffe080d533890
frame pointer = 0x0:0xfffffe080d5338d0
code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 17 (arc_reclaim_thread)


#25 Updated by Dru Lavigne over 1 year ago

  • Status changed from Screened to Not Started

#26 Updated by Ben Gadd over 1 year ago

  • Due date set to 02/12/2018

Due date updated to reflect the code freeze for 11.1-U2.

#27 Updated by Ben Gadd over 1 year ago

  • Severity set to New

#28 Updated by Alexander Motin over 1 year ago

  • Status changed from Not Started to In Progress
  • Priority changed from Important to Critical

Collecting information from different sources, it seems this problem is related to ZIL writing in immediate mode. That is a non-trivial part of the code and I'll need some more time to familiarize myself with it. So far it has been reported that setting the sysctl vfs.zfs.vol.immediate_write_sz=131072 works around the problem. You may try it if the problem appears frequently enough for you to detect the difference.
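
For anyone hitting this before a fix lands, applying the workaround above can be sketched as follows. The tunable name comes from the comment itself; persisting it via System -> Tunables is an assumption about the usual FreeNAS procedure, not something confirmed in this ticket:

```shell
# Runtime change (lost on reboot). 131072 bytes = 128K, i.e. larger
# than any write to a 64K zvol, so all synchronous writes are stored
# in the ZIL itself rather than referenced as indirect blocks.
sysctl vfs.zfs.vol.immediate_write_sz=131072

# Verify the value took effect:
sysctl vfs.zfs.vol.immediate_write_sz
```

To survive a reboot, the same setting would normally be added in the FreeNAS UI as a sysctl-type Tunable (System -> Tunables) rather than edited in /etc/sysctl.conf, since FreeNAS regenerates its configuration files.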

#29 Updated by Andrej Gortchivkin over 1 year ago

Thanks Alexander,

I'll see how I can apply and test this setting in my current environment. In the meantime, would you mind sharing some details about the nature/purpose of the vfs.zfs.vol.immediate_write_sz setting? "ZIL writing in immediate mode" doesn't ring a bell at the moment. My best guess is that this parameter controls whether writes of a given size are written/flushed via the ZIL in a different way. I'm curious about the details because my current vfs.zfs.vol.immediate_write_sz is 32768, and with the recent conversion of all my ZVOLs to 64K I'm a bit concerned that setting the parameter to 131072 may impact something else. My ESX boxes are still running just fine with 64K ZVOLs, so I suppose the change wouldn't impact anything, but I'm trying to understand the pros and cons of tampering with this particular setting.

#30 Updated by Alexander Motin over 1 year ago

There are two ways synchronous writes can be done: the data can be written into the log itself, or it can be written as proper blocks and only referenced from the log. The second case is more efficient for large writes, but also more complicated, and I suppose there is a problem somewhere in it. The proposed tunable effectively disables it, storing all synchronous writes in the ZIL itself.

Your VMware is not affected because it does not use synchronous writes, expecting the storage to be consistent even without them (which is not always true).
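
The decision described in the comment above can be illustrated with a deliberately simplified sketch. This is not the actual kernel code, just a model of the threshold behavior: writes larger than immediate_write_sz are written as proper blocks and only referenced from the log ("indirect"), while smaller ones are copied into the ZIL itself ("immediate"):

```shell
# Toy model of the immediate-vs-indirect choice for a synchronous
# zvol write. Arguments: write size and the immediate_write_sz
# threshold, both in bytes.
zil_write_mode() {
    write_size=$1
    immediate_write_sz=$2
    if [ "$write_size" -gt "$immediate_write_sz" ]; then
        echo indirect    # data written as blocks, referenced from log
    else
        echo immediate   # data copied into the log record itself
    fi
}

zil_write_mode 65536 32768    # 64K write vs. default 32K threshold
zil_write_mode 65536 131072   # 64K write vs. raised 128K threshold
```

This also shows why raising the tunable to 131072 sidesteps the suspected bug for 64K zvols: every write then falls on the "immediate" side of the threshold, so the indirect code path is never exercised.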

#31 Updated by Alexander Motin over 1 year ago

  • Severity changed from New to Medium
  • Needs Doc changed from Yes to No
  • Needs Merging changed from Yes to No

I've found and merged one recent ZFS commit that had not previously been merged into the 11.1 branch, which may be relevant. Unfortunately I still can't reproduce the issue to be sure that is it. I'll wait for feedback from a user who has a more or less reliable reproduction.

#32 Updated by Andrej Gortchivkin over 1 year ago

Thanks Alexander,

I'll check it out and will try to run some tests in my own environment. Currently I'm building another FN box with parameters similar to my main FN server in order to test this issue in isolation, but it may take some time until I can fully reproduce the problem. Hopefully some IOMeter-based tests will get me the same results as before, so I won't have to crash my main FN server all the time and risk my data. If IOMeter doesn't work, then I'll give it another try with SQL servers, which seem to trigger the problem quite often. Unfortunately, with FN 11 the reproduction seems to be quite unpredictable, so I may even go back to an older FN version if necessary.

I'll let you know once I have some results.

#33 Updated by Alexander Motin over 1 year ago

I doubt testing with 9.10 is reasonable at this point. I've verified that the commit probably triggering the issue did appear there, so 9.10 may be affected, but that branch is too old now.

#34 Updated by Alexander Motin over 1 year ago

Another user reported that he had already tried that change and it did not help him, so I'll continue my search. But it could be that there was more than one problem, and some cases are fixed now.

#35 Updated by Andrej Gortchivkin over 1 year ago

Yeah, that could easily be the case.
Could you please reference the other user's ticket here? I'm curious about the details of his situation, and perhaps it could help me with some of the testing phases.

#36 Updated by Alexander Motin over 1 year ago

There is no ticket, since he is not a FreeNAS user but a vanilla FreeBSD user, and we just have a private email conversation.

#37 Updated by Andrej Gortchivkin over 1 year ago

I see... oh well, never mind. Thanks anyway.

#38 Updated by Dru Lavigne over 1 year ago

  • Target version changed from 11.1-U2 to 11.2-BETA1

#39 Updated by Ben Gadd over 1 year ago

  • Due date changed from 02/12/2018 to 03/09/2018

#40 Updated by Dru Lavigne over 1 year ago

Andrej: as a followup, is this still reproducible for you on 11.1-U2?

#41 Updated by Dru Lavigne over 1 year ago

  • File deleted (debug-sns01storvm01-20180107150034.tgz)

#42 Updated by Dru Lavigne over 1 year ago

  • Subject changed from Windows iSCSI traffic crashing my FreeNAS box when using 64kb volblocksize to Fix range locking in ZIL commit codepath
  • Status changed from In Progress to Done

#43 Updated by Andrej Gortchivkin over 1 year ago

Sorry about the delay in my reply. So far I couldn't reproduce the issue on my second FreeNAS box, so I assume it might already be resolved; on the other hand, I have only tried to reproduce the problem with IOMeter so far. I didn't have the chance to test against a real SQL workload due to lack of time. I still intend to run some tests on the primary FN server that hosts my main workloads, but that will require additional time before I can confirm anything further.

Right now I'm a bit hesitant to say the issue is fixed, because the last time I thought so it started to reappear quite randomly and I couldn't see a clear pattern. Furthermore, the output error logs were not even showing messages related to ZFS.

#44 Updated by Dru Lavigne over 1 year ago

  • Private changed from Yes to No

#45 Updated by Dru Lavigne about 1 year ago

  • Status changed from Done to Ready for Testing

#46 Updated by Alexander Motin about 1 year ago

  • Status changed from Ready for Testing to Passed Testing

There was no clear reproduction scenario for this issue, so since the same code is used in OpenZFS and FreeBSD without issues, we may assume it works.

#47 Updated by Dru Lavigne about 1 year ago

  • Status changed from Passed Testing to Done

#48 Updated by Eric Turgeon about 1 year ago

  • Status changed from Done to Passed Testing
  • Needs QA changed from Yes to No

I am not able to reproduce this on FreeNAS-11.2-MASTER-201806191350 (4d34a765b), so it is probably fixed in 11.2.

#49 Updated by Eric Turgeon about 1 year ago

  • Status changed from Passed Testing to Done
