Bug #21850

Regular Kernel Panics

Added by Moss Cantwell over 3 years ago. Updated over 3 years ago.

Status: Closed: Cannot reproduce
Priority: Important
Assignee: Alexander Motin
Category: OS
Target version: -
Seen in:
Severity: New
Reason for Closing:
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

I've been directed to post this as a bug.
https://forums.freenas.org/index.php?threads/regular-crash-kernel-panic-help.50901/#post-353267
We use FreeNAS as an iSCSI target for VM backups, and it has been crashing consistently whenever we attempt a backup.
Each crash has the same pointer addresses and function name (arc_reclaim_thread).
Here is one of the latest crashes:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x20
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80395503
stack pointer = 0x28:0xfffffe0a8d14f890
frame pointer = 0x28:0xfffffe0a8d14f8a0
code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 6 (arc_reclaim_thread)
I have attached the zipped textdumps but they all have the same address.
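
For anyone comparing several panics like the one above, here is a minimal sketch that pulls the `key = value` lines out of a panic message and checks whether two crashes hit the same spot. The function names `parse_panic` and `same_crash_site` are hypothetical helpers made up for illustration, not part of FreeNAS or FreeBSD, and the parser only assumes the line layout shown in this report.

```python
import re


def parse_panic(text):
    """Collect 'key = value' lines from a panic message into a dict.

    Only lines that start with a lowercase key (as in the report above)
    are picked up; the 'Fatal trap ...' header line is skipped.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([a-z][a-z ]*?)\s*=\s*(.+?)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def same_crash_site(panic_a, panic_b,
                    keys=("instruction pointer", "current process")):
    """Return True if both panics report identical values for every key."""
    a = parse_panic(panic_a)
    b = parse_panic(panic_b)
    return all(a.get(k) is not None and a.get(k) == b.get(k) for k in keys)
```

Comparing on the instruction pointer and current process is a choice made here for illustration; identical values only suggest the same crash site, they do not explain the cause.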

History

#1 Updated by Moss Cantwell over 3 years ago

  • File debug-nas2-wgn-20170306134224.txz added

#2 Updated by Moss Cantwell over 3 years ago

  • File textdump.tar.0.gz added

#3 Updated by Moss Cantwell over 3 years ago

  • File textdump.tar.1.gz added

#4 Updated by Moss Cantwell over 3 years ago

  • File textdump.tar.2.gz added

#5 Updated by Moss Cantwell over 3 years ago

  • File textdump.tar.3.gz added

#6 Updated by Moss Cantwell over 3 years ago

  • File textdump.tar.4.gz added

#7 Updated by Moss Cantwell over 3 years ago

  • File textdump.tar.last.gz added

#8 Updated by Moss Cantwell over 3 years ago

  • Private changed from Yes to No

#9 Updated by Kris Moore over 3 years ago

  • Assignee set to Alexander Motin
  • Priority changed from No priority to Important
  • Target version set to 9.10.3

#10 Updated by Moss Cantwell over 3 years ago

  • Assignee deleted (Alexander Motin)

I wanted to add that the other end of the backup pipeline is a Windows Server 2016 iSCSI initiator, with Veeam Backup performing the copy job.

#11 Updated by Alexander Motin over 3 years ago

  • Status changed from Unscreened to Screened
  • Assignee set to Alexander Motin
  • Seen in changed from Unspecified to 9.10.2-U1

#12 Updated by Alexander Motin over 3 years ago

  • Category changed from 89 to 200

I see nothing in the provided dumps to suggest that the problem is iSCSI related. It looks like an internal problem in ZFS ARC management. I would recommend enabling the debug kernel to try to collect more input, since the available information does not give me much.

#13 Updated by Moss Cantwell over 3 years ago

I haven't been able to get much more out of debug mode. I was expecting the system to stay up when it crashed the next time but it rebooted and I never got to look at the debug menu.
The dumps in /data/crash don't seem to be different either.
Is there something I can check to see if the debug mode is working correctly? I have it ticked in the advanced menu in the GUI.


#14 Updated by Alexander Motin over 3 years ago

The debug kernel should reboot the same way as the regular one. The difference is that it has many additional checks, which may force a system reboot earlier than it would otherwise happen, collecting some additional information. You can check that the debug kernel is running with `uname -v`: there should be DEBUG in the kernel name. Are you sure that the dumps are exactly the same?
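
The check described above can also be scripted. This is a minimal sketch, assuming Python 3 is available on the machine: `os.uname().version` returns the same string that `uname -v` prints, and the helper names `is_debug_kernel` and `running_kernel_version` are hypothetical, introduced only for this example.

```python
import os


def is_debug_kernel(version_string):
    """Return True if the kernel version string carries the DEBUG marker."""
    return "DEBUG" in version_string


def running_kernel_version():
    """Return the version field of the running kernel (same as `uname -v`)."""
    return os.uname().version
```

On the affected system, `is_debug_kernel(running_kernel_version())` would be expected to return True once the debug kernel is booted.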

#15 Updated by Moss Cantwell over 3 years ago

My supervisor asked me to install the latest update. I have applied the U2 update and the behavior persists.
I can verify that the debug kernel is active:
FreeBSD 10.3-STABLE #0 41eb257(9.10.2-STABLE): Mon Mar 6 17:05:05 UTC 2017
root@gauntlet:/freenas-9.10-releng/_BE/objs/freenas-9.10-releng/_BE/os/sys/FreeNAS.amd64-DEBUG

However, I now understand that new dump files are not being produced: there are no dumps from this morning's crash.
I haven't had a new dump file since 2017-03-05,
which means that I have been comparing the exact same files for a week now. (oops)
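
To avoid comparing stale dumps again, a quick check of the newest file in the crash directory can help. This is a minimal sketch, assuming the dumps land in /data/crash as mentioned earlier in this ticket; `newest_file` and `has_dump_since` are hypothetical helper names, not FreeNAS tools.

```python
import os


def newest_file(directory):
    """Return (path, mtime) of the most recently modified regular file,
    or None if the directory contains no regular files."""
    newest = None
    for entry in os.scandir(directory):
        if entry.is_file():
            mtime = entry.stat().st_mtime
            if newest is None or mtime > newest[1]:
                newest = (entry.path, mtime)
    return newest


def has_dump_since(directory, since_epoch):
    """Return True if any file was modified at or after since_epoch
    (seconds since the Unix epoch)."""
    newest = newest_file(directory)
    return newest is not None and newest[1] >= since_epoch
```

For example, `has_dump_since("/data/crash", crash_time)` with the approximate time of the last crash would show whether that crash actually produced a new dump.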

#16 Updated by Moss Cantwell over 3 years ago

  • File backtrace.txt added

Got something out of a remote syslog server. Graylog has buggered the formatting but here's hoping there is something useful.
I get this warning:

warning: KLD '/boot/kernel-debug/dtnfscl.ko' is newer than the linker.hints file

Does that mean the names of the source files aren't available?

#17 Updated by Alexander Motin over 3 years ago

The backtrace does show a problem, but it is minor and should not cause your crashes. It is unrelated.

#18 Updated by Moss Cantwell over 3 years ago

  • File dmesg.txt added

Could the whole dmesg output assist?
I would like to know what some of it means too.

#19 Updated by Kris Moore over 3 years ago

Is there any chance you could try switching to the 9.10 nightlies train? It is now based upon FreeBSD 11, which would be helpful to test against to see if the problem still persists.

#20 Updated by Moss Cantwell over 3 years ago

The issue is currently mitigated by doing backups via iSCSI directly from the VMware ESXi host rather than from the Windows 2012 R2 + Veeam guest. I'm working on bringing up a test system with the 9.10 nightly build.

#21 Updated by Alexander Motin over 3 years ago

  • Status changed from Screened to Closed: Cannot reproduce
  • Target version deleted (9.10.3)

Let's close this ticket for now. The nightly train, like the upcoming 9.10.3 release, replaces the OS version, so the data submitted so far (which is not very helpful even now) will be largely obsolete.

#22 Updated by Dru Lavigne almost 3 years ago

  • File deleted (debug-nas2-wgn-20170306134224.txz)

#23 Updated by Dru Lavigne almost 3 years ago

  • File deleted (textdump.tar.0.gz)

#24 Updated by Dru Lavigne almost 3 years ago

  • File deleted (textdump.tar.1.gz)

#25 Updated by Dru Lavigne almost 3 years ago

  • File deleted (textdump.tar.2.gz)

#26 Updated by Dru Lavigne almost 3 years ago

  • File deleted (textdump.tar.3.gz)

#27 Updated by Dru Lavigne almost 3 years ago

  • File deleted (textdump.tar.4.gz)

#28 Updated by Dru Lavigne almost 3 years ago

  • File deleted (textdump.tar.last.gz)

#29 Updated by Dru Lavigne almost 3 years ago

  • File deleted (backtrace.txt)

#30 Updated by Dru Lavigne almost 3 years ago

  • File deleted (dmesg.txt)
