Box crashes ~2 times per month
Looking in /data/crash/info.1, I can see what appears to be the most recent crash.
Dump header from device /dev/dumpdev
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 935202816B (891 MB)
  Blocksize: 512
  Dumptime: Sun May 10 18:58:41 2015
  Hostname:
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 9.3-RELEASE-p13 #4 r281084+c7bb047: Wed Apr 15 15:03:01 PDT 2015 firstname.lastname@example.org:/tank/home/jkh/build/FN/objs/os-base/amd64/tank/home/jkh/build/FN/FreeBSD/src/sys/FREENASSolaris(panic): zfs: allocating allocated segment(offset=0 size=1)
  Panic String: Solaris(panic): zfs: allocating allocated segment(offset=0 size=1)
  Dump Parity: 299023393
  Bounds: 1
  Dump Status: good
Any other information I can gather that would help troubleshoot the issue?
#6 Updated by Josh Paetzel over 5 years ago
- Status changed from Unscreened to 15
A save debug would be useful too. The dmesg buffer is full of lost iSCSI connections that make it look like the network is unhappy.
Also, the textdump you attached has an iSCSI panic, but the info file you attached earlier has a ZFS crash. Is it possible there are two crash dumps in /data/crash?
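One way to check this is to summarize every info.N file in the crash directory side by side. The following is only a rough sketch, not a FreeNAS tool: the function names are made up for illustration, the field labels are taken from the dump headers quoted in this ticket, and it assumes the info files sit directly in the directory you pass in.

```python
import re
from pathlib import Path

# Field labels as they appear in the dump headers quoted in this ticket.
# "Architecture Version" is listed before "Architecture" so the longer
# label is tried first.
FIELDS = (
    "Architecture Version", "Architecture", "Dump Length", "Blocksize",
    "Dumptime", "Hostname", "Magic", "Version String", "Panic String",
    "Dump Parity", "Bounds", "Dump Status",
)
_LABEL = re.compile(
    r"(?:^|\s)(" + "|".join(re.escape(f) for f in FIELDS) + r"):"
)


def parse_dump_header(text):
    """Split a dump header into {label: value}.

    Works whether the header is one field per line or pasted as a
    single flattened line, because it only looks for the known labels.
    """
    matches = list(_LABEL.finditer(text))
    fields = {}
    for cur, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(text)
        # Collapse internal whitespace so multi-line values fit on one row.
        fields[cur.group(1)] = " ".join(text[cur.end():end].split())
    return fields


def summarize_crash_dir(crash_dir):
    """Return (file, magic, dumptime, panic string) for each info.N file."""
    rows = []
    for info in sorted(Path(crash_dir).glob("info.*")):
        if not info.suffix[1:].isdigit():
            continue  # skip anything that is not a numbered info file
        f = parse_dump_header(info.read_text(errors="replace"))
        rows.append((info.name, f.get("Magic", ""),
                     f.get("Dumptime", ""), f.get("Panic String", "")))
    return rows
```

Calling summarize_crash_dir("/data/crash") and printing the rows would show at a glance whether the directory holds both a kernel dump and a text dump, and which panic string belongs to which.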
#7 Updated by Aaron C de Bruyn over 5 years ago
Dump header from device /dev/dumpdev
  Architecture: amd64
  Architecture Version: 1
  Dump Length: 121344B (0 MB)
  Blocksize: 512
  Dumptime: Sun May 10 18:55:47 2015
  Hostname: nas1.crfr.local
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 9.3-RELEASE-p13 #4 r281084+c7bb047: Wed Apr 15 15:03:01 PDT 2015 email@example.com:/tank/home/jkh/build/FN/objs/os-base/amd64/tank/home/jkh/build/FN/FreeBSD/src/sys/FREENAS
  Panic String:
  Dump Parity: 2485925632
  Bounds: 0
  Dump Status: good
The initiator is Proxmox, which uses open-iscsi. Unfortunately, when you remove an iSCSI connection from the Proxmox interface, it hangs around until you actually reboot the box. We removed two machines from service a few days ago (along with their FreeNAS iSCSI connections), and Proxmox kept trying to connect. After the crash, we rebooted all the Proxmox boxes, and we are no longer getting the errors.
I'm not familiar with gathering crash data from FreeBSD, so I'm not sure if I can get you the additional information after the box has been rebooted.
I can attach a tar of /data/crash if that would help.
#12 Updated by Alexander Motin over 5 years ago
- Status changed from Screened to 19
- Priority changed from Important to Nice to have
- ChangeLog Entry updated (diff)
I've committed a patch to the nightly branch that should close potential race conditions around iSCSI connection start/close. I don't know whether that is the problem, but I don't see anything else that could cause that panic inside icl_send_thread().