Bug #9721

Box crashes ~2 times per month

Added by Aaron C de Bruyn over 5 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Important
Assignee: Alexander Motin
Category: -
Target version:
Seen in:
Severity: New
Reason for Closing:
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

Looking in /data/crash/info.1, I can see what appears to be the most recent crash.

Dump header from device /dev/dumpdev
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 935202816B (891 MB)
  Blocksize: 512
  Dumptime: Sun May 10 18:58:41 2015
  Hostname: 
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 9.3-RELEASE-p13 #4 r281084+c7bb047: Wed Apr 15 15:03:01 PDT 2015
    root@build3.ixsystems.com:/tank/home/jkh/build/FN/objs/os-base/amd64/tank/home/jkh/build/FN/FreeBSD/src/sys/FREENAS
  Panic String: Solaris(panic): zfs: allocating allocated segment(offset=0 size=1)

  Dump Parity: 299023393
  Bounds: 1
  Dump Status: good

Any other information I can gather that would help troubleshoot the issue?
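
For comparing several crash records, the header above can be turned into a dictionary. The following is a minimal sketch, assuming the info file keeps the `Label: value` layout shown in this ticket; the function name `parse_dump_header` and the `KNOWN_KEYS` set are hypothetical helpers, not part of FreeBSD or FreeNAS.

```python
# Labels taken from the dump header quoted in this ticket (assumed complete).
KNOWN_KEYS = {
    "Architecture", "Architecture Version", "Dump Length", "Blocksize",
    "Dumptime", "Hostname", "Magic", "Version String", "Panic String",
    "Dump Parity", "Bounds", "Dump Status",
}


def parse_dump_header(text):
    """Parse the 'Label: value' lines of a dump info file into a dict."""
    fields = {}
    last_key = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # Split on the first colon only; values may contain colons themselves.
        key, sep, value = stripped.partition(":")
        if sep and key in KNOWN_KEYS:
            fields[key] = value.strip()
            last_key = key
        elif last_key is not None:
            # Continuation line, e.g. the build path after "Version String".
            fields[last_key] = (fields[last_key] + " " + stripped).strip()
    return fields


SAMPLE = """\
Dump header from device /dev/dumpdev
  Architecture: amd64
  Dump Length: 935202816B (891 MB)
  Dumptime: Sun May 10 18:58:41 2015
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 9.3-RELEASE-p13 #4 r281084+c7bb047: Wed Apr 15 15:03:01 PDT 2015
    root@build3.ixsystems.com:/tank/home/jkh/build/FN/objs/os-base/amd64/tank/home/jkh/build/FN/FreeBSD/src/sys/FREENAS
  Panic String: Solaris(panic): zfs: allocating allocated segment(offset=0 size=1)
  Dump Status: good
"""

header = parse_dump_header(SAMPLE)
print(header["Magic"])
print(header["Panic String"])
```

Restricting keys to a known set is a deliberate choice here: it keeps the build path, which also contains a colon, from being mistaken for a new field.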

Associated revisions

Revision c58ddb13 (diff)
Added by Alexander Motin over 5 years ago

Close some potential races around socket start/close. There are some reports about panics on ic->ic_socket NULL dereference. This kind of race is the only way I can imagine it to happen. Ticket: #9721 (cherry picked from commit f6ca21866d0328404b7bff2b5e780a0dfe697546)

Revision 3965cd70 (diff)
Added by Alexander Motin over 5 years ago

Close some potential races around socket start/close. There are some reports about panics on ic->ic_socket NULL dereference. This kind of race is the only way I can imagine it to happen. Ticket: #9721 (cherry picked from commit f6ca21866d0328404b7bff2b5e780a0dfe697546) (cherry picked from commit c58ddb130582c780a1cd9a4038effc1dd5c4b5cd)

Revision a0c3fa84 (diff)
Added by Alexander Motin over 5 years ago

Close some potential races around socket start/close. There are some reports about panics on ic->ic_socket NULL dereference. This kind of race is the only way I can imagine it to happen. Ticket: #9721 (cherry picked from commit f6ca21866d0328404b7bff2b5e780a0dfe697546) (cherry picked from commit c58ddb130582c780a1cd9a4038effc1dd5c4b5cd)

History

#1 Updated by Aaron C de Bruyn over 5 years ago

  • Subject changed from Box crashes ~2 times per mont to Box crashes ~2 times per month

#2 Updated by Josh Paetzel over 5 years ago

Can you attach the textdump from /data/crash to this ticket please?

#3 Updated by Jordan Hubbard over 5 years ago

  • Category set to 21
  • Assignee set to Josh Paetzel

#4 Updated by Aaron C de Bruyn over 5 years ago

  • File textdump.tar.0.gz added

#5 Updated by Josh Paetzel over 5 years ago

  • Assignee changed from Josh Paetzel to Alexander Motin
  • Target version set to Unspecified

This is crashing in the iSCSI PDU code. Over to Alexander to investigate.

#6 Updated by Josh Paetzel over 5 years ago

  • Status changed from Unscreened to 15

A save debug would be useful too. The dmesg buffer is full of lost iSCSI connections that make it look like the network is unhappy.

Also, the textdump you attached has an iSCSI panic, whereas the info file you attached earlier has a ZFS crash. Is it possible there are two crash dumps in /data/crash?
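
One way to check for more than one crash record is to list every numbered info file in the crash directory and compare the dump type, time, and panic string. This is a rough sketch, assuming the files are named `info.N` and use the header layout quoted in this ticket; `summarize_crash_dir` and `header_field` are hypothetical names, and the demo writes abbreviated copies of the two headers into a temporary directory rather than reading a real /data/crash.

```python
import re
import tempfile
from pathlib import Path


def header_field(text, label):
    """Return the value after 'label:' on its own line, or '' if absent."""
    pattern = rf"^\s*{re.escape(label)}:[ \t]*(.*)$"
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1).strip() if match else ""


def summarize_crash_dir(crash_dir):
    """Return (file name, Magic, Dumptime, Panic String) per info.N file."""
    rows = []
    for path in sorted(Path(crash_dir).glob("info.[0-9]*")):
        text = path.read_text(errors="replace")
        rows.append((
            path.name,
            header_field(text, "Magic"),
            header_field(text, "Dumptime"),
            header_field(text, "Panic String"),
        ))
    return rows


def demo():
    # Abbreviated headers based on the two info files quoted in this ticket.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "info.0").write_text(
            "Dump header from device /dev/dumpdev\n"
            "  Dumptime: Sun May 10 18:55:47 2015\n"
            "  Magic: FreeBSD Text Dump\n"
            "  Panic String: \n"
        )
        Path(tmp, "info.1").write_text(
            "Dump header from device /dev/dumpdev\n"
            "  Dumptime: Sun May 10 18:58:41 2015\n"
            "  Magic: FreeBSD Kernel Dump\n"
            "  Panic String: Solaris(panic): zfs: allocating allocated "
            "segment(offset=0 size=1)\n"
        )
        return summarize_crash_dir(tmp)


rows = demo()
for name, magic, dumptime, panic in rows:
    print(f"{name}: {magic} at {dumptime} panic={panic!r}")
```

Two rows with different `Magic` values and dump times a few minutes apart would point to two separate dumps rather than one.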

#7 Updated by Aaron C de Bruyn over 5 years ago

info.0 contains:

Dump header from device /dev/dumpdev
  Architecture: amd64
  Architecture Version: 1
  Dump Length: 121344B (0 MB)
  Blocksize: 512
  Dumptime: Sun May 10 18:55:47 2015
  Hostname: nas1.crfr.local
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 9.3-RELEASE-p13 #4 r281084+c7bb047: Wed Apr 15 15:03:01 PDT 2015
    root@build3.ixsystems.com:/tank/home/jkh/build/FN/objs/os-base/amd64/tank/home/jkh/build/FN/FreeBSD/src/sys/FREENAS
  Panic String: 
  Dump Parity: 2485925632
  Bounds: 0
  Dump Status: good

The initiator is Proxmox which uses open-iscsi. Unfortunately when you remove an iSCSI connection from the Proxmox interface, it hangs around until you actually reboot the box. We removed two machines from service a few days ago (along with their FreeNAS iSCSI connections), and Proxmox kept trying to connect. After the crash, we rebooted all the Proxmox boxes and now we aren't getting the errors.

I'm not familiar with gathering crash data from FreeBSD, so I'm not sure if I can get you the additional information after the box has been rebooted.

I can attach a tar of /data/crash if that would help.

#8 Updated by Josh Paetzel over 5 years ago

Yes, a tar of /data/crash would really help

#9 Updated by Aaron C de Bruyn over 5 years ago

It's 48 MB and Redmine only allows a 24.5 MB upload.

Try this: https://cdn.uithosting.com/crash.tar.bz2

#10 Updated by Jordan Hubbard over 5 years ago

  • Priority changed from No priority to Important

#11 Updated by Jordan Hubbard over 5 years ago

  • Status changed from 15 to Screened

#12 Updated by Alexander Motin over 5 years ago

  • Status changed from Screened to 19
  • Priority changed from Important to Nice to have
  • ChangeLog Entry updated (diff)

I've committed a patch to the nightly branch that should close potential race conditions around iSCSI connection start/close. I don't know whether that is the problem, but I don't see anything else that could cause that panic inside icl_send_thread().
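
The general shape of such a race can be illustrated outside the kernel. The sketch below is not the committed patch and does not reflect the actual FreeBSD code; it is a Python analogy, assuming a sender thread that uses a socket reference while another thread clears it on close. `Connection` and `FakeSocket` are hypothetical names. Holding one lock across both the NULL check and the use is what keeps the close from landing in between.

```python
import threading


class FakeSocket:
    """Stand-in for a real socket; records what was sent."""

    def __init__(self):
        self.sent = []

    def send(self, pdu):
        self.sent.append(pdu)


class Connection:
    def __init__(self):
        self._lock = threading.Lock()
        self._socket = None

    def start(self, sock):
        with self._lock:
            self._socket = sock

    def close(self):
        with self._lock:
            self._socket = None

    def send(self, pdu):
        """Send if the connection is open; return False once closed."""
        with self._lock:
            sock = self._socket
            if sock is None:
                return False
            # Still under the lock, so close() cannot clear the socket
            # between the check above and this use.
            sock.send(pdu)
            return True


def stress(iterations=10000):
    """Run a sender thread while the main thread closes the connection."""
    conn = Connection()
    conn.start(FakeSocket())
    errors = []

    def sender():
        try:
            for i in range(iterations):
                conn.send(i)
        except AttributeError as exc:  # would indicate a None dereference
            errors.append(exc)

    thread = threading.Thread(target=sender)
    thread.start()
    conn.close()
    thread.join()
    return errors


print("errors during stress run:", stress())
```

Without the lock, the sender could read a non-None socket, lose the CPU to `close()`, and then call `send` on a cleared reference, which is the analogue of the NULL dereference described above.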

#13 Updated by Alexander Motin over 5 years ago

  • Priority changed from Nice to have to Important

#14 Updated by Alexander Motin over 5 years ago

  • Status changed from 19 to Ready For Release

The build succeeded and iSCSI is still working. Let's hope the reported problem is fixed.

#15 Updated by Jordan Hubbard over 5 years ago

  • Status changed from Ready For Release to Resolved

#16 Updated by Kris Moore about 4 years ago

  • Target version changed from Unspecified to N/A

#17 Updated by Dru Lavigne almost 3 years ago

  • File deleted (textdump.tar.0.gz)