Bug #37786

Remove double free which caused bhyve to SIGBUS

Added by Greg Fitzgerald about 2 years ago. Updated about 2 years ago.

Status: Done
Priority: No priority
Assignee: Marcelo Araujo
Category: OS
Target version:
Seen in:
Severity: High
Reason for Closing:
Reason for Blocked:
Needs QA: No
Needs Doc: No
Needs Merging: No
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

My bhyve VMs keep crashing with signal 10 (SIGBUS) since the beta update. It doesn't matter whether I'm running just the 512 MB DNS VM or all three; they keep crashing. This only started after I switched to the beta.


Subtasks

Bug #37842: VM suddenly stopping after an hour or so (uid 0: exited on signal 10). Status: Closed. Assignee: Marcelo Araujo.

Related issues

Related to FreeNAS - Bug #34747: Bhyve process exits. Status: Closed.

Associated revisions

Revision 6a69efc4
Added by Marcelo Araujo about 2 years ago

- Update to 0.1.4.2. Changelog: - Remove double free. vnc_init_server() already frees struct server_softc *sc, so there is no need to free it again in bhyve/vncserver.c. This was probably the root cause of a SIGBUS. Tickets: #37842, #37786 and possibly #34747

Revision 7e9aa18a
Added by Marcelo Araujo about 2 years ago

- Remove double free. The libhyve-remote function vnc_init_server() already frees struct server_softc *sc, so there is no need to free it again in bhyve/vncserver.c. This was probably the root cause of a SIGBUS. Tickets: #37842, #37786 and possibly #34747

Revision abdf3953
Added by Marcelo Araujo over 1 year ago

- Remove double free. The libhyve-remote function vnc_init_server() already frees struct server_softc *sc, so there is no need to free it again in bhyve/vncserver.c. This was probably the root cause of a SIGBUS. Tickets: #37842, #37786 and possibly #34747 (cherry picked from commit 7e9aa18a8eb32ef9b5f426e86ca0685b18152f0b)
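The revision messages describe an ownership rule: because vnc_init_server() in libhyve-remote already frees struct server_softc *sc, the caller in bhyve/vncserver.c must not free it a second time. The sketch below illustrates that rule with hypothetical stand-ins. The names vnc_init_server_sketch() and vnc_start_sketch() and the struct fields are invented for illustration. The real signature, and whether the real function frees sc on every path, are assumptions and do not come from the bhyve source.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct server_softc; the real fields are not in the ticket. */
struct server_softc {
    int width;
    int height;
};

/*
 * Hypothetical stand-in for libhyve-remote's vnc_init_server().
 * Assumption: it takes ownership of sc and frees it before returning,
 * on both the success path and the error path.
 */
int vnc_init_server_sketch(struct server_softc *sc)
{
    int error = (sc->width > 0 && sc->height > 0) ? 0 : -1;
    free(sc);  /* the callee releases sc; no one else may free it */
    return error;
}

/* Hypothetical caller, modeled on the bhyve/vncserver.c side of the fix. */
int vnc_start_sketch(int width, int height)
{
    struct server_softc *sc = malloc(sizeof(*sc));
    if (sc == NULL)
        return -1;
    sc->width = width;
    sc->height = height;

    int error = vnc_init_server_sketch(sc);
    /*
     * The bug: also calling free(sc) here released the same block twice.
     * A double free is undefined behavior and can corrupt the allocator's
     * state, so the resulting crash may surface later, far from this line.
     */
    sc = NULL;  /* ownership moved to the callee; drop the dangling pointer */
    return error;
}
```

With this split, each allocation is freed exactly once, whether initialization succeeds or fails. Running a version that keeps the extra free(sc) in the caller under a tool such as AddressSanitizer would flag the second free at the point where it happens, rather than at the later crash.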

History

#1 Updated by Greg Fitzgerald about 2 years ago

  • File debug-freenas-20180712002308.txz added
  • Private changed from No to Yes

#2 Updated by Marcelo Araujo about 2 years ago

  • Status changed from Unscreened to In Progress
  • Assignee changed from Release Council to Marcelo Araujo

#3 Updated by Marcelo Araujo about 2 years ago

  • Severity changed from New to High
  • Needs Doc changed from Yes to No
  • Needs Merging changed from Yes to No

#4 Updated by Marcelo Araujo about 2 years ago

I have access to Greg's box and am analyzing the cause of this crash.

#5 Updated by Marcelo Araujo about 2 years ago

Hi Greg,

I spent all day looking into your machine and could see the VM crash with SIGBUS. While investigating this issue, I noticed you have lots of errors like this one:

sonewconn: pcb 0xfffff8013195f570: Listen queue overflow: 151 already in queue awaiting acceptance (4 occurrences)

In total:
root@freenas:~ # dmesg | grep sonewconn | wc -l
3419

This may be caused by your Broadcom NIC, but I'm not yet 100% sure about that.

A few hours ago I set this sysctl: sysctl kern.ipc.soacceptqueue=4096

The default value is 128, which is quite low for heavy network traffic. I raised it to 4096 and restarted some services, such as mdnsd and netatalk, as well as your 3 VMs.

I suggest setting this sysctl under System->Tunables and rebooting your FreeNAS.

Then launch your 3 VMs again and let me know whether that solves the VM crashes.
Your 3 VMs have now been running for over 3 hours without a crash.

Best,
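The tunable raised in comment #5 matters because it caps each listening socket's accept queue. FreeBSD's listen(2) documents that a backlog larger than kern.ipc.soacceptqueue, or a negative one, is silently forced to kern.ipc.soacceptqueue. The hypothetical effective_backlog() helper below only models that documented rule as a sketch. It does not read the live sysctl or call listen() itself.

```c
/*
 * Hypothetical helper modeling the FreeBSD listen(2) rule:
 * a requested backlog above kern.ipc.soacceptqueue, or below zero,
 * is silently replaced by kern.ipc.soacceptqueue.
 */
int effective_backlog(int requested, int soacceptqueue)
{
    if (requested < 0 || requested > soacceptqueue)
        return soacceptqueue;  /* clamped to the system-wide limit */
    return requested;          /* within the limit: used as requested */
}
```

With the default of 128, a daemon that requests a backlog of 1024 still gets 128. After the change to 4096, it gets the full 1024 it asked for. A listener that already requests less than the limit is not affected by raising it.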

#6 Updated by Marcelo Araujo about 2 years ago

  • Reason for Blocked set to Waiting for feedback
  • Needs Doc changed from No to Yes

#7 Updated by Greg Fitzgerald about 2 years ago

Thank you for spending the time debugging this. I set the sysctl value and rebooted. I'll let you know if they crash again.

#8 Updated by Greg Fitzgerald about 2 years ago

  • File debug.tgz added

I was up all night working; when I woke up at 2:30 PM EST, my VMs had crashed again. I've attached debug.tgz.

#9 Updated by Marcelo Araujo about 2 years ago

Hello Greg,

I connected to your machine, and kern.ipc.soacceptqueue is still 128.

root@freenas:~ # sysctl kern.ipc.soacceptqueue
kern.ipc.soacceptqueue: 128

Did you roll it back?

#10 Updated by Dru Lavigne about 2 years ago

  • Target version changed from Backlog to 11.2-BETA2

#12 Updated by Greg Fitzgerald about 2 years ago

Yes. When I set up the tunable in the GUI, I set its type to loader instead of sysctl, and after rebooting I didn't verify that it had been applied correctly. I have since fixed it, but the VMs are still crashing regularly.

The change has greatly increased my network throughput on NFS shares, though. I don't have exact numbers, but it has more than doubled. Could this option be set by autotune when a Realtek card is detected in the system?

#14 Updated by Dru Lavigne about 2 years ago

  • File deleted (debug-freenas-20180712002308.txz)

#15 Updated by Dru Lavigne about 2 years ago

  • File deleted (debug.tgz)

#16 Updated by Dru Lavigne about 2 years ago

  • Subject changed from Bhyve VM's Crashing to Remove double free which caused iocage to SIGBUS
  • Private changed from Yes to No
  • Needs Doc changed from Yes to No
  • Needs Merging changed from No to Yes

#17 Updated by Dru Lavigne about 2 years ago

  • Status changed from In Progress to Ready for Testing
  • Needs Merging changed from Yes to No

#18 Updated by Dru Lavigne about 2 years ago

  • Reason for Blocked deleted (Waiting for feedback)

#19 Updated by Marcelo Araujo about 2 years ago

  • Subject changed from Remove double free which caused iocage to SIGBUS to Remove double free which caused bhyve to SIGBUS

#20 Updated by Dru Lavigne about 2 years ago

  • Related to Bug #34747: Bhyve process exits added

#21 Updated by Joe Maloney about 2 years ago

  • Status changed from Ready for Testing to Passed Testing

#22 Updated by Dru Lavigne about 2 years ago

  • Status changed from Passed Testing to Done
  • Needs QA changed from Yes to No
