Bug #24350

Windows VMs lockup under Bhyve

Added by Phil Kahnk over 3 years ago. Updated about 3 years ago.

Status: Closed: Third party to resolve
Priority: Important
Assignee: Marcelo Araujo
Category: Middleware
Target version:
Seen in:
Severity: New
Reason for Closing:
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

Hardware:
Dell C2100, 2x L5640, 48GB ECC, 2x H310 flashed to IT, 8x 3TB WD RED Raidz2, 2x mirrored SSD for VMs and Jails, FreeNAS booted from USB.

When running Windows 10 or Windows Server 2012 (the only two Windows OSes I have tried) on Bhyve in FreeNAS 11-RC3, the VM will lock up/freeze and its process shows roughly 100-110% CPU in 'top' on the host. CentOS 7 seems to run fine.

When the VM locks up or freezes, the VNC session is not disconnected from the VM but is non-responsive. If I happen to be RDP'd into the VM I am disconnected. Ping responses stop. The only way to stop the VM is to 'kill -9' the PID. When the OS is restarted there are no crash logs or events shown other than a forced power off type event. Maybe Bhyve stops communication with the zvol disk for the VM?
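The symptom described above (the bhyve process pinned near 100% CPU while the guest stops answering ping) can be checked for mechanically. What follows is only a minimal Python sketch, assuming the guest IP and the bhyve PID are already known. The function names, the 95% threshold, and the FreeBSD-style `ping -c 1 -t 2` invocation are illustrative assumptions and not part of FreeNAS.

```python
import os
import signal
import subprocess


def guest_responds(ip: str, timeout_s: int = 2) -> bool:
    """Return True if the guest answers a single ICMP echo request.

    Assumes FreeBSD ping, where -t is the overall timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-t", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def looks_locked_up(cpu_percent: float, ping_ok: bool,
                    threshold: float = 95.0) -> bool:
    """Heuristic taken from this report: CPU pegged and no ping replies.

    The threshold is a hypothetical value chosen for illustration.
    """
    return cpu_percent >= threshold and not ping_ok


def force_stop(pid: int) -> None:
    """Send SIGKILL, the equivalent of 'kill -9 <pid>' used in this report."""
    os.kill(pid, signal.SIGKILL)
```

A high CPU reading alone is not treated as a lockup here, since a guest under load (for example during Windows updates) can legitimately use a full core while still answering ping.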

When I first had this issue the VMs would freeze within 3-5 minutes of booting (after install). I went into the BIOS and disabled hyperthreading, and was then able to keep the VMs running for hours as long as they were idle. As soon as I ran an iperf test or loaded the VMs down with processes (Windows updates), they would freeze and require a 'kill -9' before they could be booted again.

This seems like a tricky issue so please let me know if I can provide more information or if you need to reach into my system to troubleshoot directly.

Windows10_CPU.PNG (6.67 KB) Phil Kahnk, 06/03/2017 10:10 PM
Windows10_CPU_2.PNG (20.9 KB) Phil Kahnk, 06/03/2017 10:10 PM

History

#1 Updated by Phil Kahnk over 3 years ago

  • File debug-bc2100-20170604001026.txz added

#2 Updated by Phil Kahnk over 3 years ago

11348

#3 Updated by Phil Kahnk over 3 years ago

11349

#4 Updated by Marcelo Araujo over 3 years ago

  • Status changed from Unscreened to 15

Hi,

Thanks for the report. Could you tell me whether you are using virtio-net or e1000 as the NIC for the guest?

If it is e1000, there might be an unknown bug in the e1000 driver. It is not an easy driver to debug, but as long as we can replicate the problem we will have a good start.

#5 Updated by Phil Kahnk over 3 years ago

I was using the e1000 driver.
I have pulled the following drivers from
https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/archive-virtio/

virtio-win-0.1.96.iso
virtio-win-0.1.118.iso (for windows 10)

I will report back on how it functions.

#6 Updated by Phil Kahnk over 3 years ago

In case anyone finds this in the future: to apply the drivers, I changed the NIC type to VIRTIO and then created a CD-ROM device with the ISO I downloaded from the link above. I booted the VM, opened 'Device Manager', found the Ethernet controller that was missing its driver, right-clicked it and chose to install the driver. Point the installer to the CD-ROM drive and that's it.

The VMs I was having problems with have been running clean for a couple hours. I will let them run over night to confirm but it's looking good.

Nice find on that previous bug report.

Drivers I used:
Windows 10 - virtio-win-0.1.118.iso
Other Windows OSes - virtio-win-0.1.96.iso
Newest - virtio-win-0.1.137.iso

0.1.137 seemed to allow for higher throughput on Windows Server 2012.
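In FreeNAS the NIC type is changed in the VM device settings, but at the bhyve level the change amounts to swapping the emulation name in the NIC's `-s slot,emulation,config` argument from `e1000` to `virtio-net`. The Python sketch below illustrates that substitution on a hypothetical argument list. The slot numbers, the `tap0` backend, and the VM name are assumptions, and this is not how the FreeNAS middleware builds its command line.

```python
from typing import List


def use_virtio_net(bhyve_args: List[str]) -> List[str]:
    """Replace the e1000 emulation with virtio-net in every '-s' slot spec."""
    out = []
    prev_was_slot_flag = False
    for arg in bhyve_args:
        if prev_was_slot_flag:
            # A slot spec has the form: slot,emulation[,config...]
            fields = arg.split(",")
            if len(fields) >= 2 and fields[1] == "e1000":
                fields[1] = "virtio-net"
            arg = ",".join(fields)
        prev_was_slot_flag = arg == "-s"
        out.append(arg)
    return out


# Hypothetical command line for illustration only.
example_args = [
    "bhyve", "-c", "2", "-m", "4G",
    "-s", "0,hostbridge",
    "-s", "3,e1000,tap0",
    "win10",
]
print(use_virtio_net(example_args))
```

The guest still needs the virtio-win network driver installed, as described above, before the new NIC will work in Windows.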

#7 Updated by Marcelo Araujo over 3 years ago

Phil Kahnk wrote:

In case anyone finds this in the future: to apply the drivers, I changed the NIC type to VIRTIO and then created a CD-ROM device with the ISO I downloaded from the link above. I booted the VM, opened 'Device Manager', found the Ethernet controller that was missing its driver, right-clicked it and chose to install the driver. Point the installer to the CD-ROM drive and that's it.

The VMs I was having problems with have been running clean for a couple hours. I will let them run over night to confirm but it's looking good.

Nice find on that previous bug report.

Drivers I used:
Windows 10 - virtio-win-0.1.118.iso
Other Windows OSes - virtio-win-0.1.96.iso
Newest - virtio-win-0.1.137.iso

0.1.137 seemed to allow for higher throughput on Windows Server 2012.

Thanks for the feedback! Just to let you know, I can reproduce the guest crash very easily using iperf, and I'm investigating it. The e1000 interface seems to have some issue, but the cause is not clear to me yet.

Let me know if the problem goes away for you with virtio-net.

Best,

#8 Updated by Marcelo Araujo over 3 years ago

  • Status changed from 15 to Investigation

#9 Updated by Phil Kahnk over 3 years ago

The virtio driver seems to have cleared up the problem I was having with the VMs locking up. You can close this bug since there is a workaround, unless you want to keep it open for your own tracking while you debug the e1000 driver.

Thanks for your help!

-Phil Kahnk

#10 Updated by Marcelo Araujo over 3 years ago

  • Priority changed from No priority to Important
  • Target version set to 11.1

Phil Kahnk wrote:

The virtio driver seems to have cleared up the problem I was having with the VMs locking up. You can close this bug since there is a workaround, unless you want to keep it open for your own tracking while you debug the e1000 driver.

Thanks for your help!

-Phil Kahnk

Thanks Phil,

I will keep this ticket open. It is an issue with the e1000 driver that I'm investigating, and it will take some time until it gets fixed, first in FreeBSD and then in FreeNAS.

Thank you for all the tests and the report.

Best,

#11 Updated by Kris Moore about 3 years ago

  • Seen in changed from Unspecified to N/A

#12 Updated by Marcelo Araujo about 3 years ago

  • Status changed from Investigation to Closed: Third party to resolve

This is an upstream problem and there is no defined solution yet.
I'm closing this ticket because a similar report already exists on the FreeBSD Bugzilla.

#13 Updated by Dru Lavigne about 3 years ago

  • Target version changed from 11.1 to N/A

#14 Updated by Dru Lavigne about 3 years ago

  • File deleted (debug-bc2100-20170604001026.txz)

#15 Updated by Dru Lavigne about 3 years ago

  • Private changed from Yes to No
