LACP interface fails to start
After upgrading from 11.2-beta2 to 11.2-beta3 yesterday, my LACP interface fails to start correctly on boot.
The output of ifconfig (attached) looks like the physical interfaces aren't attaching to lagg0 correctly (lack of flags=0 on the laggport lines), although I'm not overly familiar with FreeBSD so I'm not positive that's the problem.
The IP address is configured on lagg0, and the default route is in place, but ping returns "Host is down" when trying to ping the default gateway (i.e., ARP fails).
Deleting and re-adding the default route did not fix it.
Deleting and recreating the lagg did not fix it.
Deleting the lagg and configuring one of the physical interfaces directly gets the network working again.
What other information can I provide to assist?
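As a rough aid for reading the laggport lines mentioned above, here is a minimal sketch that flags member ports which are not fully aggregating. It assumes laggport lines of the form `laggport: bge0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>`; the function name `check_laggports` is hypothetical and the exact output format may differ between FreeBSD versions.

```shell
# Read `ifconfig lagg0` style output on stdin and report, per member port,
# whether its flags include ACTIVE, COLLECTING and DISTRIBUTING.
# Assumption: each member appears as "laggport: <name> flags=<hex><FLAG,...>".
check_laggports() {
    awk '
        $1 == "laggport:" {
            port = $2
            flags = $3
            ok = (flags ~ /ACTIVE/ && flags ~ /COLLECTING/ && flags ~ /DISTRIBUTING/)
            printf "%s %s\n", port, (ok ? "ok" : "not-aggregating")
        }
    '
}
```

On the affected machine this would be used as `ifconfig lagg0 | check_laggports`.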
HP Microserver G8
Intel(R) Celeron(R) CPU G1610T @ 2.30GHz
bge0: <Broadcom BCM5720 A0, ASIC rev. 0x5720000> mem 0xfabf0000-0xfabfffff,0xfabe0000-0xfabeffff,0xfabd0000-0xfabdffff irq 16 at device 0.0 on pci4
bge0: APE FW version: NCSI v220.127.116.11
bge0: CHIP ID 0x05720000; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
miibus0: <MII bus> on bge0
bge0: Using defaults for TSO: 65518/35/2048
bge0: Ethernet address: 10:60:4b:92:39:d4
bge1: <Broadcom BCM5720 A0, ASIC rev. 0x5720000> mem 0xfabc0000-0xfabcffff,0xfabb0000-0xfabbffff,0xfaba0000-0xfabaffff irq 17 at device 0.1 on pci4
bge1: APE FW version: NCSI v18.104.22.168
bge1: CHIP ID 0x05720000; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
miibus1: <MII bus> on bge1
bge1: Using defaults for TSO: 65518/35/2048
bge1: Ethernet address: 10:60:4b:92:39:d5
#5 Updated by Ryan Moeller about 1 year ago
- Status changed from Unscreened to Blocked
- Reason for Blocked set to Need additional information from Author
Phillip, can you get me a full ifconfig output with the bridge lagg configured? I'm interested in seeing the port interfaces too, not just lagg0. It might help to save the output to a text file so you don't have to upload multiple photos. Thanks!
#6 Updated by Phillip Smith about 1 year ago
Hi Ryan, I assume you mean the lagg, not the bridge? :)
I don't think I'll be able to any time soon though, I'm sorry; this is a production machine (accidentally upgraded to the beta train due to issue #51335) at a radio station that I can't take offline without notice. I might be able to find 30 minutes this weekend to coordinate an outage, but can't promise anything.
Sorry about the photo; it was all I thought to grab at the time when I was trying to get the box back online.
#7 Updated by Ryan Moeller about 1 year ago
Whoops yes I meant lagg not bridge :)
Ok I will look around to see if anything else jumps out in the debug info. It does have an ifconfig printout of the lagg right after creation, but if you have a chance to get another shot of the ifconfig output after a minute it could help eliminate a few possibilities. Interfaces can take a while to start back up once a lagg is created, so there may be another clue in later output.
To confirm, your lagg works with the same hardware and config on 11.2-BETA2 but not BETA3?
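The request above (text output rather than photos, captured once right after creating the lagg and again a minute later) could be scripted roughly as follows. This is only a sketch: the function name `snapshot_ifconfig` and the file names are made up for illustration, and the 60-second default simply mirrors the "after a minute" suggestion.

```shell
# Save `ifconfig -a` twice, DELAY seconds apart, and show what changed.
# Usage: snapshot_ifconfig DIR [DELAY_SECONDS]
# File names below are illustrative, not a FreeNAS convention.
snapshot_ifconfig() {
    dir=$1
    delay=${2:-60}
    ifconfig -a > "$dir/ifconfig-before.txt"
    sleep "$delay"
    ifconfig -a > "$dir/ifconfig-after.txt"
    # diff exits non-zero when the snapshots differ, which is expected here
    diff -u "$dir/ifconfig-before.txt" "$dir/ifconfig-after.txt" || true
}
```

For example, `snapshot_ifconfig /tmp` run immediately after creating the lagg leaves two text files that can be attached to the ticket.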
#8 Updated by Phillip Smith about 1 year ago
Yes, the lagg was working fine on 11.2-BETA2.
I ran the upgrade to 11.2-BETA3 and rebooted; the box didn't come back online. It sat for over an hour in a booted state, but with the broken lagg (I was remote, had to drive to the site) so I think it had enough time to come up if it was just a slow initialization ;)
Happy to have your cooperation nonetheless.
As an aside, I notice that you are using an HP Gen 8 Microserver. There are a lot of people reporting that they have been unable to boot with the new bootloader on that hardware. While the ticket has been closed in our tracker as we don't have the hardware to reproduce or work on it, there is a ticket for the issue in FreeBSD's tracker I am encouraging people to contribute to. If you could, please share your experiences/configuration with that hardware there: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232773
Ouch, thanks for the heads up. I originally had problems with the G10 MicroServer (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221350) so had to use the G8 instead.
I have a spare G7 that I'll take with me as a last-resort backup in case the G8 develops the same boot issues.
OK, well I finally got this update done - good news, the machine still boots! Unfortunately the LACP issue is still present, but I now suspect it might be the switch the machine is connected to (Ubnt Unifi 24). I have no real evidence, and the problem did appear after the beta update when it was working fine before that, so it's just a gut feeling. Unfortunately I don't have any other switches on-site to test with.
So summary of what I did:
- Update to 11.2-RELEASE-U1 using `/usr/local/bin/freenas-update -v update`
- Delete network configuration from the console
- Create LACP lagg and configure IPv4 = unable to ping the local router
- Delete LACP lagg, create active-passive lagg = network working 100%
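For anyone wanting to repeat the LACP versus active-passive comparison from a FreeBSD shell rather than the FreeNAS console menu, the steps above correspond roughly to the `ifconfig` commands printed by this sketch. It assumes the active-passive mode maps to FreeBSD's `failover` lagg protocol; the helper names `run` and `build_lagg` and the address `192.0.2.10/24` are placeholders, not values from this ticket. Changes made this way bypass the FreeNAS configuration and are not expected to persist across reboots.

```shell
# Print (DRY_RUN=1, the default) or execute the commands that build a lagg.
# Usage: build_lagg PROTO ADDR PORT...
# Example: build_lagg lacp 192.0.2.10/24 bge0 bge1   (address is a placeholder)
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_lagg() {
    proto=$1
    addr=$2
    shift 2
    run ifconfig lagg0 create
    run ifconfig lagg0 laggproto "$proto"
    for port in "$@"; do
        # member interfaces need to be up before they can carry traffic
        run ifconfig "$port" up
        run ifconfig lagg0 laggport "$port"
    done
    run ifconfig lagg0 inet "$addr" up
}
```

With the default dry run, `build_lagg failover 192.0.2.10/24 bge0 bge1` only prints the commands, so they can be reviewed before setting `DRY_RUN=0`; `ifconfig lagg0 destroy` removes the interface again between attempts.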
So for now I've left it as an active/passive lagg, which at least gives me redundancy, and I can live without the bandwidth of LACP for the time being.
When I can get a spare switch on-site, I'll try and test with an HP switch where I can configure it 'properly' without using graphical management tools like the Unifi.
I think for now you can probably close this bug off though; it doesn't seem anyone else has the issue, and I do suspect it's the switch rather than Free(NAS|BSD). If any more useful information comes to light I can always update and reopen?