Bug #51478

LACP interface fails to start

Added by Phillip Smith about 1 year ago. Updated 10 months ago.

Status: Closed
Priority: No priority
Assignee: Ryan Moeller
Category: OS
Target version:
Seen in:
Severity: New
Reason for Closing: User Configuration Error
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: No
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

After upgrading from 11.2-beta2 to 11.2-beta3 yesterday, my LACP interface fails to start correctly on boot.

The output of ifconfig (attached) looks like the physical interfaces aren't attaching to lagg0 correctly (lack of flags=0 on the laggport lines), although I'm not overly familiar with FreeBSD, so I'm not positive that's the problem.

The IP address is configured on lagg0, and the default route is in place, but ping returns "Host is down" when trying to ping the default gateway (ie, ARP fails).

Deleting and readding the default route did not fix.
Deleting and recreating the lagg did not fix.
Deleting the lagg and configuring one of the physical interfaces directly gets the network working again.
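
One way to check whether the member ports have actually joined the aggregate is to look at the flags on each `laggport:` line of the `ifconfig lagg0` output. The following is a minimal sketch, not taken from the attached photo or debug: the function name `lagg_inactive_ports` and the sample output are hypothetical, and the sketch assumes a port that has negotiated LACP shows `ACTIVE,COLLECTING,DISTRIBUTING` in its flags.

```shell
#!/bin/sh
# Print the lagg member ports whose flags do not include ACTIVE.
# Reads `ifconfig lagg0` style output on stdin.
# (Hypothetical helper, for illustration only.)
lagg_inactive_ports() {
    awk '$1 == "laggport:" && $3 !~ /ACTIVE/ { print $2 }'
}

# Illustrative sample only; this is NOT the reporter's actual output.
sample='lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    laggproto lacp lagghash l2,l3,l4
    laggport: bge0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: bge1 flags=0<>'

# Lists every port that has not become an active member of the lagg.
printf '%s\n' "$sample" | lagg_inactive_ports
```

On a live system the same function could be fed with `ifconfig lagg0 | lagg_inactive_ports`; any port it lists has not completed negotiation with the switch.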

What other information can I provide to assist?

Hardware:
HP Microserver G8
Intel(R) Celeron(R) CPU G1610T @ 2.30GHz
8GB RAM

Network interfaces:
bge0: <Broadcom BCM5720 A0, ASIC rev. 0x5720000> mem 0xfabf0000-0xfabfffff,0xfabe0000-0xfabeffff,0xfabd0000-0xfabdffff irq 16 at device 0.0 on pci4
bge0: APE FW version: NCSI v1.1.15.0
bge0: CHIP ID 0x05720000; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
miibus0: <MII bus> on bge0
bge0: Using defaults for TSO: 65518/35/2048
bge0: Ethernet address: 10:60:4b:92:39:d4
bge1: <Broadcom BCM5720 A0, ASIC rev. 0x5720000> mem 0xfabc0000-0xfabcffff,0xfabb0000-0xfabbffff,0xfaba0000-0xfabaffff irq 17 at device 0.1 on pci4
bge1: APE FW version: NCSI v1.1.15.0
bge1: CHIP ID 0x05720000; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
miibus1: <MII bus> on bge1
bge1: Using defaults for TSO: 65518/35/2048
bge1: Ethernet address: 10:60:4b:92:39:d5

20181015_134129.jpg (4.96 MB): Photo of ifconfig output. Phillip Smith, 10/15/2018 03:08 PM

History

#1 Updated by Dru Lavigne about 1 year ago

  • Private changed from No to Yes
  • Reason for Blocked set to Need additional information from Author

Phillip: please attach a debug (System -> Advanced -> Save debug) to this ticket.

#2 Updated by Phillip Smith about 1 year ago

  • File debug-181017.tgz added

Debug attached.

#3 Updated by Dru Lavigne about 1 year ago

  • Assignee changed from Release Council to Alexander Motin
  • Reason for Blocked deleted (Need additional information from Author)

#4 Updated by Alexander Motin about 1 year ago

  • Assignee changed from Alexander Motin to Ryan Moeller

Ryan, once you get to LAGG, please take a look at this as well.

#5 Updated by Ryan Moeller about 1 year ago

  • Status changed from Unscreened to Blocked
  • Reason for Blocked set to Need additional information from Author

Phillip, can you get me a full ifconfig output with the bridge lagg configured? I'm interested in seeing the port interfaces too, not just lagg0. It might help to save the output to a text file so you don't have to upload multiple photos. Thanks!

#6 Updated by Phillip Smith about 1 year ago

Hi Ryan, I assume you mean the lagg, not the bridge? :)
I don't think I'll be able to any time soon though, I'm sorry; this is a production machine (accidentally upgraded to the beta train due to issue #51335) at a radio station that I can't take offline without notice. I might be able to find 30 minutes this weekend to coordinate an outage, but can't promise anything.
Sorry about the photo; it was all I thought to grab at the time when I was trying to get the box back online.

#7 Updated by Ryan Moeller about 1 year ago

Whoops yes I meant lagg not bridge :)
Ok I will look around to see if anything else jumps out in the debug info. It does have an ifconfig printout of the lagg right after creation, but if you have a chance to get another shot of the ifconfig output after a minute it could help eliminate a few possibilities. Interfaces can take a while to start back up once a lagg is created, so there may be another clue in later output.

To confirm, your lagg works with the same hardware and config on 11.2-BETA2 but not BETA3?

#8 Updated by Phillip Smith about 1 year ago

Yes, the lagg was working fine on 11.2-BETA2.

I ran the upgrade to 11.2-BETA3 and rebooted; the box didn't come back online. It sat for over an hour in a booted state, but with the broken lagg (I was remote, had to drive to the site) so I think it had enough time to come up if it was just a slow initialization ;)

#9 Updated by Dru Lavigne 12 months ago

  • Status changed from Blocked to Screened
  • Reason for Blocked deleted (Need additional information from Author)

#10 Updated by Ryan Moeller 11 months ago

Hi Phillip, have you upgraded to the release, and if so did you have the same problem?

#11 Updated by Phillip Smith 11 months ago

Hi Ryan, I'll be updating that machine over the xmas period. I'll get back to you early in the new year if that's ok?

#12 Updated by Ryan Moeller 11 months ago

Ok, thanks :)

#13 Updated by Phillip Smith 10 months ago

So of course things never go to plan, especially in an unpaid environment :)

I have scheduled maintenance this Saturday afternoon to update and test. Sorry I haven't gotten to it sooner.

#14 Updated by Ryan Moeller 10 months ago

Happy to have your cooperation nonetheless.

As an aside, I notice that you are using an HP Gen 8 Microserver. There are a lot of people reporting that they have been unable to boot with the new bootloader on that hardware. While the ticket has been closed in our tracker as we don't have the hardware to reproduce or work on it, there is a ticket for the issue in FreeBSD's tracker I am encouraging people to contribute to. If you could, please share your experiences/configuration with that hardware there: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232773

Thanks!

#15 Updated by Phillip Smith 10 months ago

Ouch, thanks for the heads up. I originally had problems with the G10 MicroServer (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221350) so had to use the G8 instead.

I have a spare G7 that I'll take with me as a last-resort backup in case the G8 develops the same boot issues.

#16 Updated by Ryan Moeller 10 months ago

I'd be surprised if you suddenly develop issues now, having been fine on 11.2-BETA2. Not a bad idea to be prepared anyway, though.

#17 Updated by Phillip Smith 10 months ago

OK, well I finally got this update done - good news, the machine still boots! Unfortunately the LACP issue is still present, however I'm suspecting it might be the switch the machine is connected to (Ubnt Unifi 24). I have no real evidence, and the problem did appear after the beta update when it was working fine before that, but it's just a gut feeling I have. Unfortunately I don't have any other switches on-site to test with.

So, a summary of what I did:

  1. Reboot
  2. Update to 11.2-RELEASE-U1 using `/usr/local/bin/freenas-update -v update`
  3. Reboot
  4. Delete network configuration from the console
  5. Create LACP lagg and configure IPv4 = unable to ping the local router
  6. Delete LACP lagg, create active-passive lagg = network working 100%

So for the time being I've left it as an active/passive lagg which at least gives me redundancy and I can live without the bandwidth of LACP for the time being.
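
For reference, the two lagg variants tested in the steps above can be sketched as plain FreeBSD `ifconfig` commands. This is a rough equivalent under stated assumptions, not what the FreeNAS console menu runs: it assumes "active-passive" corresponds to the `failover` lagg protocol, the helper name `lagg_commands` is hypothetical, and `192.0.2.10/24` is a documentation placeholder address. The function only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Print (do not run) the ifconfig commands that would build a lagg.
# Usage: lagg_commands PROTO ADDR PORT...
# (Hypothetical helper; PROTO is a lagg protocol such as lacp or failover.)
lagg_commands() {
    proto=$1
    addr=$2
    shift 2
    echo "ifconfig lagg0 create"
    # Member ports are brought up before being added to the lagg.
    for port in "$@"; do
        echo "ifconfig $port up"
    done
    # Build one laggproto line listing every member port.
    line="ifconfig lagg0 laggproto $proto"
    for port in "$@"; do
        line="$line laggport $port"
    done
    echo "$line"
    echo "ifconfig lagg0 inet $addr up"
}

# Step 5: LACP (needs a matching LACP aggregate configured on the switch).
lagg_commands lacp 192.0.2.10/24 bge0 bge1
# Step 6: failover (assumed here to be the "active-passive" mode).
lagg_commands failover 192.0.2.10/24 bge0 bge1
```

The practical difference is that `lacp` depends on the switch negotiating the aggregate on both ports, while `failover` sends traffic through one port at a time and needs no cooperation from the switch, which would fit a switch-side cause.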

When I can get a spare switch on-site, I'll try and test with an HP switch where I can configure it 'properly' without using graphical management tools like the Unifi.

I think for now you can probably close this bug off though; it doesn't seem anyone else has the issue, and I do suspect it's the switch rather than Free(NAS|BSD). If any more useful information comes to light I can always update and reopen?

#18 Updated by Ryan Moeller 10 months ago

  • Status changed from Screened to Closed
  • Target version deleted (Backlog)
  • Reason for Closing set to User Configuration Error
  • Needs Merging changed from Yes to No

Ok, feel free to add a note if your situation changes.

#19 Updated by Dru Lavigne 10 months ago

  • File deleted (debug-181017.tgz)

#20 Updated by Dru Lavigne 10 months ago

  • Target version set to N/A
  • Private changed from Yes to No
