Bug #4803

Solarflare driver trips an assert when LACP is used

Added by Josh Paetzel over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Nice to have
Assignee: Josh Paetzel
Category: -
Target version:
Severity: New
Reason for Closing:
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

Insta-panic when trying to configure a Solarflare 10GbE card for LACP.

Textdump attached.

sj-storage/Dev/releng/FreeNAS/build_env/9.2.1.3-RELEASE/FreeBSD/src/sys/dev/sfxge/common

History

#1 Updated by Sean Fagan over 5 years ago

  • Assignee changed from Sean Fagan to Josh Paetzel

Well, that panic is apparently due to:

EFSYS_ASSERT3U(enp->en_mod_flags, &, EFX_MOD_PORT);

That seems to be due to a failed initialization.

There's a dtrace probe:

DTRACE_PROBE1(fail1, int, rc)

but I'm not sure what probe that actually maps to.

After the initialization fails, trying to use it results in the panic.

(Alternately, the driver was unloaded -- the bit is set in efx_port_init, and cleared on failure there; it's also cleared in efx_port_fini. It's also possible something is stomping it, but that seems less likely, doesn't it?)
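
To illustrate the flag lifecycle described above, here is a minimal C sketch. It is a simplified model, not the sfxge driver code: `struct model_nic`, `MODEL_MOD_PORT` (including its bit value), and the `model_port_*` functions are hypothetical names made up for illustration. It only shows how a flag that is set during init, and cleared on init failure or in fini, makes a later assertion fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for EFX_MOD_PORT; the bit value is assumed. */
#define MODEL_MOD_PORT 0x4u

/* Hypothetical stand-in for the NIC state; mod_flags models en_mod_flags. */
struct model_nic {
	uint32_t mod_flags;
};

/* Models port init: sets the flag, clears it again if setup fails. */
int
model_port_init(struct model_nic *nic, bool hw_ok)
{
	nic->mod_flags |= MODEL_MOD_PORT;
	if (!hw_ok) {
		/* Failure path: the flag is cleared before returning. */
		nic->mod_flags &= ~MODEL_MOD_PORT;
		return (-1);
	}
	return (0);
}

/* Models port fini: always clears the flag. */
void
model_port_fini(struct model_nic *nic)
{
	nic->mod_flags &= ~MODEL_MOD_PORT;
}

/* True when a port operation would pass the flag check. */
bool
model_port_usable(const struct model_nic *nic)
{
	return ((nic->mod_flags & MODEL_MOD_PORT) != 0);
}

/* Models a port operation guarded by the assertion. */
void
model_port_poll(const struct model_nic *nic)
{
	/* Fires if init failed or fini already ran. */
	assert(model_port_usable(nic));
}
```

In this model, calling `model_port_poll` after a failed `model_port_init` or after `model_port_fini` trips the assertion, which matches the two explanations suggested above.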

I don't know how LAGG works; could it be trying to use an interface that has been unloaded or reset, or whose initialization failed?

#2 Updated by aurf alien over 5 years ago

I've confirmed this behavior in the current FreeBSD 9.2 and 10.0 releases. I'm currently filing a bug with FreeBSD.

#3 Updated by aurf alien over 5 years ago

I had time to really sit down and find out what's going on. Using the card in any fashion, LACP or individual ports, causes the system to core.

Using FreeNAS 9.1.1 yields no such issues in either LACP or individual ports mode. In other words, 9.1.1 works as expected whereas 9.2.1.3 does not. What other info can I get you to help fix this in 9.2.1.X?

#4 Updated by Anonymous over 5 years ago

There's work going on in the Solarflare driver in FreeBSD -CURRENT right now. That work may or may not address this issue, but I've brought this bug to the attention of the person working on it.

#5 Updated by aurf alien over 5 years ago

Thanks very much Doug, nice of you to let me know.

I've noticed that Solarflare released a new driver as of March 13 of this year.

So I've compiled it against the latest FreeBSD 9.2 and it does work with individually configured ports. LACP still causes a panic, however.

But in FreeNAS 9.2.1.3 it causes a panic even when trying to use the card as separately configured ports.

Is there any way someone would compile it against 9.2.1.3 or 9.2.1.4 and send the sfxge.ko and accompanying sfxge.ko.symbols files? I'd be very appreciative, to say the least.

I assume that it's not possible to use what's been compiled under the latest FreeBSD 9.2?

#6 Updated by aurf alien over 5 years ago

Andrew Rybchenko from the list was able to reproduce this issue and has a patch that fixes it. He will push it out to Subversion after discussing it with Solarflare.

Once it's pushed to Subversion, how quickly can it be integrated into FreeNAS?

And is it possible for me to apply this patch on my systems w/o having to wait?

#7 Updated by Josh Paetzel over 5 years ago

  • Status changed from Unscreened to Screened

We'll get it in the next release. You won't be able to patch your system; however, I'll get you a kernel you can swap in as soon as it's available, so you have a fix now and can pick up the mainline change with the next release.

#8 Updated by Josh Paetzel over 5 years ago

  • File sfxge-lag-fix.patch added

Attaching the proposed patch. Whether or not Solarflare accepts it, it looks to be correct. I'll roll a kernel that you can use while this gets sorted out.

#9 Updated by Josh Paetzel over 5 years ago

commit: trueos|7f4980b4c5c42a31acbecf96da7f45352cdb1400

#10 Updated by aurf alien over 5 years ago

Wow, that was fast, many thanks JP!

#11 Updated by Josh Paetzel over 5 years ago

  • File kernel added

I've attached a kernel with the fix. It should link with your existing modules; if it doesn't, I'll provide the whole /boot/kernel directory.

Go ahead and replace your existing kernel with the one attached.

#12 Updated by aurf alien over 5 years ago

Thanks very much for this. I'll implement tomorrow. Really impressed with how this all turned out!

#13 Updated by Josh Paetzel over 5 years ago

  • Status changed from Screened to Resolved
  • Target version changed from 49 to 92

9.2.1.4.1 has the fix as well, if you just want to try an entire FreeNAS image.

#14 Updated by aurf alien over 5 years ago

The patch was pushed out to Subversion as is, w/o any further mods.

#15 Updated by Jordan Hubbard over 5 years ago

  • Target version changed from 92 to 9.2.1.5-RELEASE

#16 Updated by Dru Lavigne almost 2 years ago

  • File deleted (solarflare.crash.zip)

#17 Updated by Dru Lavigne almost 2 years ago

  • File deleted (sfxge-lag-fix.patch)

#18 Updated by Dru Lavigne almost 2 years ago

  • File deleted (kernel)
