Bug #27678

FreeNAS 11.1 RELEASE and Intel 10gb NIC issue

Added by Ray Milyard over 2 years ago. Updated over 2 years ago.

Status: Closed: Third party to resolve
Priority: No priority
Assignee: Alexander Motin
Category: OS
Target version:
Seen in:
Severity: New
Reason for Closing:
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

When I tried to upgrade to 11.1, it failed, so I rolled back to 11.0-U4. Now it will not let me update any plugins. However, the main issue is that after the system reboots, my Intel X520-DA1 (10 GbE, single port) comes up with the right IP. I can ping it for about 15-30 seconds, then I can't. So I can't log in to the web console or anything to see what is going on.

Also, today I did a FRESH install of 11.1. All my 1 Gb NICs are fine, but I have the same issue with the 10 Gb NIC.

Any ideas?

History

#1 Updated by Ray Milyard over 2 years ago

  • File debug-freenas-20180108073233.txz added

#2 Updated by Ray Milyard over 2 years ago

  • File debug.tar.gz added

#3 Updated by Dru Lavigne over 2 years ago

  • Assignee changed from Release Council to Alexander Motin
  • Target version set to 11.2-BETA1
  • Seen in changed from Unspecified to 11.1

#4 Updated by Alexander Motin over 2 years ago

  • Status changed from Unscreened to Screened

Looking at the provided debug, I see this line in the ifconfig output:

 RX: 0.00 mW (-40.00 dBm) TX: 6.55 mW (8.16 dBm)

I am curious whether there is indeed no receive signal, or whether it is only a reporting artifact. It would be good to look at it both when the NIC is working and when it is not. Could you try unplugging and replugging the cable or SFP module, in case that recovers it somehow?

Could you capture the output of `tcpdump -pvni ix0` when it is not working, while pinging some other host from the FreeNAS system?
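
The receive-power question can be checked mechanically. The following is a minimal sketch, not part of the original report: it pulls the RX dBm value out of a line in the format quoted in this comment and flags it when it is at or below a threshold. The function names and the -30 dBm default threshold are assumptions made for illustration; a reading of -40.00 dBm next to 0.00 mW may simply be the lowest value the module can report.

```sh
#!/bin/sh
# Sketch only: helper names and the -30 dBm threshold are illustrative assumptions.

# Read ifconfig-style text on stdin and print the RX power in dBm,
# e.g. "-40.00" for "RX: 0.00 mW (-40.00 dBm) TX: 6.55 mW (8.16 dBm)".
rx_dbm() {
    awk '{
        for (i = 1; i <= NF - 3; i++) {
            if ($i == "RX:") {
                v = $(i + 3)          # field looks like "(-40.00"
                gsub(/[()]/, "", v)   # strip the parenthesis
                print v
            }
        }
    }'
}

# Succeed (exit 0) when the given dBm value is at or below the threshold.
rx_is_dark() {
    awk -v v="$1" -v t="${2:--30}" 'BEGIN { exit (v + 0 <= t + 0) ? 0 : 1 }'
}
```

On the FreeNAS console this could be fed with `ifconfig -v ix0 | rx_dbm`, once while the link works and once after it drops, so the two readings can be compared.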

#5 Updated by Ray Milyard over 2 years ago

I am new to FreeNAS, so I'm not sure what you want me to do. From the console I can ping 10.0.1.100, 101 and 102. If I try to access it via the web, I can’t. Also, if I ping something like www.apple.com I get a "cannot resolve" error, so it seems DNS is not working either.

#6 Updated by Ray Milyard over 2 years ago

  • File tcpdump.txt added

Not sure if this file will help at all.

Also noticed the green link light on card isn’t on.

#7 Updated by Ray Milyard over 2 years ago

Watching the system boot, it looks like the green link light flashes until it gets to the part where the jails start in FreeNAS. I have two jails: 10.0.1.101 and 10.0.1.102.

#8 Updated by Alexander Motin over 2 years ago

  • Status changed from Screened to 15

Do you have any port security limits on your switch that could shut down the port when new extra MAC addresses appear on it? Is the switch managed? What does it report about the port status?

#9 Updated by Ray Milyard over 2 years ago

Alexander Motin wrote:

Don't you have any port security limits on your switch that could kill the port after appearance of new extra MAC addresses on it? Is the switch managed? What does it tell about the port status?

Switch is HP 2920-24G Switch (J9726A)

Status when ping stops:
Port Name: A1
Enabled: Down
Type: SFP+SR

Utilization        Receive       Transmit
Total (kbps):      0             0
Unicast (pps):     0             0
B/Mcast (pps):     0             0
Utilization %:     0             0

Recv Discards: 0
Unknown Protos: 0
Out Queue Len: 0

Totals             Receive       Transmit
Bytes:             2355930659    2778349841
Unicast:           1782695456    281528229
Bcast/Mcast:       25840         408483

Errors (Receive)
FCS: 0
Alignment: 0
Runts: 0
Giants: 0
Total Errors: 0

Errors (Transmit)
Drops:
Collisions:
Late Collns:
Excess Collns:
Deferred:

Shows this for security:

Security Policy
Port(s): A1

Learn Mode: Continuous
Address Limit: 1
Violation Action: None

#10 Updated by Alexander Motin over 2 years ago

I suppose here is the problem:

Address Limit: 1

Every jail consumes an additional MAC address, so your observation that the link goes down when the jails start may be valid. I am not sure, though, how that could be related to the FreeNAS update.
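
To make the MAC-count argument concrete, here is a small sketch (not from the original thread) that counts the distinct MAC addresses in any text fed to it and compares the count with a limit. The function names are assumptions for illustration, and the MAC addresses used in any test input are made up.

```sh
#!/bin/sh
# Sketch only: function names are illustrative assumptions.

# Count distinct MAC addresses found anywhere in the text on stdin.
# Matching is case-insensitive; duplicates are counted once.
count_macs() {
    grep -Eio '([0-9a-f]{2}:){5}[0-9a-f]{2}' |
        tr 'A-F' 'a-f' |
        sort -u |
        wc -l |
        tr -d ' '
}

# Succeed (exit 0) when the count in $1 is greater than the limit in $2.
over_limit() {
    [ "$1" -gt "$2" ]
}
```

Feeding it, for example, the `ether` lines of the interfaces bridged to ix0 gives a rough count to compare against the switch's Address Limit of 1.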

#11 Updated by Ray Milyard over 2 years ago

Alexander Motin wrote:

I suppose here is the problem:
[...]

Every jail consume additional MAC address, so your observation that the link goes down when jail start may be valid. Not sure though how could it be related to FreeNAS update.

I don’t seem to be able to change that. It’s the default.

Also, if that is the issue, why does it work fine in 11.0-U4?

#12 Updated by Dru Lavigne over 2 years ago

  • Category changed from 1 to 38
  • Status changed from 15 to Screened

#13 Updated by Ray Milyard over 2 years ago

After an hour on the phone with HP support, they said the switch, in its default setup as it is, should work. They verified it works fine under 11.0-U4. With 11.1 installed, as soon as we install a plugin the port shuts down. They said they believe it’s an issue with the current FreeNAS build or the driver it is using. At this point I'm not sure how to go about addressing this.

#14 Updated by Ray Milyard over 2 years ago

Found something else out today. My board has 2 Intel NICs. If I use nic1 and remove the ix0 10 Gb NIC from the boot console, it works. As soon as I add ix0, whether I give it an IP or use DHCP, it stops working. I can no longer access the web UI via nic1.

#15 Updated by Alexander Motin over 2 years ago

Ray Milyard wrote:

As soon as I add ix0, whether I give it an IP or use DHCP, it stops working. I can no longer access the web UI via nic1.

May I assume that you know how IP addressing/routing works and that you are not trying to assign IPs from the same subnet to different NICs at the same time? Otherwise I would not expect that adding another NIC would affect the existing one(s).
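
A quick way to test for the overlap asked about here is to compare the network part of two addresses. The sketch below is illustrative only: the function name is an assumption, and the /24 prefix length used in the usage note is a guess, since the thread never states the netmask.

```sh
#!/bin/sh
# Sketch only: function name and the prefix length used below are assumptions.

# Succeed (exit 0) when IPv4 addresses $1 and $2 fall in the same
# subnet for prefix length $3 (for example 24 for 255.255.255.0).
same_subnet() {
    awk -v a="$1" -v b="$2" -v len="$3" '
        # Convert dotted-quad text to a single integer.
        function ip2int(ip,    o) {
            split(ip, o, ".")
            return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
        }
        BEGIN {
            size = 2 ^ (32 - len)   # number of addresses per subnet
            exit (int(ip2int(a) / size) == int(ip2int(b) / size)) ? 0 : 1
        }'
}
```

For example, `same_subnet 10.0.1.70 10.0.1.100 24 && echo overlap` would report an overlap if those two addresses were assigned to ix0 and to one of the 1 Gb NICs at the same time.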

#16 Updated by Alexander Motin over 2 years ago

Googling about this switch made me think that I could be wrong about the "Address Limit", since it may just not apply when "Learn mode" is set to "Continuous" rather than "Limited-continuous". It was a good try, but no. :(

My only other guess for now is that something goes wrong when the bridge driver, attaching to the NIC, changes its options, possibly reinitializing it incorrectly. If so, unfortunately I have no hardware to reproduce that, nor can I work it out theoretically. It could give some info if you could disable the jails or move them to some other interface, and then, when the ix0 interface supposedly recovers, try to manually create a bridge for it live from the command line with commands like:

ifconfig bridge5 create
ifconfig bridge5 addm ix0
ifconfig tap0 create
ifconfig bridge5 addm tap0

, and see what happens with the connectivity, if anything, and at which point. If that reproduces the problem, it can be sufficient input to talk to the driver developers in FreeBSD.
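
To pin down "at which point", the four commands can be run one at a time with a connectivity check after each. This is a minimal sketch under stated assumptions, not a tested procedure: the variable names, the dry-run default, and the ping target 10.0.1.1 are all hypothetical, and the target must be replaced with a host that is known to answer.

```sh
#!/bin/sh
# Sketch only: variable names, defaults and the ping target are assumptions.
IFACE="${IFACE:-ix0}"
BRIDGE="${BRIDGE:-bridge5}"
TAP="${TAP:-tap0}"
TARGET="${TARGET:-10.0.1.1}"   # hypothetical host to ping; replace with a real one
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands, 0 = really run them

# Print one command, run it unless in dry-run mode, then test connectivity.
step() {
    echo "+ $*"
    if [ "$DRY_RUN" = "1" ]; then
        return 0
    fi
    "$@" || { echo "command failed: $*"; return 1; }
    if ping -c 3 "$TARGET" >/dev/null 2>&1; then
        echo "  ping to $TARGET still OK"
    else
        echo "  ping to $TARGET FAILED after: $*"
        return 1
    fi
}

# The four commands from this comment, stopping at the first failure.
bridge_test() {
    step ifconfig "$BRIDGE" create &&
    step ifconfig "$BRIDGE" addm "$IFACE" &&
    step ifconfig "$TAP" create &&
    step ifconfig "$BRIDGE" addm "$TAP"
}
```

With `DRY_RUN=0`, calling `bridge_test` from the local console (not over the network, since the link may drop) prints the last command that ran before the ping failed. Afterwards, `ifconfig bridge5 destroy` and `ifconfig tap0 destroy` remove the test interfaces.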

#17 Updated by Alexander Motin over 2 years ago

  • Status changed from Screened to 15

#18 Updated by Ray Milyard over 2 years ago

Alexander Motin wrote:

Googling about this switch made me think that I could be wrong about the "Address Limit", since it may just not apply when "Learn mode" is set to "Continuous", but not "Limited-continuous". It was good try, but no. :(

My only other guess for now is that something goes wrong when bridge driver attaching to the NIC changes its options, possibly reinitializing it wrong. If it is so, unfortunately I have no hardware to reproduce that, neither I can guess it theoretically. It could give some info if you could disable the jails or move them to some other interface, and then, when the ix0 interface supposedly recovers, try to manually create a bridge for it live from command line with commands like:
[...]
, and see what happen with the connectivity, if anything, and at which point. If that reproduce the problem, it can be sufficient input to talk to the driver developers in FreeBSD.

I will give that a try in the morning. I also got a Chelsio 2-port 10 Gb card from eBay; it will be here in a few days. I was thinking of trying that instead of the Intel X520 I'm using now, to see if it is any different.

I was also wondering whether 10 Gb really helps, since I only have it from the FreeNAS server to the switch. The rest of the network is gigabit.

#19 Updated by Alexander Motin over 2 years ago

Ray Milyard wrote:

Was also thinking does 10gb really help since only have it from FreeNAS server to switch. Rest of network is gigabit.

That may only help in cases where many clients are accessing the server at the very same time. Quite a lot of people try to use LACP for that purpose, but quite few really have an environment that can benefit from it.

#20 Updated by Ray Milyard over 2 years ago

Alexander Motin wrote:

Googling about this switch made me think that I could be wrong about the "Address Limit", since it may just not apply when "Learn mode" is set to "Continuous", but not "Limited-continuous". It was good try, but no. :(

My only other guess for now is that something goes wrong when bridge driver attaching to the NIC changes its options, possibly reinitializing it wrong. If it is so, unfortunately I have no hardware to reproduce that, neither I can guess it theoretically. It could give some info if you could disable the jails or move them to some other interface, and then, when the ix0 interface supposedly recovers, try to manually create a bridge for it live from command line with commands like:
[...]
, and see what happen with the connectivity, if anything, and at which point. If that reproduce the problem, it can be sufficient input to talk to the driver developers in FreeBSD.

When I try ifconfig bridge5 addm ix0
I get an ix0 is busy error.

Today I did a fresh 11.1 install on a new USB stick. The only cable plugged into the server is the 10 Gb DAC. I got everything set up and running. I created a volume called jails on just my SSD, then configured my jails to live on /mnt/jails, which I had just created. I rebooted just to be safe. In the web console after the reboot, I started pinging the FreeNAS box from another computer, and the ping was fine. In the web console I went to Plugins and started installing Sabnzbd. It started to install. At step 5 or 6, which says it is creating the jail, the ping stopped. I can't get into the web UI now unless I delete the ix0 interface and add the em0 or em1 (1 Gb) NICs.

If needed, someone is welcome to set up a time to remote in with me to look at this issue.

#21 Updated by Dru Lavigne over 2 years ago

  • Status changed from 15 to Investigation

#22 Updated by Alexander Motin over 2 years ago

Ray Milyard wrote:

When try ifconfig bridge5 addm ix0
I get ix0 is busy error.

I guess you've done that on a system with jails already set up? That would explain it, since you probably can't attach two bridges to one interface. It would be interesting to try that on a fresh system without jails yet.

Today I did fresh 11.1 install in new USB. ... At step 5 or 6 which says creating jail the ping stops. I can't get into web now unless I delete the ix0 interface and add em0 or em1 (1gb) NICS.

This pretty much confirms that the problem is somehow related to jails or (I believe) to the bridge interface providing networking to them.

If needed someone is welcome to setup time to remote with me to look at this issue.

We could possibly do it some time next week. But I have one more switch-related idea first: do you have some form of STP enabled on your switch? Since the software bridge supports the STP/RSTP protocols, I wonder: could that be a source of confusion for the switch? If you are not using STP on your network, could you check that STP is disabled on the switch, or at least on the port, to make sure it is not related to the issue?
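
The FreeNAS side of the STP question can be inspected as well. The sketch below is an illustration, not something from the thread: it reads `ifconfig` output for a bridge on stdin and prints the members whose flag list includes STP. The function name is an assumption, and it also assumes the usual `member: <name> flags=...<...>` line format of FreeBSD's ifconfig.

```sh
#!/bin/sh
# Sketch only: function name is an assumption; input is expected to look like
# FreeBSD "ifconfig <bridge>" output with "member: <if> flags=...<...>" lines.

# Print the name of every bridge member that has the STP flag set.
stp_members() {
    awk '$1 == "member:" && $3 ~ /[<,]STP[,>]/ { print $2 }'
}
```

Assuming the jail bridge is named bridge0, `ifconfig bridge0 | stp_members` would list the affected members; ifconfig(8) documents `ifconfig bridge0 -stp ix0` as the way to turn spanning tree off for a single member.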

#23 Updated by Ray Milyard over 2 years ago

Alexander Motin wrote:

Ray Milyard wrote:

When try ifconfig bridge5 addm ix0
I get ix0 is busy error.

I guess you've done that on system with jails already set up? That would explain, since you probably can't attach two bridged to one interface. It would be interesting to try that on a fresh system without jails yet.

Today I did fresh 11.1 install in new USB. ... At step 5 or 6 which says creating jail the ping stops. I can't get into web now unless I delete the ix0 interface and add em0 or em1 (1gb) NICS.

This pretty much confirms that that problem is somehow related to jails or (I believe) to the bridge interface providing networking to them.

If needed someone is welcome to setup time to remote with me to look at this issue.

We could possibly do it some time next week. But I have one more switch-related idea first: do you have some form of STP enabled on your switch? Since the software bridge supports STP/RSTP protocols, I guess: can't that be the source of confusion for the switch? If you are not using STP on your network, could you check that STP protocol is disabled on the switch, or at least on the port, to make sure it is not related to issue?

My switch is pretty much at its defaults, as it comes NEW. I haven't really turned anything on or off.

However, since the switch works fine when I boot into 11.0-U4, I'm not sure it is really an issue with the switch setup. Using U4 I can use the 10 Gb or the 1 Gb NICs. I can get it working with all 3 NICs.

Now I install 11.1 on a clean system. Same thing: all NICs work and it runs fine. So I take a clean SSD, set up a volume, and just name it jails. This is how I plan to run all my media/Plex plugins, on the SSD with the jails volume. The moment the plugin install gets to step 5/6, "Creating Jails", which I believe would be making the sabnzbd_1 jail, everything goes out.

#24 Updated by Alexander Motin over 2 years ago

I am not saying that the problem is in your switch. And it is quite likely that something has changed on the FreeNAS side, since that is what new versions are usually about. But that does not necessarily mean that the problem is on the FreeNAS side either; it may be just a configuration incompatibility. That is why I am asking you to do some unusual things: to see whether it changes anything in a more controlled environment. I understood you when you said that recreating jails on the newly installed system also kills the link, but since it (supposedly) did not affect many other users, there must be something specific enough in your particular configuration. Yes, it can be the ix NIC model, and then the problem will have to be addressed to the respective driver developers; or it can be related to some network configuration specifics, and then maybe we'll be able to solve it faster. Please try to run the commands I proposed on a freshly installed 11.1 without jails.

#25 Updated by Ray Milyard over 2 years ago

So fresh install?

When create volume?

run:
ifconfig bridge5 create
ifconfig bridge5 addm ix0
ifconfig tap0 create
ifconfig bridge5 addm tap0

Then try installing plugin?

#26 Updated by Alexander Motin over 2 years ago

Ray Milyard wrote:

So fresh install?

Yes

When create volume?

No need for a volume, just configured networking.

run:
ifconfig bridge5 create
ifconfig bridge5 addm ix0
ifconfig tap0 create
ifconfig bridge5 addm tap0

Then try installing plugin?

No need for real plugins, just run the above commands to simulate, in a very minimal way, what is done on plugin creation. If the problem is indeed related to network bridging, as I guess, you likely won't be able to complete this command set before you lose connectivity. If not, that is also information.

#27 Updated by Alexander Motin over 2 years ago

If that doesn't trigger the issue, please reboot to clean it up and then try to create some plugin, collecting debug information before and after, so we can compare what has changed.
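
One lightweight way to do such a before/after comparison is to save the output of a command twice and diff the two files. This is a sketch under assumptions, not the FreeNAS debug mechanism: the function names and file names are made up for illustration.

```sh
#!/bin/sh
# Sketch only: function and file names are illustrative assumptions.

# Run the command given after the first argument and save its output
# (stdout and stderr) to the file named by the first argument.
snapshot() {
    out="$1"
    shift
    "$@" > "$out" 2>&1
}

# Print only the lines that differ between two snapshot files:
# "<" marks lines from the first file, ">" lines from the second.
changed_lines() {
    diff "$1" "$2" | grep '^[<>]' || true
}
```

For example, `snapshot before.txt ifconfig -a` before creating the plugin and `snapshot after.txt ifconfig -a` afterwards, followed by `changed_lines before.txt after.txt`, shows which interfaces and flags changed; the same works for `netstat -rn`.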

#28 Updated by Ray Milyard over 2 years ago

Alexander Motin wrote:

Ray Milyard wrote:

So fresh install?

Yes

When create volume?

No need for a volume, just configured networking.

run:
ifconfig bridge5 create
ifconfig bridge5 addm ix0
ifconfig tap0 create
ifconfig bridge5 addm tap0

Then try installing plugin?

No need for real plugins, just run above commands to simulate in very minimal part what is done on plugin creation. If the problem is indeed related to network bridging, as I guess, you likely won't be able to complete this command set before you lose the connectivity. If not -- that is also an information.

OK, I ran the commands and they ran. After that I could install a plugin on the jail volume, and it also pinged. All looked good, other than all the stuff I need to install/set up.

Rebooted, and the issue is back. I'm guessing the commands don't persist after a reboot.

#29 Updated by Alexander Motin over 2 years ago

Those commands were not supposed to persist; it was only a test, and since it didn't kill the link, the test failed to reproduce the problem. Did you reboot after running the commands and before installing the plugin, as I asked? Are you saying that the plugin was working fine when created live, but the connection died upon reboot?

#30 Updated by Ray Milyard over 2 years ago

Alexander Motin wrote:

Those commands were not supposed to persist, it was only a test, and since it didn't kill the link, it failed. Have you rebooted after running the commands before installing plugin as I have told? You are saying that plugin was working fine when created live, but connection died upon reboot?

OK, I ran a ping from another PC to the server. The DHCP IP is 10.0.1.70. Started pinging it.

On the console I ran the 4 commands. Ping was still fine. I let it go for a few minutes. Rebooted the server, and after it came back, the ping to 10.0.1.70 was still fine.

#31 Updated by Ray Milyard over 2 years ago

Alexander Motin wrote:

Those commands were not supposed to persist, it was only a test, and since it didn't kill the link, it failed. Have you rebooted after running the commands before installing plugin as I have told? You are saying that plugin was working fine when created live, but connection died upon reboot?

After a reboot, if I try to create the jail/plugin without running those commands again, the link dies. If I run the commands and then create the jail/plugin, it seems to work. However, when I reboot after doing this, it comes back up and then dies as soon as the jail IP starts.

#32 Updated by Ray Milyard over 2 years ago

Anything else you need me to try? Also, I will have the new Chelsio NIC tomorrow and another one on Tuesday, so I could see whether they have the same issue.

#33 Updated by Alexander Motin over 2 years ago

A test with Chelsio would be a great indicator.

#34 Updated by Ray Milyard over 2 years ago

Alexander Motin wrote:

Test with Chelsio would be a great indicator.

So today I put this 110-1114-30 Chelsio PCI-E CC2-S320E-SR 10GbE Dual Port SFP+ Network Card into the server and removed the Intel one.

I did a fresh install of 11.1, and it looks like it works fine now. I made the jail and installed the Sabnzbd plugin fine. I will be rebuilding it all tomorrow, so I will know more, but so far so good!

#35 Updated by Alexander Motin over 2 years ago

So you are saying this problem is Intel-specific and not reproducible with the Chelsio NIC? Or are you still running some tests?

#36 Updated by Ray Milyard over 2 years ago

Alexander Motin wrote:

So you are saying this problem is Intel-specific and not reproducible with Chelsio NIC? Or you are still running some tests?

It looks like it's an issue with the Intel card I was using. With this Chelsio I am not having this issue. I have set them up the same way.

#37 Updated by Alexander Motin over 2 years ago

  • Status changed from Investigation to Closed: Third party to resolve

OK. Then it sounds like this is not some general higher-level FreeNAS-specific issue, but something specific either to the particular card or to its driver, so it is outside our direct area of expertise, unless it turns out to be a widespread problem.

#38 Updated by Dru Lavigne over 2 years ago

  • File deleted (debug-freenas-20180108073233.txz)

#39 Updated by Dru Lavigne over 2 years ago

  • File deleted (debug.tar.gz)

#40 Updated by Dru Lavigne over 2 years ago

  • File deleted (tcpdump.txt)

#41 Updated by Dru Lavigne over 2 years ago

  • Category changed from 38 to 129
  • Target version changed from 11.2-BETA1 to N/A
  • Private changed from Yes to No
