Bug #68736

System freeze when iSCSI over 10Gbe is trying to be mounted

Added by Lorenzo Rosei 6 months ago. Updated 6 months ago.

Status: Closed
Priority: No priority
Assignee: Alexander Motin
Category: OS
Target version:
Severity: New
Reason for Closing: Cannot Reproduce
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

Hello,

I recently added a T320 10Gbe NIC to my FreeNAS (well, recently started using it again, tbh) and used a DAC to connect to my workstation that has a ConnectX-3.

The two computers see one another, and speeds before tuning are fine at around 3 Gbit/s, but as soon as I moved iSCSI over to the 10Gbe interface, the NAS hangs when the Windows machine tries to mount it.

In the configured target, I set the 10Gbe IP as the ONLY address it binds to.

Thanks for any help

IMG_20190110_124359.jpg (3.23 MB) Lorenzo Rosei, 01/10/2019 03:56 AM

History

#1 Updated by Lorenzo Rosei 6 months ago

  • File debug-susan-20190109110803.txz added
  • Private changed from No to Yes

#2 Updated by Lorenzo Rosei 6 months ago

  • Subject changed from "System freeze when iSCSI over 10Gbe is trying to connect" to "System freeze when iSCSI over 10Gbe is trying to be mounted"

#3 Updated by Dru Lavigne 6 months ago

  • Category changed from Services to OS
  • Assignee changed from Release Council to Alexander Motin

#4 Updated by Alexander Motin 6 months ago

  • Status changed from Unscreened to Blocked
  • Reason for Blocked set to Need additional information from Author

I'm sorry Lorenzo, but we'd need something more than "System freeze". Hangs are less typical of software errors than of hardware issues. In case it is still a software problem, I'd recommend enabling the debug kernel in FreeNAS settings and rebooting, in the hope that the system crashes on an assertion instead of hanging. You may also try enabling the watchdog timer with `service watchdogd onestart` to see whether it is able to detect the hang and drop to the debugger. Or, if the hardware allows, you may try to send an NMI to the OS with the respective hardware button after it hangs; that may also drop into the debugger and automatically save a core dump with some information.

In other words, give us something; even a console screenshot with some messages could be useful.

#5 Updated by Lorenzo Rosei 6 months ago

Alexander,

thanks for your reply. Sadly the system becomes completely frozen and there are no lines on the console. I will surely enable the debug kernel and replicate the crash, hoping to be able to give you more information.

I had hoped that there would be something useful in the debug dump, but I understand there's nothing.

Thanks again,

Lorenzo

#6 Updated by Lorenzo Rosei 6 months ago

Hi,

I've enabled the debug kernel and taken a picture of the only thing that happened when I connected the DAC cable (picture attached).

After that, it hung as usual, and it was replying to ping on the 1Gb interface. The 10Gb interface was not replying to pings.

I rebooted, created the debug file again (attached), and ran `service watchdogd onestart`. This time I got no text on the console, and the machine did not drop to the debugger. Sadly, my motherboard does not offer the NMI function you mentioned, as it's a consumer-grade board.

A short while after it hangs, all interfaces stop replying to ping.
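
A timestamped record of when each interface stops answering may help pin down the order of events described in this comment. The following is only a minimal sketch, meant to run on a third machine: the `watch` and `ping_once` names are hypothetical, and it assumes a Unix-style `ping` that accepts `-c 1` (Windows uses `-n 1` instead).

```python
import subprocess
import time
from datetime import datetime


def ping_once(host):
    """Return True if a single ICMP echo to `host` is answered.

    Assumes a Unix-style `ping` binary that accepts `-c 1`.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=3,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def watch(hosts, probe=ping_once, rounds=None, interval=1.0, log=print):
    """Probe every host in `hosts` (label -> address) repeatedly.

    Only up/down transitions are logged and returned, so the output
    stays short even over a long run. `rounds=None` loops forever.
    """
    state = {}
    transitions = []
    done = 0
    while rounds is None or done < rounds:
        for label, addr in hosts.items():
            up = probe(addr)
            if state.get(label) != up:
                state[label] = up
                event = (label, "up" if up else "down")
                transitions.append(event)
                stamp = datetime.now().isoformat(timespec="seconds")
                log(f"{stamp} {label} ({addr}) is {event[1]}")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
    return transitions
```

Calling, for example, `watch({"1Gb": "192.168.1.10", "10Gb": "10.0.0.10"})` with the NAS's real addresses in place of these made-up ones would print one line each time an interface changes state, which gives a rough measure of how long the 1Gb interface outlives the 10Gb one.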

#7 Updated by Lorenzo Rosei 6 months ago

I wanted to add that I did some more testing, and it looks like the system crashes even without iSCSI. Just using SMB through that interface makes it hang with no debug info.

Really sorry I can't give more data.

#8 Updated by Alexander Motin 6 months ago

Lorenzo Rosei wrote:

> I've enabled the debug kernel and taken a picture of the only thing that happened when I connected the DAC cable (picture attached).

LORs (lock order reversals) are logged precisely because they may cause deadlocks, but sometimes a deadlock happens so rarely that the LOR is considered acceptable. The reported one is related to file systems rather than networking.

> After that, it hung as usual, and it was replying to ping on the 1Gb interface. The 10Gb interface was not replying to pings.

Ping replies on the 1Gb interface mean that the kernel is not completely dead, which is interesting. Does the system react to anything else? We need to crash it somehow, or find some other way to see what is going on there.
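
One way to check whether the system still reacts to anything beyond ping is to probe it over TCP from another machine. This is only a sketch under stated assumptions: the function names are hypothetical, and it assumes SSH is enabled on the NAS on its default port 22. A completed TCP handshake only shows that the kernel's network stack is alive, while receiving the SSH banner also shows that a userland daemon is still being scheduled.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if host:port completes a TCP handshake within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def read_banner(host, port=22, timeout=3.0):
    """Return the first bytes the service sends after connect, or None.

    An SSH server sends its version string unprompted, so a non-None
    result means the daemon itself answered, not just the kernel.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            data = conn.recv(64)
    except OSError:
        return None
    return data.decode("ascii", errors="replace").strip() or None
```

If `tcp_port_open` succeeds on the 1Gb address while `read_banner` returns `None`, that would suggest the kernel is still handling packets but userland is stuck. Port 3260, the standard iSCSI target port, can be checked with `tcp_port_open` in the same way.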

#9 Updated by Lorenzo Rosei 6 months ago

I've made more attempts to get it to crash in a useful way, but I've had no luck. I'm feeling a bit discouraged; I suspect SOMETHING in my hardware is faulty, but I can't work out what it is.

#10 Updated by Alexander Motin 6 months ago

I'd probably try swapping the T320 and ConnectX-3 between the systems. IIRC I had cases where old Chelsio NICs did not work well in some new boards, and vice versa. While we generally recommend Chelsio, FreeNAS should support ConnectX-3 too.

#11 Updated by Alexander Motin 6 months ago

  • Status changed from Blocked to Closed
  • Target version changed from Backlog to N/A
  • Reason for Closing set to Cannot Reproduce
  • Reason for Blocked deleted (Need additional information from Author)

I am closing this until we have at least something to start from.

#12 Updated by Dru Lavigne 6 months ago

  • File deleted (debug-susan-20190109110803.txz)

#13 Updated by Dru Lavigne 6 months ago

  • File deleted (debug-susan-20190110125048.tgz)

#14 Updated by Dru Lavigne 6 months ago

  • Private changed from Yes to No
