Bug #7891

ctld continually exits - read connection lost

Added by Derek Hunt over 5 years ago. Updated over 5 years ago.

Status:
Closed: Third party to resolve
Priority:
Nice to have
Assignee:
Alexander Motin
Category:
OS
Target version:
-
Severity:
New
Reason for Closing:
Reason for Blocked:
Needs QA:
Yes
Needs Doc:
Yes
Needs Merging:
Yes
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

The volume is still mounted properly; however, this message is logged every 10 seconds or so.

Log messages:

Feb 8 14:40:35 nas ctld[7746]: child process 8119 terminated with exit status 1
Feb 8 14:40:45 nas ctld[8124]: 10.0.0.1: read: connection lost
Feb 8 14:40:45 nas ctld[7746]: child process 8124 terminated with exit status 1
Feb 8 14:40:55 nas ctld[8128]: 10.0.0.1: read: connection lost
Feb 8 14:40:55 nas ctld[7746]: child process 8128 terminated with exit status 1
Feb 8 14:41:05 nas ctld[8133]: 10.0.0.1: read: connection lost
Feb 8 14:41:05 nas ctld[7746]: child process 8133 terminated with exit status 1
Feb 8 14:41:15 nas ctld[8136]: 10.0.0.1: read: connection lost
Feb 8 14:41:15 nas ctld[7746]: child process 8136 terminated with exit status 1
Feb 8 14:41:25 nas ctld[8143]: 10.0.0.1: read: connection lost
Feb 8 14:41:25 nas ctld[7746]: child process 8143 terminated with exit status 1
Feb 8 14:41:35 nas ctld[8147]: 10.0.0.1: read: connection lost
Feb 8 14:41:35 nas ctld[7746]: child process 8147 terminated with exit status 1
Feb 8 14:41:45 nas ctld[8152]: 10.0.0.1: read: connection lost
Feb 8 14:41:45 nas ctld[7746]: child process 8152 terminated with exit status 1
Feb 8 14:41:55 nas ctld[8157]: 10.0.0.1: read: connection lost
Feb 8 14:41:55 nas ctld[7746]: child process 8157 terminated with exit status 1
Feb 8 14:42:05 nas ctld[8160]: 10.0.0.1: read: connection lost
Feb 8 14:42:05 nas ctld[7746]: child process 8160 terminated with exit status 1
Feb 8 14:42:15 nas ctld[8164]: 10.0.0.1: read: connection lost
Feb 8 14:42:15 nas ctld[7746]: child process 8164 terminated with exit status 1

This happens every 10 seconds.

Interface info:
ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO>
ether a0:36:9f:4e:b2:14
inet 10.0.0.2 netmask 0xffffff00 broadcast 10.0.0.255
inet 10.2.2.2 netmask 0xffffff00 broadcast 10.2.2.255
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect (10Gbase-T <full-duplex>)
status: active

Contents of ctld.conf

portal-group pg1 {
    discovery-filter portal-name
    discovery-auth-group no-authentication
    listen 0.0.0.0:3260
}

auth-group ag4tg_1 {
}

target iqn.2007-09.jp.ne.peach.istgt:target1 {
    alias target1
    auth-group no-authentication
    portal-group pg1

    lun 0 {
        option unmap on
        path /dev/zvol/pool/iSCSI-VMS
        blocksize 512
        serial 002590577d8f000
        device-id "iSCSI Disk 002590577d8f000 "
        option vendor "FreeBSD"
        option product "iSCSI Disk"
        option revision "0123"
        option naa 0x6589cfc0000003ec867169187986c43f
        option insecure_tpc on
    }
}
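
For reference, a Linux open-iscsi initiator would reach this target roughly as follows (a sketch only; the portal address 10.0.0.2 is taken from the interface info above, the target name from the config):

iscsiadm -m discovery -t sendtargets -p 10.0.0.2:3260   # discover targets on the portal
iscsiadm -m node -T iqn.2007-09.jp.ne.peach.istgt:target1 -p 10.0.0.2:3260 --login
iscsiadm -m session                                     # verify the session is established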

Related issues

Related to FreeNAS - Bug #17793: read: connection lost (Closed: Cannot reproduce, 2016-09-27)

History

#1 Updated by Jordan Hubbard over 5 years ago

  • Category set to OS
  • Assignee set to Alexander Motin

#2 Updated by Alexander Motin over 5 years ago

  • Status changed from Unscreened to 15

Derek, this information does not tell me anything about what is going on. Could you provide more?

Is 10.0.0.1 the address of your initiator? What iSCSI initiator software is running there?

Could you make a dump of packets between FreeNAS and that host including several of those disconnects with "tcpdump -pi xl0 -s 0 -w packets.dump port 3260 and host 10.0.0.1"?
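
For reference, the interface carrying the 10.0.0.x traffic on this system appears to be ix0 rather than xl0, so the capture would be roughly:

# capture full iSCSI packets (port 3260) to and from the initiator on the 10GbE interface
tcpdump -p -i ix0 -s 0 -w packets.dump port 3260 and host 10.0.0.1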

#3 Updated by Derek Hunt over 5 years ago

  • File packets.dump added

Hi Alexander,

10.0.0.1 is the address of the initiator. It's a Proxmox 3.3 host (Linux pm01 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64 GNU/Linux). There is a single 10GbE crossover cable between the two machines. Both machines are running Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01) NICs.
The initiator software version is:

ii  open-iscsi                       2.0.873-3                     amd64

I'm attaching the packets.dump file from tcpdump. For what it's worth, I rolled back to the last stable release to test, and the same issue came up (I did not try any of the previous releases). The VMs running on the host seem mostly fine, in that I haven't noticed any major freezes. Are there any other commands you would like me to run?

#4 Updated by Alexander Motin over 5 years ago

What I see in the packet dump is that while one iSCSI connection is up and running normally, your initiator establishes a new TCP connection to the iSCSI port every 10 seconds and terminates it right after. If there is an answer to your question, it is on the initiator side. Do you have some monitoring software there that could be testing server availability in this way?
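
For example, a quick way to see which process on the Proxmox host owns those short-lived connections (a sketch; assumes lsof is installed there, and 3260 is the portal port from ctld.conf):

# refresh once per second and show any process holding a TCP socket on the iSCSI port
watch -n 1 'lsof -nP -iTCP:3260'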

#5 Updated by Alexander Motin over 5 years ago

  • Status changed from 15 to Closed: Third party to resolve

I don't see how FreeNAS could trigger the described behavior. The question is for the initiator side.

#6 Updated by Linda Kateley over 5 years ago

I am seeing these exact same symptoms when using the GlobalSAN initiator and the ESXi 5.5 initiator.

#7 Updated by Derek Hunt over 5 years ago

I posted to the Proxmox forums and I haven't heard a response. Is the ESXi initiator different from the Proxmox/Debian package?

#8 Updated by Windsor Wallaby over 5 years ago

I am also seeing the exact same symptoms using Proxmox pve manager 2.2-32

#9 Updated by Derek Hunt over 5 years ago

  • Seen in changed from to 9.3-STABLE-201505130355

I noticed something digging around in the proxmox file:
/usr/share/perl5/PVE/Storage/LunCmd/Istgt.pm

There is a comment in the file that shows this:
# Current SIGHUP reload limitations (http://www.peach.ne.jp/archives/istgt/):
  1. The parameters other than PG, IG, and LU are not reloaded by SIGHUP.
  2. LU connected by the initiator can't be reloaded by SIGHUP.
  3. PG and IG mapped to LU can't be deleted by SIGHUP.
  4. If you delete an active LU, all connections of the LU are closed by SIGHUP.
  5. Updating IG is not affected until the next login.

# FreeBSD
  1. Alt-F2 to change to native shell (zfsguru)
  2. pw mod user root -w yes (change password for root to root)
  3. vi /etc/ssh/sshd_config
  4. uncomment PermitRootLogin yes
  5. change PasswordAuthentication no to PasswordAuthentication yes
  5. /etc/rc.d/sshd restart
  6. On one of the proxmox nodes login as root and run: ssh-copy-id ip_freebsd_host
  7. vi /etc/ssh/sshd_config
  8. comment PermitRootLogin yes
  9. change PasswordAuthentication yes to PasswordAuthentication no
  10. /etc/rc.d/sshd restart
  11. Reset passwd -> pw mod user root -w no
  12. Alt-Ctrl-F1 to return to zfsguru shell (zfsguru)

This looks like a limitation of istgt:
The parameters other than PG, IG, and LU are not reloaded by SIGHUP.

Am I reading that correctly? Is this out of spec?

#10 Updated by Josh Paetzel over 5 years ago

istgt was the iSCSI target in FreeNAS 9.2.1.x and older; 9.3 has a new iSCSI target that reloads properly under any circumstances.
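
For example, with the new target a configuration change can be applied by signalling ctld (a sketch; assumes the default pidfile location), and established sessions to unchanged LUNs stay up:

# ctld re-reads its configuration on SIGHUP without restarting
kill -HUP $(cat /var/run/ctld.pid)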

#11 Updated by Suraj Ravichandran about 4 years ago

  • Related to Bug #17793: read: connection lost added

#12 Updated by Dru Lavigne over 2 years ago

  • File deleted (packets.dump)
