
Bug #71459

TCP Traffic Stops after Hundreds of DupACKs from FreeNAS

Added by Dakota Schneider 5 months ago. Updated 5 months ago.

Status: Closed
Priority: No priority
Assignee: Ryan Moeller
Category: OS
Target version:
Seen in:
Severity: New
Reason for Closing: Duplicate Issue
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

After recently upgrading to 11.2-RELEASE I have discovered a serious network bug. It is the same issue as described here: https://forums.freenas.org/index.php?threads/11-2-rc2-network-issue-over-wifi-tcp-traffic-stops-after-hundreds-of-dupacks-from-freenas.71553/

When connected to FreeNAS from my MacBook Pro 11,2 running macOS Mojave 10.14.2 (latest), I am unable to upload data to FreeNAS. SMB upload to the main FreeNAS server, iperf upload to the server, and iperf upload to jails under IOCage and Warden are all affected the same way.

The connection works fine for the first few seconds, but then the server starts flooding duplicate TCP ACK packets. This quickly grows into a burst of roughly 500 packets within 10 ms, which can completely saturate the laptop's network link and interrupt other connections. I first noticed this as stuttering and gaps during AirPlay playback. Reconnecting WiFi resolved the issue by resetting the problematic connection with FreeNAS.

I have not yet tested the proposed delayed_ack kernel flag, but I can confirm that this issue affects only traffic between my MacBook and FreeNAS (including jails). The issue is identical over WiFi and Ethernet, and there are no problems connecting to other VMs running on the same host as FreeNAS (I run FreeNAS under ESXi 6.5).

This issue affects only TCP packets heading from the MacBook to FreeNAS. I can download files via SMB without issue, but as soon as I start an upload the problem appears. Running iperf with UDP does not trigger the issue either. Services running on FreeNAS and in Warden and IOCage jails (Transmission and Plex, respectively) show no throughput issues.

Another bizarre symptom is ping latency: if I ping the MacBook from FreeNAS (or any jail) at the usual 1 Hz rate, I see high fluctuation, around 50 ms +/- 50 ms. However, if I increase the ping rate to 10 Hz (with the flag "-i 0.1"), latency settles out and is perfectly normal: <1 ms over Ethernet and 1.12 ms over WiFi. Pinging from the MacBook to FreeNAS is not affected.

I've attached a pcap showing the dup ACK flood. I can also provide sample ping output if desired.


Related issues

Is duplicate of FreeNAS - Bug #43558: Relax the TCP reassembly queue length limit to improve performance (Done)

History

#1 Updated by Dakota Schneider 5 months ago

Update: I can confirm that turning off delayed ACK in FreeNAS solves this issue:

root@freenas:~ # sysctl net.inet.tcp.delayed_ack=0

I have also tried leaving delayed ACK on in FreeNAS, while choosing different options for delayed ACK on my MacBook. No combination of net.inet.tcp.delayed_ack settings had any noticeable effect, and the above issue was present for all of them. For reference on delayed ACK options in macOS see: http://www.shabangs.net/osx/speed-up-macos-x-file-transferring-over-network/

I tried all three, to no avail. Only disabling delayed ACK on the FreeNAS server has any effect.
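For the record, the three client-side settings I tried look like this (a sketch based on the linked article; the meaning of each value is as commonly described for the macOS sysctl and may differ between releases, so verify on your own system):

```shell
# On the MacBook, per the linked article (3 is reportedly the macOS
# default). No combination changed the behavior in my testing.
sudo sysctl -w net.inet.tcp.delayed_ack=0
sudo sysctl -w net.inet.tcp.delayed_ack=1
sudo sysctl -w net.inet.tcp.delayed_ack=2
```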

#2 Updated by Dakota Schneider 5 months ago

Additionally: note for reference where delayed ack was enabled: https://redmine.ixsystems.com/issues/15920

#3 Updated by Dakota Schneider 5 months ago

Update: please consider this bug a duplicate of https://redmine.ixsystems.com/issues/43558

I have tested setting:

root@freenas:~ # sysctl net.inet.tcp.delayed_ack=1
net.inet.tcp.delayed_ack: 0 -> 1
root@freenas:~ # sysctl net.inet.tcp.reass.maxqueuelen=16384
net.inet.tcp.reass.maxqueuelen: 100 -> 16384

And can report that the bug is eliminated with this workaround, just as it is with delayed ack disabled. The two workarounds are nearly identical in network performance, although I appear to lose about 2–5 MB/sec in SMB upload performance when delayed ack is on and maxqueuelen is increased. I don't have sufficiently granular CPU data to compare, but there doesn't appear to be a significant difference in the CPU cost of delayed ack in my hardware environment.

#4 Updated by Ryan Moeller 5 months ago

  • Status changed from Unscreened to Closed
  • Assignee changed from Release Council to Ryan Moeller
  • Target version changed from Backlog to N/A
  • Reason for Closing set to Duplicate Issue

Dakota, thank you for your feedback. I am closing this ticket as a duplicate, but I'd like to give some additional information as well.

The dup acks are not directly what is overwhelming your network, as an ack is very small. What happens is that when TCP segments are lost (whether dropped by an unreliable network or discarded because the reassembly queue is full), the receiver acks the last in-order segment. When the next segment in the sequence is lost, the ack will be a duplicate, because the sequence number has not advanced. This triggers the sender to retransmit all the segments after the acked sequence number. These are the TCP retransmissions you see in the packet capture you attached to your ticket.
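The cumulative-ack behavior described above can be sketched with a toy model (illustrative only, not the FreeBSD implementation): the receiver always acks the next byte it expects in order, so one lost segment turns every subsequent ack into a duplicate.

```python
def acks_for(segments, lost, mss=1460):
    """Return the ack number sent after each arriving segment.

    segments: sequence numbers the sender transmits, in order
    lost: set of sequence numbers dropped in transit
    """
    rcv_nxt = segments[0]           # next in-order byte expected
    received = set()
    acks = []
    for seq in segments:
        if seq in lost:
            continue                # segment never arrives
        received.add(seq)
        while rcv_nxt in received:  # advance over in-order data
            rcv_nxt += mss
        acks.append(rcv_nxt)
    return acks

# Sender transmits 6 segments; the second one (seq 1460) is lost.
segs = [i * 1460 for i in range(6)]
print(acks_for(segs, lost={1460}))
# every ack after the loss is a duplicate: [1460, 1460, 1460, 1460, 1460]
```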

Retransmits are very expensive because, in the pathological case you are experiencing, up to a full window might be sent again for each loss. The window size is variable, but the maximum receive window is tunable via recvbuf_max; with your configuration the number of in-flight bytes will not exceed 2097152. In your packet capture, the window size appears to be even smaller than that, only 57856 bytes. So each duplicate ack may cause up to 57856 bytes to be sent again, and that is what floods your network.
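Some back-of-the-envelope arithmetic with the numbers above shows why small acks can translate into a flood (a worst-case bound; in practice retransmitted windows overlap rather than stacking up fully):

```python
# Worst-case data retransmitted by a dup-ack burst, using the figures
# from the attached capture.
window = 57856          # receive window seen in the capture, bytes
dup_acks = 500          # size of the observed dup-ack burst

worst_case_bytes = window * dup_acks
print(worst_case_bytes)              # 28928000 bytes
print(worst_case_bytes * 8 / 10e-3)  # 23142400000.0 bits/s if sent in 10 ms
```

Even a small fraction of that worst case is far beyond what a WiFi or gigabit link can carry, which matches the saturation you observed.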

My calculations for the max queue length were based on the default max window size of 2097152 bytes and the default max segment size of 1460 bytes. No more than 1437 segments of 1460 bytes each can fit in a 2097152-byte window, so I figured that 1437 would be a more sensible default value for the max queue length. At one point I mistakenly counted TCP timestamps against the MSS, as is common practice and what you are likely to see in Wireshark, giving a segment size of 1448 and a queue length of 1448; but that is not how FreeBSD does the calculation, so 1460-byte segments and a queue length of 1437 is more correct. The difference is not very significant anyway, and the small overshoot should not be a problem.

Here is the relevant snippet of the actual code, to illustrate what happens:

if ((th->th_seq != tp->rcv_nxt || !TCPS_HAVEESTABLISHED(tp->t_state)) &&
        tp->t_segqlen >= min((so->so_rcv.sb_hiwat / tp->t_maxseg) + 1,
        tcp_reass_maxqueuelen)) {

I'll break it down so it makes somewhat more sense:

if the segment needs to be queued because it is not the next in sequence or it does not belong to an established session:
    if the queue has more segments than the lesser of (receive window size / max segment size) + 1 or the value of maxqueuelen:
        (discard the segment)

What this means is that setting maxqueuelen greater than 1437 does not make any difference when the max receive buffer size (recvbuf_max) is 2097152 and the max segment size is 1460 (MTU 1500), because the queue length is limited by the lesser of 1437 or maxqueuelen.
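The clamp in the snippet above can be expressed directly (a sketch of the arithmetic, not the kernel code; the integer division mirrors C's behavior):

```python
def effective_queue_limit(recvbuf_max, maxseg, maxqueuelen):
    """Effective reassembly queue limit: the lesser of
    (receive window / MSS) + 1 and tcp_reass_maxqueuelen."""
    return min(recvbuf_max // maxseg + 1, maxqueuelen)

print(effective_queue_limit(2097152, 1460, 100))    # 100: the old default cap
print(effective_queue_limit(2097152, 1460, 16384))  # 1437: window-limited
```

This is why raising maxqueuelen beyond 1437 changes nothing with the default 2097152-byte window and 1460-byte segments: the window term wins the min().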

Now, there are other reasons you might be losing segments:
  • Memory exhaustion - segments will be discarded if the TCP stack is under serious memory pressure. I have not seen this, and I think it is pretty unlikely, but you could check on your RAM usage.
  • Unreliable network - wireless networks can be just unreliable and lose packets.
  • Window scaling - an unreliable network might cause the window size to be scaled down. When this happens, queued segments can be discarded and retransmitted. This should not be a frequent occurrence during a single TCP session, because the window can only get so small, but a smaller window size might explain your decreased throughput.
  • There is an edge case where a window might be filled by segments that are smaller than the max segment size, in which case the queue might overflow, but I don't think I see that in your packet capture, and I'm not sure that's a very likely scenario.

Check the number of segments discarded by the stack:

netstat -p tcp -s | fgrep 'discarded due'

Despite the label, this stat counts discards both due to lack of memory and due to a full reassembly queue. If the number increases infrequently and only in small increments, see if it corresponds with a shrinking window size; that would point to an unreliable network. If it rises rapidly, you have confirmed there is no loss on the network itself, and raising maxqueuelen clearly helps, feel free to add another note to this ticket.

You can also observe the number of out-of-order TCP segments received:

netstat -p tcp -s | fgrep order
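If you want to watch the netstat counters mentioned above over time, a small parsing helper is handy (a hypothetical sketch; the sample text below imitates FreeBSD's "netstat -p tcp -s" line format and is not taken from this system):

```python
import re

def counter(netstat_output, needle):
    """Return the count from the first stats line whose text contains
    needle; netstat -s lines look like '<count> <description>'."""
    for line in netstat_output.splitlines():
        m = re.match(r"\s*(\d+)\s+(.*)", line)
        if m and needle in m.group(2):
            return int(m.group(1))
    return None

sample = """\
        42 discarded due to memory problems
        1337 out-of-order packets received
"""
print(counter(sample, "discarded due"))  # 42
print(counter(sample, "order"))          # 1337
```

Sampling the counter before and after an upload attempt (e.g. by piping the real netstat output into this function) shows whether the discards correlate with the dup-ack bursts.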

#5 Updated by Ryan Moeller 5 months ago

  • Is duplicate of Bug #43558: Relax the TCP reassembly queue length limit to improve performance added

#6 Updated by Dru Lavigne 5 months ago

  • File deleted (MacBookPro-FreeNAS-dupACK-sm.pcapng)
