Bug #84771

Fix bug that made UI unresponsive when destroying lots of snapshots

Added by . Hokan over 2 years ago. Updated over 2 years ago.

Status: Ready for Testing
Priority: No priority
Assignee: William Grzybowski
Category: Middleware
Target version:
Severity: New
Reason for Closing:
Reason for Blocked:
Needs QA: Yes
Needs Doc: No
Needs Merging: No
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

Using the new UI to delete several hundred snapshots causes the UI to become unresponsive. After some hours, the UI becomes responsive again.

Messages like this appear in the log:
Apr 2 19:06:15 enet-rascal-1 kernel: sonewconn: pcb 0xfffff80b3e782a50: Listen queue overflow: 193 already in queue awaiting acceptance (45 occurrences)

This gives two clues: the pcb address and the number of connections already queued.

netstat -Lan | egrep "(Proto|193)"
Proto Listen Local Address
unix 193/0/128 /var/run/middlewared.sock

and

netstat -a | egrep "(^Address|fffff80b3e782a50)"
Address Type Recv-Q Send-Q Inode Conn Refs Nextref Addr
fffff80b3e782a50 stream 0 0 fffff80919987938 0 0 0 /var/run/middlewared.sock

So something associated with middlewared.sock is having trouble keeping up?
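
For anyone repeating this diagnosis, the two clues can be pulled out of the log line mechanically and then fed to the netstat commands above. The following is only a minimal sketch: the regular expression is written against the single message quoted in this ticket, and the function name `parse_overflow` is made up for illustration.

```python
import re

# Pattern written against the one sonewconn message quoted in this ticket;
# other kernel versions may format the line differently (assumption).
SONEWCONN_RE = re.compile(
    r"sonewconn: pcb 0x(?P<pcb>[0-9a-f]+): "
    r"Listen queue overflow: (?P<queued>\d+) already in queue"
)


def parse_overflow(line):
    """Return (pcb_hex, queued) from a sonewconn log line, or None if no match."""
    match = SONEWCONN_RE.search(line)
    if match is None:
        return None
    return match.group("pcb"), int(match.group("queued"))


line = (
    "Apr 2 19:06:15 enet-rascal-1 kernel: sonewconn: pcb 0xfffff80b3e782a50: "
    "Listen queue overflow: 193 already in queue awaiting acceptance (45 occurrences)"
)
pcb, queued = parse_overflow(line)
print(pcb, queued)  # prints: fffff80b3e782a50 193
```

The pcb value is the string to search for in the `netstat -a` output, and the queued count is the one to search for in the `netstat -Lan` output.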

Associated revisions

Revision b468572e (diff)
Added by William Grzybowski over 2 years ago

fix(middlewared/zfs): do not use coroutine for libzfs routines Ticket: #84771
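
The commit message suggests that blocking libzfs calls were being made from inside a coroutine. The sketch below is not the middlewared code. It only illustrates the general effect with asyncio, using `time.sleep` as a hypothetical stand-in for a blocking libzfs call, and all function names are invented for illustration. A blocking call made directly in a coroutine stalls the event loop, so nothing else (such as accepting new socket connections) gets serviced, whereas handing the call to a thread lets the loop keep running.

```python
import asyncio
import time


def blocking_destroy(name):
    # Hypothetical stand-in for a blocking libzfs call (assumption).
    time.sleep(0.05)
    return name


async def heartbeat(ticks, stop):
    # Records one tick each time the event loop is free to run this task.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def destroy_inline(names):
    # Blocking calls made directly inside a coroutine: the loop stalls.
    return [blocking_destroy(n) for n in names]


async def destroy_in_thread(names):
    # Same work handed to the default thread pool: the loop stays free.
    loop = asyncio.get_running_loop()
    return [await loop.run_in_executor(None, blocking_destroy, n) for n in names]


async def measure(worker, names):
    # Run `worker` alongside the heartbeat and count how often the loop ticked.
    ticks, stop = [], asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    await worker(names)
    stop.set()
    await hb
    return len(ticks)


names = ["snap-%d" % i for i in range(5)]
inline_ticks = asyncio.run(measure(destroy_inline, names))
thread_ticks = asyncio.run(measure(destroy_in_thread, names))
print("inline:", inline_ticks, "threaded:", thread_ticks)
```

The inline variant should report far fewer heartbeat ticks than the threaded one, which is the same starvation that would leave connections piling up in a listen queue.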

History

#1 Updated by . Hokan over 2 years ago

  • File debug-enet-rascal-1-20190403093543.txz added
  • Private changed from No to Yes

#2 Updated by Dru Lavigne over 2 years ago

  • Category changed from GUI (new) to OS
  • Assignee changed from Release Council to Alexander Motin

#3 Updated by Alexander Motin over 2 years ago

Unrelated to the snapshot deletion, which requires more investigation, I see that at least at the time the debug was created your system's CPU was kept pretty busy by some `dnetc` processes. I'd recommend checking that this is not related.

#4 Updated by Alexander Motin over 2 years ago

  • Category changed from OS to Middleware
  • Assignee changed from Alexander Motin to William Grzybowski

Hokan, do you have any information about the system, other than the UI not responding, while this happens? It would help to know whether the whole system was blocked, or only the mentioned middlewared process.

#5 Updated by William Grzybowski over 2 years ago

  • Target version changed from Backlog to 11.2-U5

#6 Updated by William Grzybowski over 2 years ago

  • Status changed from Unscreened to Screened

#7 Updated by . Hokan over 2 years ago

Alexander Motin wrote:

Unrelated to the snapshot deletion, which requires more investigation, I see that at least at the time the debug was created your system's CPU was kept pretty busy by some `dnetc` processes. I'd recommend checking that this is not related.

I'm sorry about having the dnetc jobs running when I submitted the ticket -- an oversight on my part. I turned it off yesterday while I was having trouble. The Distributed.net program uses lots of CPU and does almost no I/O, and there are fewer dnetc processes than CPU cores, so I think it wouldn't matter in any case.

Alexander Motin wrote:

Hokan, do you have any information about the system, other than the UI not responding, while this happens? It would help to know whether the whole system was blocked, or only the mentioned middlewared process.

The system is used primarily to serve NFS and that continued to work. I didn't evaluate if there was a performance hit during this time, but as a casual user of an NFS share I didn't see a problem. Also, interactive SSH sessions worked fine.

#8 Updated by William Grzybowski over 2 years ago

  • Target version changed from 11.2-U5 to 11.2-U4

#9 Updated by Bug Clerk over 2 years ago

  • Status changed from Screened to In Progress

#10 Updated by Bug Clerk over 2 years ago

  • Status changed from In Progress to Ready for Testing

#11 Updated by Dru Lavigne over 2 years ago

  • File deleted (debug-enet-rascal-1-20190403093543.txz)

#12 Updated by Dru Lavigne over 2 years ago

  • Subject changed from UI Unresponsive when destroying lots of snapshots: sonewconn: pcb 0xfffff80b3e782a50: Listen queue overflow: 193 already in queue awaiting acceptance (45 occurrences) to Fix bug that made UI unresponsive when destroying lots of snapshots
  • Private changed from Yes to No
  • Needs Doc changed from Yes to No
  • Needs Merging changed from Yes to No
