Bug #27818

System unresponsive and slow - console filled with 'snmpd[PID]: Connection from UDP' errors

Added by Leon Roy over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: No priority
Assignee: Vladimir Vinogradenko
Category: OS
Target version:
Seen in:
Severity: New
Reason for Closing: Cannot Reproduce
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration: Supermicro SC836 chassis, Supermicro X9SCL+-F Motherboard, 32GB ECC UDIMM, 32GB Supermicro SSD BOM, LSI 9211-8i HBA, 16x 3TB SATA drives, Intel SSD SLOG
ChangeLog Required: No

Description

Since upgrading to FreeNAS-11.1-RELEASE, one of our servers becomes unresponsive after a few hours, and the console fills with hundreds of messages like:

```
Jan 16 16:00:17 snmpd[2619]: Connection from UDP: [10.0.0.2]:55040->[10.0.0.3]:161
Jan 16 16:00:21 snmpd[2619]: Connection from UDP: [10.0.0.2]:59960->[10.0.0.3]:161
Jan 16 16:00:21 snmpd[2619]: Connection from UDP: [10.0.0.2]:58587->[10.0.0.3]:161
Jan 16 16:00:25 snmpd[2619]: Connection from UDP: [10.0.0.2]:59960->[10.0.0.3]:161
```
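In the excerpt above, every connection comes from the same client, 10.0.0.2, polling port 161. As a rough diagnostic aid (not part of the original report), the sketch below tallies the `Connection from UDP` lines per client address. `count_snmp_sources` is a hypothetical helper, and reading from `/var/log/messages` in the usage line is an assumption about where syslog stores these messages on the affected box.

```shell
# Hypothetical helper: count snmpd "Connection from UDP" log lines per client.
# Reads syslog text on stdin and prints "<count> <client address>", busiest first.
count_snmp_sources() {
    # Print only the first bracketed address after "Connection from UDP:",
    # i.e. the client side of "[client]:port->[server]:port".
    sed -n 's/.*Connection from UDP: \[\([^]]*\)\].*/\1/p' |
        sort | uniq -c | sort -rn
}

# Usage (the log path is an assumption):
#   count_snmp_sources < /var/log/messages
```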

We can still get to a shell from the console (although it takes a long time to load), but shutdown hangs when the system tries to stop the snmpd process:

```
Jan 16 16:01:17 snmpd[2619]: Received TERM or STOP signal... shutting down...
Jan 16 16:01:37 init: some processes would not die; ps axl advised
```

History

#1 Updated by Dru Lavigne over 2 years ago

  • Status changed from Unscreened to 15
  • Private changed from No to Yes

Leon: please attach a debug (System -> Advanced -> Save Debug).

#2 Updated by Leon Roy over 2 years ago

Dru Lavigne wrote:

> Leon: please attach a debug (System -> Advanced -> Save Debug).

Tried that, but after 20 minutes it's still stuck on the Save Debug progress bar.

#3 Updated by Dru Lavigne over 2 years ago

  • Status changed from 15 to Unscreened
  • Assignee changed from Release Council to Vladimir Vinogradenko
  • Target version set to 11.1-U2

Vlad: is there enough here to start investigating?

#4 Updated by Vladimir Vinogradenko over 2 years ago

  • Status changed from Unscreened to 15

Leon Roy, please do the following:

```
bash -c 'kill -9 $(ps ax | grep "daemon: /usr/local/bin/snmp-agent.py" | grep -v grep | awk "{print \$1}")'
bash -c 'kill -9 $(ps ax | grep "/usr/local/bin/snmp-agent.py" | grep -v grep | awk "{print \$1}")'
```

Hopefully the system will stop being unresponsive after that.

Then run:

```
/usr/local/bin/snmp-agent.py
```

It will likely crash with an error. Please post it here.

Thank you.
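Vladimir asks for the error that `snmp-agent.py` prints when it crashes. A minimal sketch for capturing that output so it can be pasted into the ticket, assuming the crash is reported as a Python traceback on stderr (which is where Python writes uncaught exceptions). `run_and_log` and the log path are hypothetical.

```shell
# Hypothetical helper: run_and_log LOGFILE COMMAND [ARGS...]
# Runs COMMAND in the foreground, shows its output on the terminal, and keeps
# a copy of both stdout and stderr in LOGFILE for attaching to the ticket.
run_and_log() {
    log=$1
    shift
    # 2>&1 sends stderr (e.g. a Python traceback) into the same pipe as stdout,
    # and tee both prints it and writes it to the log file.
    # Note: the exit status of this pipeline is tee's, not COMMAND's.
    "$@" 2>&1 | tee "$log"
}

# Usage (the log path is an assumption):
#   run_and_log /tmp/snmp-agent.log /usr/local/bin/snmp-agent.py
```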

#5 Updated by Leon Roy over 2 years ago

Vladimir Vinogradenko wrote:

> Leon Roy, please do the following:
>
> ...
>
> It will likely crash with an error. Please post it here.

Vladimir, should I do that when the box crashes or while it's functioning normally?

#6 Updated by Vladimir Vinogradenko over 2 years ago

> while it's functioning normally?

Do you mean that sometimes (when snmp-agent is running) it is functioning normally and sometimes it is unresponsive with logs flooded with messages you've specified above?

In any case, you need to run the commands above while the system is unresponsive, so we can understand what's wrong with snmp-agent and why it is so resource-consuming.

#7 Updated by Leon Roy over 2 years ago

Vladimir Vinogradenko wrote:

> > while it's functioning normally?
>
> Do you mean that sometimes (when snmp-agent is running) it is functioning normally and sometimes it is unresponsive with logs flooded with messages you've specified above?
>
> In any case, you need to run the commands above while the system is unresponsive, so we can understand what's wrong with snmp-agent and why it is so resource-consuming.

Tried it briefly. On one box it failed with `kill cannot accept ' as an argument` or something similar; on the other box, which has no issues, all commands completed without any output except 'snmp agent starting'.

My team has reported a few other issues since our upgrade to 11.1: NFS is failing on one box (connections from all clients time out), and the hostname was reset to 'freenas'. We've had to revert to 11.0-U3, so I can't help diagnose this issue any further.
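The `kill` error reported above is consistent with (though not confirmed as) the `ps | grep | awk` pipeline finding no matching process, in which case `kill -9` runs without any PID. A hedged sketch of a guarded alternative: `pgrep -f` (available on FreeBSD, and on Linux via procps) matches a regular expression against each process's full command line, and the helper calls `kill` only when at least one PID is found. `kill_matching` is a hypothetical name; bracketing the first character of the pattern is a common trick so the pattern cannot match the command line of a shell (such as a `bash -c '...'` wrapper) that contains it.

```shell
# Hypothetical helper: kill_matching PATTERN
# Sends SIGKILL to every process whose full command line matches PATTERN
# (a regular expression, as pgrep uses) and does nothing if none match,
# so kill is never called with an empty PID list.
kill_matching() {
    pattern=$1
    # pgrep exits non-zero when nothing matches, which selects the else branch.
    if pids=$(pgrep -f "$pattern"); then
        echo "Killing PIDs:" $pids
        # $pids is left unquoted on purpose: each PID becomes its own argument.
        kill -9 $pids
    else
        echo "No process matches: $pattern"
    fi
}

# Usage, mirroring the order of Vladimir's two commands (wrapper first):
#   kill_matching '[d]aemon: /usr/local/bin/snmp-agent[.]py'
#   kill_matching '[/]usr/local/bin/snmp-agent[.]py'
```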

#8 Updated by Dru Lavigne over 2 years ago

  • Status changed from 15 to Investigation

#9 Updated by Dru Lavigne over 2 years ago

  • Status changed from Investigation to Blocked
  • Reason for Blocked set to Need additional information

#10 Updated by Ben Gadd over 2 years ago

  • Due date set to 02/12/2018

Due date updated to reflect the code freeze for 11.1-U2.

#11 Updated by Ben Gadd over 2 years ago

  • Severity set to New

#12 Updated by Leon Roy over 2 years ago

After updating to FreeNAS-11.1-U1, the issue no longer occurs. I will report back if it appears again.

#13 Updated by Dru Lavigne over 2 years ago

  • Status changed from Blocked to Closed
  • Target version changed from 11.1-U2 to N/A
  • Private changed from Yes to No

Thanks for the update, Leon. I'll close the ticket for now, but please add a comment to it if the issue reappears.

#14 Updated by Dru Lavigne over 2 years ago

  • Reason for Closing set to Cannot Reproduce
  • Reason for Blocked deleted (Need additional information)
