Bug #19529

zfs-snmp can not receive stats from freenas-snmpd.py: socket.error: [Errno 35] Resource temporarily unavailable

Added by Vasily Kolosov almost 4 years ago. Updated about 3 years ago.

Status:
Closed: Duplicate
Priority:
Important
Assignee:
William Grzybowski
Category:
OS
Target version:
Seen in:
Severity:
New
Reason for Closing:
Reason for Blocked:
Needs QA:
Yes
Needs Doc:
Yes
Needs Merging:
Yes
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:

Supermicro X8DT3-LN4F, 2 x Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
Supermicro SC836 chassis
16 x WDC RE4 WD4000FYYZ
96 GB RAM
IBM M5016-LSI 9266-8i + CacheVault

ChangeLog Required:
No

Description

I encountered this issue while investigating the same problem described in this forum topic: https://forums.freenas.org/index.php?threads/snmp-zfs-stats-incomplete.45901/

Many ZFS stats were missing from snmpwalk / snmpget output, including the per-second I/O stats (such as .1.3.6.1.4.1.25359.1.1.19.0 or FREENAS-MIB::zfsPoolOpRead1sec.1).

I ran a copy of /usr/local/bin/freenas-snmp/zfs-snmp directly, with the exception handling statements uncommented, and found that an exception is raised while receiving data from the freenas-snmpd.py daemon:

def get_from_freenas_snmpd_sock(val_to_obtain):
    data = ''
    try:
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect(FREENASSNMPDSOCK)
        s.setblocking(0)
        s.send(val_to_obtain)       #  <-----------------  Completes successfully
        packet = s.recv(4096)     #  <----------------- socket.error (Errno 35, Resource temporarily unavailable) is raised here
        while packet:
            data += packet
            packet = s.recv(4096)
    except socket.error:      #    <---------------- Errno 35, Resource temporarily unavailable, is caught here
        pass
    finally:
        s.close()
    try:
        data = json.loads(data)     #  <------- data is empty, so no stats are received further
    except ValueError:
        data = {}
    return data

The issue could be solved by changing:

s.setblocking(0)

To:

s.settimeout(1)                 # <-------- 1-second socket timeout

I have not yet been able to track down the origin of the issue, but I hope the information above helps us find the root cause together.
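
The failure annotated in the listing above can be reproduced in isolation. The following is a minimal sketch, not code from zfs-snmp: it assumes Python 3, uses socket.socketpair() as a stand-in for the freenas-snmpd.py socket, and the helper name recv_immediately and the request string are hypothetical. On a non-blocking socket, recv() fails with EAGAIN (errno 35 on FreeBSD) whenever the reply has not been written yet, which is likely to be the case immediately after send().

```python
import errno
import socket


def recv_immediately(nonblocking):
    """Send a request, then read a reply that the peer has not written yet."""
    # socketpair() stands in for the daemon's UNIX socket (assumption).
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        if nonblocking:
            client.setblocking(False)   # same mode as s.setblocking(0)
        else:
            client.settimeout(0.2)      # same idea as s.settimeout(1)
        client.send(b"zpool_stats")     # hypothetical request string
        try:
            return client.recv(4096)
        except socket.timeout:
            return "timeout"            # waited, but nothing arrived
        except OSError as exc:          # socket.error is OSError in Python 3
            return exc.errno            # EAGAIN: no data available right now
    finally:
        client.close()
        server.close()
```

With nonblocking=True the call returns the EAGAIN errno at once; with nonblocking=False it waits for the timeout instead of failing immediately, which is why the settimeout change makes the stats appear.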


Related issues

Has duplicate FreeNAS - Bug #16974: ZILSTAT OID not reporting stats | Resolved | 2016-08-23 | 2016-09-19

History

#1 Updated by Bonnie Follweiler almost 4 years ago

  • Assignee set to William Grzybowski

#2 Updated by William Grzybowski almost 4 years ago

  • Assignee changed from William Grzybowski to Suraj Ravichandran

This is Suraj's work; he is the best fit.

#3 Updated by Vaibhav Chauhan over 3 years ago

  • Priority changed from No priority to Important
  • Target version set to 9.10.3

#4 Updated by Vasily Kolosov over 3 years ago

Hi. Seeing activity on this issue, I would like to report some additional detail.

The fix mentioned in the original post does add the stats that were previously missing from snmpget, but it makes snmpget/snmpwalk really slow and unreliable. It looks like every request waits for the full timeout: 1 second with settimeout(1), 0.1 second with settimeout(0.1). Sometimes the request just hangs.

I presume the problem lies deeper and needs additional root cause investigation. I will be happy to provide any additional detail, log files, etc.
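
One possible explanation for the per-request delay, stated here as an assumption rather than a confirmed diagnosis: the receive loop reads until recv() returns an empty string, so if the daemon keeps the connection open after replying, the final recv() waits for the whole timeout. The sketch below (Python 3, with a hypothetical helper name read_json_reply) shows one way to avoid that wait by returning as soon as the buffer parses as JSON. It assumes the reply is a single JSON object; it is not the fix that was committed to zfs-snmp.

```python
import json
import select
import socket
import time


def read_json_reply(sock, deadline_seconds=1.0):
    """Read until the buffer parses as JSON, the peer closes, or time runs out."""
    deadline = time.monotonic() + deadline_seconds
    buf = b""
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # Wait for data, but never longer than the overall deadline.
        readable, _, _ = select.select([sock], [], [], remaining)
        if not readable:
            break                       # deadline reached with no more data
        packet = sock.recv(4096)
        if not packet:
            break                       # peer closed the connection
        buf += packet
        try:
            return json.loads(buf.decode("utf-8"))
        except ValueError:
            continue                    # reply still incomplete; keep reading
    return {}                           # same fallback as the original code
```

The deadline bounds the worst case, while a complete reply is returned without waiting for end-of-file. A bare number or string reply could parse before it is complete, so this approach only suits object or array replies.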

#5 Updated by Kris Moore over 3 years ago

  • Target version changed from 9.10.3 to 9.10.4

#6 Updated by Suraj Ravichandran over 3 years ago

  • Status changed from Unscreened to Screened

@Vasily, thanks for your detailed report, and please accept my apologies for the lack of traction on this.

I will not be able to get to this during this month, but will surely try to get to it next month.

#7 Updated by Kris Moore over 3 years ago

  • Target version changed from 9.10.4 to 11.1

#8 Updated by Dru Lavigne about 3 years ago

  • Assignee changed from Suraj Ravichandran to William Grzybowski

William: please load balance between Vladimir and Nikola.

#9 Updated by William Grzybowski about 3 years ago

  • Status changed from Screened to Unscreened
  • Assignee changed from William Grzybowski to Vladimir Vinogradenko

Vladimir, is this something you can look at? Thanks!

#10 Updated by Vladimir Vinogradenko about 3 years ago

  • Status changed from Unscreened to Screened

#11 Updated by Vladimir Vinogradenko about 3 years ago

  • Status changed from Screened to Fix In Progress

#12 Updated by Vladimir Vinogradenko about 3 years ago

  • Status changed from Fix In Progress to 15

I see commits that fixed this 3 months before this ticket was created. Does this fix need to be backported to some oldstable branch?

#13 Updated by William Grzybowski about 3 years ago

Vladimir Pustosmekhov wrote:

I see commits that fixed this 3 months before this ticket was created. Does this fix need to be backported to some oldstable branch?

Nope, if you believe this has been fixed simply attach the commit hashes and close this ticket. Thanks!

#14 Updated by Vladimir Vinogradenko about 3 years ago

  • Status changed from 15 to Needs Developer Review
  • Assignee changed from Vladimir Vinogradenko to William Grzybowski

I am not sure exactly which commit fixes this issue, but a lot of work has been done on zfs-snmp since then, so today it is a completely different file from the one originally discussed: https://github.com/freenas/freenas/blob/master/src/freenas/usr/local/bin/freenas-snmp/zfs-snmp

It could be commits around these:
https://github.com/freenas/freenas/commit/154f9156a678fc1b0c29046b3afbbdb322d379a1
https://github.com/freenas/freenas/commit/977d14df48bd85ee3d84b5a3ba1c8d396c99c025

#15 Updated by William Grzybowski about 3 years ago

  • Status changed from Needs Developer Review to Closed: Duplicate
  • Target version changed from 11.1 to N/A

#16 Updated by William Grzybowski about 3 years ago

  • Has duplicate Bug #16974: ZILSTAT OID not reporting stats added
