Project

General

Profile

Bug #40788

Use CallError on rrdtool failure

Added by Scott Finlon over 1 year ago. Updated over 1 year ago.

Status:
Done
Priority:
No priority
Assignee:
William Grzybowski
Category:
Middleware
Target version:
Seen in:
Severity:
Medium
Reason for Closing:
Reason for Blocked:
Needs QA:
No
Needs Doc:
No
Needs Merging:
No
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

I have a few VLAN interfaces on my FreeNAS, and I think that's causing the rrdtool to go a bit haywire. The graphs just spin, and the logs are constant errors.

[2018/08/05 02:55:40] (DEBUG) middlewared.logger.CrashReporting.report():97 - Sending a crash report...
[2018/08/05 02:55:40] (DEBUG) middlewared.logger.CrashReporting.report():97 - Sending a crash report...
[2018/08/05 02:55:40] (DEBUG) middlewared.logger.CrashReporting.report():97 - Sending a crash report...
[2018/08/05 02:55:40] (DEBUG) middlewared.logger.CrashReporting.report():97 - Sending a crash report...
[2018/08/05 02:55:40] (ERROR) sentry.errors.uncaught._log_failed_submission():702 - ["ValueError: rrdtool failed: ERROR: opening '/var/db/collectd/rrd/localhost//interface-interface-tap5/if_octets.rrd': No such file or directory\n", '  File "middlewared/main.py", line 154, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 1013, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 953, in _call', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/schema.py", line 658, in nf', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "plugins/stats.py", line 113, in get_data']
[2018/08/05 02:55:40] (ERROR) sentry.errors.uncaught._log_failed_submission():702 - ['ValueError: rrdtool failed: ERROR: bad CF: rx\n', '  File "middlewared/main.py", line 154, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 1013, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 953, in _call', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/schema.py", line 658, in nf', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "plugins/stats.py", line 113, in get_data']
[2018/08/05 02:55:40] (ERROR) sentry.errors.uncaught._log_failed_submission():702 - ["ValueError: rrdtool failed: ERROR: opening '/var/db/collectd/rrd/localhost//interface-interface-bridge1/if_octets.rrd': No such file or directory\n", '  File "middlewared/main.py", line 154, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 1013, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 953, in _call', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/schema.py", line 658, in nf', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "plugins/stats.py", line 113, in get_data']
[2018/08/05 02:55:40] (DEBUG) middlewared.logger.CrashReporting.report():97 - Sending a crash report...
[2018/08/05 02:55:40] (ERROR) sentry.errors.uncaught._log_failed_submission():702 - ["ValueError: rrdtool failed: ERROR: opening '/var/db/collectd/rrd/localhost//interface-interface-tap1/if_octets.rrd': No such file or directory\n", '  File "middlewared/main.py", line 154, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 1013, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 953, in _call', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/schema.py", line 658, in nf', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "plugins/stats.py", line 113, in get_data']
[2018/08/05 02:55:40] (DEBUG) middlewared.logger.CrashReporting.report():97 - Sending a crash report...
[2018/08/05 02:55:40] (ERROR) sentry.errors.uncaught._log_failed_submission():702 - ['ValueError: rrdtool failed: ERROR: bad CF: rx\n', '  File "middlewared/main.py", line 154, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 1013, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 953, in _call', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/schema.py", line 658, in nf', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "plugins/stats.py", line 113, in get_data']
[2018/08/05 02:55:40] (ERROR) sentry.errors.uncaught._log_failed_submission():702 - ['ValueError: rrdtool failed: ERROR: bad CF: rx\n', '  File "middlewared/main.py", line 154, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 1013, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 953, in _call', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/schema.py", line 658, in nf', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "plugins/stats.py", line 113, in get_data']
[2018/08/05 02:55:40] (DEBUG) middlewared.logger.CrashReporting.report():97 - Sending a crash report...
[2018/08/05 02:55:40] (ERROR) sentry.errors.uncaught._log_failed_submission():702 - ['ValueError: rrdtool failed: ERROR: bad CF: rx\n', '  File "middlewared/main.py", line 154, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 1013, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 953, in _call', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/schema.py", line 658, in nf', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "plugins/stats.py", line 113, in get_data']
[2018/08/05 02:55:40] (DEBUG) middlewared.logger.CrashReporting.report():97 - Sending a crash report...
[2018/08/05 02:55:40] (DEBUG) middlewared.logger.CrashReporting.report():97 - Sending a crash report...
[2018/08/05 02:55:40] (ERROR) sentry.errors.uncaught._log_failed_submission():702 - ['ValueError: rrdtool failed: ERROR: bad CF: rx\n', '  File "middlewared/main.py", line 154, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 1013, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 953, in _call', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/schema.py", line 658, in nf', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "plugins/stats.py", line 113, in get_data']
[2018/08/05 02:55:40] (DEBUG) middlewared.logger.CrashReporting.report():97 - Sending a crash report...
[2018/08/05 02:55:41] (ERROR) sentry.errors.uncaught._log_failed_submission():702 - ["ValueError: rrdtool failed: ERROR: opening '/var/db/collectd/rrd/localhost//interface-interface-vnet1/if_octets.rrd': No such file or directory\n", '  File "middlewared/main.py", line 154, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 1013, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 953, in _call', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/schema.py", line 658, in nf', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "plugins/stats.py", line 113, in get_data']
[2018/08/05 02:55:41] (DEBUG) middlewared.logger.CrashReporting.report():97 - Sending a crash report...
[2018/08/05 02:55:41] (ERROR) sentry.errors.uncaught._log_failed_submission():702 - ['ValueError: rrdtool failed: ERROR: bad CF: rx\n', '  File "middlewared/main.py", line 154, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 1013, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 953, in _call', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/schema.py", line 658, in nf', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "plugins/stats.py", line 113, in get_data']
[2018/08/05 02:55:41] (ERROR) sentry.errors.uncaught._log_failed_submission():702 - ["ValueError: rrdtool failed: ERROR: opening '/var/db/collectd/rrd/localhost//interface-interface-vlan250/if_octets.rrd': No such file or directory\n", '  File "middlewared/main.py", line 154, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 1013, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 953, in _call', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/schema.py", line 658, in nf', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "plugins/stats.py", line 113, in get_data']
[2018/08/05 02:55:41] (ERROR) sentry.errors.uncaught._log_failed_submission():702 - ['ValueError: rrdtool failed: ERROR: bad CF: rx\n', '  File "middlewared/main.py", line 154, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 1013, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 953, in _call', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/schema.py", line 658, in nf', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "plugins/stats.py", line 113, in get_data']
[2018/08/05 02:55:41] (DEBUG) middlewared.logger.CrashReporting.report():97 - Sending a crash report...
[2018/08/05 02:55:41] (ERROR) sentry.errors.uncaught._log_failed_submission():702 - ['ValueError: rrdtool failed: ERROR: bad CF: rx\n', '  File "middlewared/main.py", line 154, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 1013, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 953, in _call', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/schema.py", line 658, in nf', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "plugins/stats.py", line 113, in get_data']
[2018/08/05 02:55:41] (ERROR) sentry.errors.uncaught._log_failed_submission():702 - ["ValueError: rrdtool failed: ERROR: opening '/var/db/collectd/rrd/localhost//interface-interface-bridge2/if_octets.rrd': No such file or directory\n", '  File "middlewared/main.py", line 154, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 1013, in call_method', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/main.py", line 953, in _call', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "middlewared/schema.py", line 658, in nf', '  File "asyncio/coroutines.py", line 110, in __next__', '  File "plugins/stats.py", line 113, in get_data']

Related issues

Copied to FreeNAS - Bug #40860: rrdtool failed: ERROR: bad CFClosed
Copied to FreeNAS - Bug #41365: Avoid sending dozens of crash reports filling up thread poolDone

Associated revisions

Revision 4e45c615 (diff)
Added by William Grzybowski over 1 year ago

fix(middlewared/stats): use CallError on rrdtool failure Ticket: #40788

Revision 68f0e577 (diff)
Added by William Grzybowski over 1 year ago

fix(middlewared/stats): use CallError on rrdtool failure Ticket: #40788

History

#1 Updated by Dru Lavigne over 1 year ago

  • Assignee changed from Release Council to William Grzybowski
  • Private changed from No to Yes

#2 Updated by Scott Finlon over 1 year ago

Not sure if I should file a separate bug report, but because rrdtool and middleware are going nuts and grabbing up all available memory while they are at it, I'm getting VM errors that they don't have sufficient memory on the box, but it appears to be related to this.

Error: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 154, in call_method
    result = await self.middleware.call_method(self, message)
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 110, in __next__
    return self.gen.send(None)
  File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 1013, in call_method
    return await self._call(message['method'], serviceobj, methodobj, params, app=app, io_thread=False)
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 110, in __next__
    return self.gen.send(None)
  File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 953, in _call
    return await methodobj(*args)
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 110, in __next__
    return self.gen.send(None)
  File "/usr/local/lib/python3.6/site-packages/middlewared/schema.py", line 658, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 110, in __next__
    return self.gen.send(None)
  File "/usr/local/lib/python3.6/site-packages/middlewared/plugins/vm.py", line 1019, in start
    await self.__init_guest_vmemory(id)
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 110, in __next__
    return self.gen.send(None)
  File "/usr/local/lib/python3.6/site-packages/middlewared/plugins/vm.py", line 842, in __init_guest_vmemory
    raise CallError(f'Cannot guarantee memory for guest id: {id}')
middlewared.service_exception.CallError: [EFAULT] Cannot guarantee memory for guest id: 19

#3 Updated by William Grzybowski over 1 year ago

  • Copied to Bug #40860: rrdtool failed: ERROR: bad CF added

#4 Updated by William Grzybowski over 1 year ago

  • Status changed from Unscreened to Not Started
  • Target version changed from Backlog to 11.2-BETA3
  • Severity changed from New to Medium

Scott Finlon wrote:

Not sure if I should file a separate bug report, but because rrdtool and middleware are going nuts and grabbing up all available memory while they are at it, I'm getting VM errors that they don't have sufficient memory on the box, but it appears to be related to this.

[...]

How can you be sure middlewared and rrdtool are grabbing all the memory?

#5 Updated by Scott Finlon over 1 year ago

William Grzybowski wrote:

Scott Finlon wrote:

Not sure if I should file a separate bug report, but because rrdtool and middleware are going nuts and grabbing up all available memory while they are at it, I'm getting VM errors that they don't have sufficient memory on the box, but it appears to be related to this.

[...]

How can you be sure middlewared and rrdtool are grabbing all the memory?

I am not 100% sure, but it appears to be.

However:
- All signs point at them being the culprit, because this memory issue only started in 11.2-BETA2, and only after the other errors started.
- I am currently running a single VM and a single jail, where under 11.1 I had something stupid like 5 VMs and 7 jails all going without a hitch.
- The traceback points to a middleware process, which is already throwing massive amounts of errors `"/usr/local/lib/python3.6/site-packages/middlewared/main.py"`
- monitoring usage via ps aux and htop show an inordinate amount processes being spun up by middleware and rrdtool

Again, I'm not sure, that's why I added it here, if you would rather I can easily create a new bug ticket, but that's why I asked.

#6 Updated by William Grzybowski over 1 year ago

Scott Finlon wrote:

William Grzybowski wrote:

Scott Finlon wrote:

Not sure if I should file a separate bug report, but because rrdtool and middleware are going nuts and grabbing up all available memory while they are at it, I'm getting VM errors that they don't have sufficient memory on the box, but it appears to be related to this.

[...]

How can you be sure middlewared and rrdtool are grabbing all the memory?

I am not 100% sure, but it appears to be.

However:
- All signs point at them being the culprit, because this memory issue only started in 11.2-BETA2, and only after the other errors started.
- I am currently running a single VM and a single jail, where under 11.1 I had something stupid like 5 VMs and 7 jails all going without a hitch.
- The traceback points to a middleware process, which is already throwing massive amounts of errors `"/usr/local/lib/python3.6/site-packages/middlewared/main.py"`
- monitoring usage via ps aux and htop show an inordinate amount processes being spun up by middleware and rrdtool

Again, I'm not sure, that's why I added it here, if you would rather I can easily create a new bug ticket, but that's why I asked.

This seems to be a different issue. you have 18GB of RAM wired to the kernel, as ZFS ARC. Thats why its not letting you to run more VMs.

You can probably workaround it by rebooting and starting the VMs right after boot.

#7 Updated by Scott Finlon over 1 year ago

zfs arc max is limited to 18 gig, I have 32 in the box, so that's still 14 gig remaining, with the lone VM using 4 gig, that still should be 10 gig left. I did reboot, and the VM's started but now rancher is totally broken and doesn't want to start so I have other things to figure out at the moment.

#8 Updated by Bug Clerk over 1 year ago

  • Status changed from Not Started to In Progress

#9 Updated by Dru Lavigne over 1 year ago

  • File deleted (debug (1).tgz)

#10 Updated by Dru Lavigne over 1 year ago

  • Status changed from In Progress to Ready for Testing
  • Private changed from Yes to No
  • Needs Merging changed from Yes to No

#11 Updated by Dru Lavigne over 1 year ago

  • Subject changed from rrdtool failed: ERROR: bad CF to Use CallError on rrdtool failure
  • Needs Doc changed from Yes to No

#12 Updated by William Grzybowski over 1 year ago

  • Copied to Bug #41365: Avoid sending dozens of crash reports filling up thread pool added

#14 Updated by Bonnie Follweiler over 1 year ago

  • Status changed from Ready for Testing to Passed Testing
  • Needs QA changed from Yes to No

Test Passed in FreeNAS-11.2-MASTER-201809050856

#15 Updated by Dru Lavigne over 1 year ago

  • Status changed from Passed Testing to Done

Also available in: Atom PDF