Bug #61236

Do not crash rrd_toggle when switching to tmpfs

Added by Ryan McKenzie almost 3 years ago. Updated over 2 years ago.

Status:
Done
Priority:
No priority
Assignee:
Vladimir Vinogradenko
Category:
Middleware
Target version:
Seen in:
TrueNAS - TrueNAS 11.2
Severity:
Medium
Reason for Closing:
Reason for Blocked:
Needs QA:
No
Needs Doc:
No
Needs Merging:
No
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

Running on the 12-03-2018 nightly build of TrueNAS 11.2 (stable, internal)

Twice now on this build, upon making changes to a pool, I began getting collectd errors and could no longer see reporting graphs. The first time was upon adding a SLOG to my data pool. The second time was upon adding an L2ARC device to the same pool. The problem resolves after a reboot and verifying that the "System->System Dataset->Reporting Database" option is checked.

Dec 3 10:38:24 tn03-a uwsgi: [reporting.rrd:163] Failed to generate graph: b"ERROR: opening '/var/db/collectd/rrd/localhost/aggregation-cpu-sum/cpu-idle.rrd': No such file or directory\n"
Dec 3 10:38:26 tn03-a uwsgi: [reporting.rrd:163] Failed to generate graph: b"ERROR: opening '/var/db/collectd/rrd/localhost/zfs_arc/cache_ratio-arc.rrd': No such file or directory\n"

The system is currently running a performance benchmark, so I haven't taken a debug yet.

Screenshot_20181203_1.png (70.4 KB) Ryan McKenzie, 12/03/2018 11:25 AM

Related issues

Copied to FreeNAS - Bug #79122: Do not crash rrd_toggle when switching to tmpfs (Closed)

History

#1 Updated by Dru Lavigne almost 3 years ago

  • Category changed from OS to Hardware
  • Status changed from Unscreened to Blocked
  • Assignee changed from Release Council to William Grzybowski
  • Private changed from No to Yes
  • Reason for Blocked set to Need additional information from Author

Ryan: once you are able, attach a debug to this ticket or provide William with access to the system.

#3 Updated by William Grzybowski almost 3 years ago

  • Status changed from Blocked to Screened
  • Reason for Blocked deleted (Need additional information from Author)

#4 Updated by William Grzybowski almost 3 years ago

  • Category changed from Hardware to Middleware
  • Target version changed from Backlog to TrueNAS 11.2
  • Severity changed from New to Medium

#5 Updated by Ryan McKenzie almost 3 years ago

More information on the issue. Removal of L2ARC and/or ZIL does not trigger the collectd problem.

Adding the ZIL back in gives the following messages on the console and the reporting graphs stop showing data. The graphs still render but are blank.

Dec  7 14:57:18 tn03-a uwsgi: [sentry.errors:674] Sentry responded with an API error: RateLimited(None)
Dec  7 14:57:18 tn03-a uwsgi: [sentry.errors.uncaught:702] ['OperationalError: database is locked', '  File "django/core/handlers/exception.py", line 42, in inner', '  File "django/core/handlers/base.py", line 249, in _legacy_get_response', '  File "django/core/handlers/base.py", line 178, in _get_response', '  File "freenasUI/freeadmin/middleware.py", line 163, in process_view', '  File "django/contrib/auth/decorators.py", line 22, in _wrapped_view', '  File "django/contrib/auth/decorators.py", line 46, in <lambda>', '  File "django/utils/functional.py", line 234, in inner', '  File "django/utils/functional.py", line 380, in _setup', '  File "django/contrib/auth/middleware.py", line 24, in <lambda>', '  File "django/contrib/auth/middleware.py", line 12, in get_user', '  File "django/contrib/auth/__init__.py", line 187, in get_user', '  File "django/contrib/auth/backends.py", line 102, in get_user', '  File "django/db/models/manager.py", line 85, in manager_method',
Dec  7 14:57:18 tn03-a uwsgi:  '  File "django/db/models/query.py", line 379, in get', '  File "django/db/models/query.py", line 238, in __len__', '  File "django/db/models/query.py", line 1087, in _fetch_all', '  File "django/db/models/query.py", line 54, in __iter__', '  File "django/db/models/sql/compiler.py", line 835, in execute_sql', '  File "django/db/backends/utils.py", line 64, in execute', '  File "django/db/utils.py", line 94, in __exit__', '  File "django/utils/six.py", line 685, in reraise', '  File "django/db/backends/utils.py", line 64, in execute', '  File "freenasUI/freeadmin/sqlite3_ha/base.py", line 412, in execute', '  File "freenasUI/freeadmin/sqlite3_ha/base.py", line 403, in locked_retry', '  File "freenasUI/freeadmin/sqlite3_ha/base.py", line 389, in locked_retry']
Dec  7 14:57:18 tn03-a uwsgi: [sentry.errors:674] Sentry responded with an API error: RateLimited(None)
Dec  7 14:57:18 tn03-a uwsgi: [sentry.errors.uncaught:702] ['OperationalError: database is locked', '  File "django/core/handlers/exception.py", line 42, in inner', '  File "django/core/handlers/base.py", line 249, in _legacy_get_response', '  File "django/core/handlers/base.py", line 187, in _get_response', '  File "django/core/handlers/base.py", line 185, in _get_response', '  File "freenasUI/failover/views.py", line 74, in failover_disabled', '  File "freenasUI/failover/notifier.py", line 360, in failover_disabled_reasons', '  File "django/db/models/query.py", line 660, in exists', '  File "django/db/models/sql/query.py", line 494, in has_results', '  File "django/db/models/sql/compiler.py", line 806, in has_results', '  File "django/db/models/sql/compiler.py", line 835, in execute_sql', '  File "django/db/backends/utils.py", line 64, in execute', '  File "django/db/utils.py", line 94, in __exit__', '  File "django/utils/six.py", line 685, in reraise', '  File "d
Dec  7 14:57:18 tn03-a uwsgi: jango/db/backends/utils.py", line 64, in execute', '  File "freenasUI/freeadmin/sqlite3_ha/base.py", line 412, in execute', '  File "freenasUI/freeadmin/sqlite3_ha/base.py", line 403, in locked_retry', '  File "freenasUI/freeadmin/sqlite3_ha/base.py", line 389, in locked_retry']

#6 Updated by Dru Lavigne almost 3 years ago

  • Target version changed from TrueNAS 11.2 to TrueNAS 11.2-U3

#7 Updated by Ryan McKenzie almost 3 years ago

I did a couple experiments with this. When I have "system->system dataset->reporting database" checked, this issue reproduces. With that option unchecked, the issue does not reproduce and the reporting charts keep generating after any kind of pool reconfig.

#8 Updated by William Grzybowski almost 3 years ago

  • Status changed from Screened to Unscreened
  • Assignee changed from William Grzybowski to Vladimir Vinogradenko

#9 Updated by Dru Lavigne almost 3 years ago

  • Target version changed from TrueNAS 11.2-U3 to 11.2-U4

#10 Updated by Ryan McKenzie almost 3 years ago

Retested this on a much more recent 2019-01-29 nightly of TrueNAS...problem still reproduces.

On a clean healthy M50-HA, no alerts and a couple days of uptime. Reporting charts are rendering and have data.

Checked the "system->system dataset->reporting database" option. For 20 minutes still getting charts rendered and with data.

Added an L2ARC device to my pool via the UI volume manager and immediately got these console errors:

Feb 26 10:43:38 tn11b collectd[11524]: rrdcached plugin: rrdc_update (/var/db/collectd/rrd/tn11b.lab.ixsystems.com/geom_stat/geom_ops_rwd-da91.rrd, [1551195818:0.000000:0.000000:0.000000], 1) failed: rrdcached: No such file: /var/db/system/rrd-66d4b88150864a3198c2d0d2659a6f47/tn11b.lab.ixsystems.com/geom_stat/geom_ops_rwd-da91.rrd (status=-1)
Feb 26 10:43:38 tn11b collectd[11524]: rrdcached plugin: rrdc_update (/var/db/collectd/rrd/tn11b.lab.ixsystems.com/geom_stat/geom_bw-da91.rrd, [1551195818:0.000000:0.000000:0.000000], 1) failed: rrdcached: No such file: /var/db/system/rrd-66d4b88150864a3198c2d0d2659a6f47/tn11b.lab.ixsystems.com/geom_stat/geom_bw-da91.rrd (status=-1)
Feb 26 10:43:38 tn11b collectd[11524]: rrdcached plugin: rrdc_update (/var/db/collectd/rrd/tn11b.lab.ixsystems.com/geom_stat/geom_latency-da91.rrd, [1551195818:0.000000:0.000000:0.000000], 1) failed: rrdcached: No such file: /var/db/system/rrd-66d4b88150864a3198c2d0d2659a6f47/tn11b.lab.ixsystems.com/geom_stat/geom_latency-da91.rrd (status=-1)
Feb 26 10:43:38 tn11b collectd[11524]: rrdcached plugin: rrdc_update (/var/db/collectd/rrd/tn11b.lab.ixsystems.com/disk-da117/disk_octets.rrd, [1551195818:93749248:28090368], 1) failed: rrdcached: No such file: /var/db/system/rrd-66d4b88150864a3198c2d0d2659a6f47/tn11b.lab.ixsystems.com/disk-da117/disk_octets.rrd (status=-1)

Upon returning to the reporting screen, all charts are still rendered onscreen but no data is displayed from the point of time when I added the L2ARC device forward. Waited 20 more minutes to verify it wasn't just a blip.
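
In the rrdcached errors above, collectd requests an update under /var/db/collectd/rrd while rrdcached reports the missing file under /var/db/system/rrd-66d4b88150864a3198c2d0d2659a6f47. A minimal Python sketch for pulling both paths out of such a line and confirming that they differ only in their leading directory is shown here. The function names are hypothetical and not part of the middleware, and the regular expression is fitted only to the four lines quoted in this comment.

```python
import re
from pathlib import PurePosixPath

# Fitted to the "rrdc_update (...) failed: rrdcached: No such file: ..." lines
# quoted in this ticket; other collectd message shapes are not handled.
LINE_RE = re.compile(
    r"rrdc_update \((?P<requested>\S+\.rrd), .*?"
    r"No such file: (?P<resolved>\S+\.rrd)"
)


def parse_rrdcached_error(line):
    """Return (requested_path, resolved_path), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return (
        PurePosixPath(match.group("requested")),
        PurePosixPath(match.group("resolved")),
    )


def common_suffix(a, b):
    """Return the trailing path components shared by both paths."""
    shared = []
    for x, y in zip(reversed(a.parts), reversed(b.parts)):
        if x != y:
            break
        shared.append(x)
    return PurePosixPath(*reversed(shared)) if shared else PurePosixPath()
```

Run against the lines above, the shared suffix is the hostname/plugin/file portion, for example tn11b.lab.ixsystems.com/geom_stat/geom_bw-da91.rrd. That would be consistent with the first directory resolving to the second, although the ticket does not state this explicitly.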

#11 Updated by Vladimir Vinogradenko almost 3 years ago

  • Status changed from Unscreened to Blocked
  • Reason for Blocked set to Waiting for feedback

Unable to repeat on VM, asked for access to the machine

#12 Updated by Ryan McKenzie almost 3 years ago

I gave Vladimir access to our M40-HA

Steps to replicate:
1-visit reporting screen to verify charts are rendering and populated with data
2-In UI, navigate to "system->system dataset->reporting database" option and check the box, save changes
3-wait a few minutes, visit reporting screen again to verify charts are rendering and populated with data
4-Go to storage->volume manager and extend pool "tank" by adding the one free disk as cache device to the pool

If you are monitoring the serial console you will see that some RRD errors appear when the check box is selected
And after adding the L2ARC you can wait a few minutes and see that the charts in reporting screen no longer have data
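
The "wait a few minutes and check the charts" part of the steps above could also be checked from a shell rather than the UI. The following is a rough Python sketch, assuming that RRD files which have stopped receiving updates also stop having their modification time advance. The helper name and the 300-second threshold are illustrative and do not come from the middleware. Because rrdcached batches writes, a short threshold may report false positives.

```python
import os
import time


def stale_rrd_files(rrd_root, max_age_seconds=300, now=None):
    """Return sorted paths of .rrd files under rrd_root that were not
    modified within the last max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(rrd_root, followlinks=True):
        for name in filenames:
            if not name.endswith(".rrd"):
                continue
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # Broken symlink or file removed mid-walk: count it as stale.
                stale.append(path)
                continue
            if now - mtime > max_age_seconds:
                stale.append(path)
    return sorted(stale)
```

For example, calling stale_rrd_files("/var/db/collectd/rrd") a few minutes after step 4 would list the RRD files that are no longer being written, using the directory that appears in the logs on this ticket.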

#13 Updated by Bug Clerk almost 3 years ago

  • Status changed from Blocked to In Progress

#14 Updated by Bug Clerk almost 3 years ago

  • Status changed from In Progress to Ready for Testing

#15 Updated by Bug Clerk almost 3 years ago

  • Target version changed from 11.2-U4 to 11.2-U3

#16 Updated by Vladimir Vinogradenko almost 3 years ago

  • Status changed from Ready for Testing to In Progress

#17 Updated by Dru Lavigne almost 3 years ago

  • File deleted (debug-20181204102324.tar)

#18 Updated by Dru Lavigne almost 3 years ago

  • Subject changed from Pool reconfig strangely causes collectd to stop working properly to Do not crash rrd_toggle when switching to tmpfs
  • Private changed from Yes to No
  • Reason for Blocked deleted (Waiting for feedback)

#19 Updated by Dru Lavigne over 2 years ago

  • Status changed from In Progress to Ready for Testing
  • Needs QA changed from No to Yes

#20 Updated by Vladimir Vinogradenko over 2 years ago

  • Copied to Bug #79122: Do not crash rrd_toggle when switching to tmpfs added

#24 Updated by Bonnie Follweiler over 2 years ago

  • Status changed from Ready for Testing to Passed Testing
  • Needs QA changed from Yes to No

#28 Updated by Dru Lavigne over 2 years ago

  • Status changed from Passed Testing to Done
  • Private changed from Yes to No
