Project

General

Profile

Bug #24731

Replace smartd rc script with smartd-daemon that forks immediately

Added by Andrew McCann over 1 year ago. Updated 11 months ago.

Status:
Resolved
Priority:
Important
Assignee:
Vladimir Vinogradenko
Category:
OS
Target version:
Seen in:
Sprint:
Severity:
New
Backlog Priority:
Reason for Closing:
Reason for Blocked:
Needs QA:
No
Needs Doc:
Yes
Needs Merging:
Yes
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

Hi, on a fresh FreeNAS 11 Release install, i tried editing the Smart tests received errors when trying to save both tests.

Thanks ======================================================================================

Request Method: POST
Request URL: http://169.191.1.1/admin/tasks/smarttest/edit/2/
Software Version: FreeNAS-11.0-RELEASE (a2dc21583)
Exception Type: CallTimeout
Exception Value:

Call timeout

Exception Location: /usr/local/lib/python3.6/site-packages/middlewared/client/client.py in call, line 208
Server time: Tue, 20 Jun 2017 11:34:05 +1000
Traceback

Environment:

Software Version: FreeNAS-11.0-RELEASE (a2dc21583)
Request Method: POST
Request URL: http://169.191.1.1/admin/tasks/smarttest/edit/2/

Traceback:
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py" in inner
39. response = get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _legacy_get_response
249. response = self._get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
178. response = middleware_method(request, callback, callback_args, callback_kwargs)
File "./freenasUI/freeadmin/middleware.py" in process_view
162. return login_required(view_func)(request, *view_args, **view_kwargs)
File "/usr/local/lib/python3.6/site-packages/django/contrib/auth/decorators.py" in _wrapped_view
23. return view_func(request, *args, **kwargs)
File "./freenasUI/freeadmin/options.py" in wrapper
208. return self._admin.admin_view(view)(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/django/utils/decorators.py" in _wrapped_view
149. response = view_func(request, *args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/django/views/decorators/cache.py" in _wrapped_view_func
57. response = view_func(request, *args, **kwargs)
File "./freenasUI/freeadmin/site.py" in inner
142. return view(request, *args, **kwargs)
File "./freenasUI/freeadmin/options.py" in edit
566. mf.save()
File "./freenasUI/tasks/forms.py" in save
453. notifier().restart("smartd")
File "./freenasUI/middleware/notifier.py" in restart
273. return c.call('service.restart', what, {'onetime': onetime}, **kwargs)
File "./freenasUI/middleware/notifier.py" in restart
273. return c.call('service.restart', what, {'onetime': onetime}, **kwargs)
File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py" in call
208. raise CallTimeout("Call timeout")

Exception Type: CallTimeout at /admin/tasks/smarttest/edit/2/
Exception Value: Call timeout

Request information
GET

No GET data
POST
Variable Value
all ''
smarttest_disks '{serial_lunid}63PZ419GS_5000039ff4cdb24e'
smarttest_type 'L'
smarttest_desc 'Long Self-Test'
smarttest_hour '00'
smarttest_daymonth '01'
smarttest_month '12'
smarttest_dayweek '7'
__form_id 'dialogForm_smarttest'
FILES

No FILES data
COOKIES
Variable Value
csrftoken 'wVb3TqPG1Vt57x71eFrzVmiTUHC0H3oDdfw5nd5aDAXlYSSIrJ9H1W7rLmFypUMI'
fntreeSaveStateCookie 'root%2Croot%2F75%2F156%2Croot%2F54%2F61%2F63%2Croot%2F242%2Croot%2F38%2F39%2Croot%2F38%2F49%2Croot%2F75%2F76%2F149%2Croot%2F75%2F160%2Croot%2F240%2F246%2F248%2Croot%2F240%2F260%2F262%2Croot%2F240%2F260%2Croot%2F75%2F76%2F77%2Croot%2F57%2F58%2F59%2Croot%2F134%2Croot%2F134%2F144%2Croot%2F39%2F46%2F48%2Croot%2F39%2F46%2F48%2F49%2Croot%2F213%2F219%2Croot%2F161%2F171%2Croot%2F218%2F224%2F226%2Croot%2F233%2F239%2Croot%2F213%2Croot%2F217%2Croot%2F245%2F267%2Croot%2F169%2Croot%2F203%2Croot%2F39%2F41%2Croot%2F202%2Croot%2F60%2F61%2F62%2Croot%2F221%2F222%2F227%2Croot%2F221%2F222%2F228%2Croot%2F221%2F222%2F230%2Croot%2F60%2F61%2Croot%2F221%2Croot%2F1%2F5%2Croot%2F27%2Croot%2F1%2F6%2Croot%2F1%2F7%2Croot%2F1%2F8%2Croot%2F229%2F240%2F242%2Croot%2F55%2F62%2F64%2Croot%2F55%2F62%2F64%2F65%2Croot%2F76%2F77%2Croot%2F229%2F235%2F237%2Croot%2F222%2Croot%2F224%2F230%2F232%2Croot%2F226%2Croot%2F229%2Croot%2F77%2F78%2Croot%2F77%2F78%2F79%2Croot%2F230%2Croot%2F230%2F241%2Croot%2F230%2F241%2F243%2Croot%2F77%2F78%2F79%2F128%2Croot%2F78%2F79%2F80%2Croot%2F231%2F242%2F244%2Croot%2F78%2F79%2F80%2F129%2Croot%2F41%2Croot%2F41%2F49%2Croot%2F41%2F52%2Croot%2F228'
sessionid 'lo8dbv4uwvi4adic0i4itk6q9zpp5xpi'
META
Variable Value

======================================================================================
Request Method: POST
Request URL: http://169.191.1.1/admin/tasks/smarttest/edit/1/
Software Version: FreeNAS-11.0-RELEASE (a2dc21583)
Exception Type: CallTimeout
Exception Value:

Call timeout

Exception Location: /usr/local/lib/python3.6/site-packages/middlewared/client/client.py in call, line 208
Server time: Tue, 20 Jun 2017 11:38:19 +1000
Traceback

Environment:

Software Version: FreeNAS-11.0-RELEASE (a2dc21583)
Request Method: POST
Request URL: http://169.191.1.1/admin/tasks/smarttest/edit/1/

Traceback:
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py" in inner
39. response = get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _legacy_get_response
249. response = self._get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
178. response = middleware_method(request, callback, callback_args, callback_kwargs)
File "./freenasUI/freeadmin/middleware.py" in process_view
162. return login_required(view_func)(request, *view_args, **view_kwargs)
File "/usr/local/lib/python3.6/site-packages/django/contrib/auth/decorators.py" in _wrapped_view
23. return view_func(request, *args, **kwargs)
File "./freenasUI/freeadmin/options.py" in wrapper
208. return self._admin.admin_view(view)(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/django/utils/decorators.py" in _wrapped_view
149. response = view_func(request, *args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/django/views/decorators/cache.py" in _wrapped_view_func
57. response = view_func(request, *args, **kwargs)
File "./freenasUI/freeadmin/site.py" in inner
142. return view(request, *args, **kwargs)
File "./freenasUI/freeadmin/options.py" in edit
566. mf.save()
File "./freenasUI/tasks/forms.py" in save
453. notifier().restart("smartd")
File "./freenasUI/middleware/notifier.py" in restart
273. return c.call('service.restart', what, {'onetime': onetime}, **kwargs)
File "./freenasUI/middleware/notifier.py" in restart
273. return c.call('service.restart', what, {'onetime': onetime}, **kwargs)
File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py" in call
208. raise CallTimeout("Call timeout")

Exception Type: CallTimeout at /admin/tasks/smarttest/edit/1/
Exception Value: Call timeout

Request information
GET

No GET data
POST
Variable Value
all ''
smarttest_disks '{serial_lunid}63PZ419GS_5000039ff4cdb24e'
smarttest_type 'S'
smarttest_desc 'Short Self-Test'
smarttest_hour '*'
smarttest_daymonth '*'
smarttest_month '12'
smarttest_dayweek '7'
__form_id 'dialogForm_smarttest'
FILES

No FILES data
COOKIES
Variable Value
csrftoken 'wVb3TqPG1Vt57x71eFrzVmiTUHC0H3oDdfw5nd5aDAXlYSSIrJ9H1W7rLmFypUMI'
fntreeSaveStateCookie 'root%2Croot%2F75%2F156%2Croot%2F54%2F61%2F63%2Croot%2F242%2Croot%2F38%2F39%2Croot%2F38%2F49%2Croot%2F75%2F76%2F149%2Croot%2F75%2F160%2Croot%2F240%2F246%2F248%2Croot%2F240%2F260%2F262%2Croot%2F240%2F260%2Croot%2F75%2F76%2F77%2Croot%2F57%2F58%2F59%2Croot%2F134%2Croot%2F134%2F144%2Croot%2F39%2F46%2F48%2Croot%2F39%2F46%2F48%2F49%2Croot%2F213%2F219%2Croot%2F161%2F171%2Croot%2F218%2F224%2F226%2Croot%2F233%2F239%2Croot%2F213%2Croot%2F217%2Croot%2F245%2F267%2Croot%2F169%2Croot%2F203%2Croot%2F39%2F41%2Croot%2F202%2Croot%2F60%2F61%2F62%2Croot%2F221%2F222%2F227%2Croot%2F221%2F222%2F228%2Croot%2F221%2F222%2F230%2Croot%2F60%2F61%2Croot%2F221%2Croot%2F1%2F5%2Croot%2F27%2Croot%2F1%2F6%2Croot%2F1%2F7%2Croot%2F1%2F8%2Croot%2F229%2F240%2F242%2Croot%2F55%2F62%2F64%2Croot%2F55%2F62%2F64%2F65%2Croot%2F76%2F77%2Croot%2F229%2F235%2F237%2Croot%2F222%2Croot%2F224%2F230%2F232%2Croot%2F226%2Croot%2F229%2Croot%2F77%2F78%2Croot%2F77%2F78%2F79%2Croot%2F230%2Croot%2F230%2F241%2Croot%2F230%2F241%2F243%2Croot%2F77%2F78%2F79%2F128%2Croot%2F78%2F79%2F80%2Croot%2F231%2F242%2F244%2Croot%2F78%2F79%2F80%2F129%2Croot%2F41%2Croot%2F41%2F49%2Croot%2F41%2F52%2Croot%2F228'
sessionid 'lo8dbv4uwvi4adic0i4itk6q9zpp5xpi'
META
Variable Value

Smart Error.PNG (50.9 KB) Smart Error.PNG Andrew McCann, 06/20/2017 06:16 AM
Smart2.PNG (45.7 KB) Smart2.PNG Andrew McCann, 06/20/2017 06:18 AM
Smart Services.PNG (43.5 KB) Smart Services.PNG Andrew McCann, 06/20/2017 05:51 PM
11513
11514
11527

Related issues

Is duplicate of FreeNAS - Bug #26251: Got error when create pool in 11U3Closed: Duplicate2017-10-18

Associated revisions

Revision e32dcb12 (diff)
Added by Vladimir Vinogradenko about 1 year ago

fix(smartd): Replace smartd rc script with smartd-daemon that forks immediately

Ticket: #24731

Revision 2a22b239 (diff)
Added by Vladimir Vinogradenko about 1 year ago

fix(smartd): Replace smartd rc script with smartd-daemon that forks immediately

Ticket: #24731

Revision 0ce7bfec (diff)
Added by Vladimir Vinogradenko about 1 year ago

fix(smartd): Replace smartd rc script with smartd-daemon that forks immediately

Ticket: #24731

History

#1 Updated by William Grzybowski over 1 year ago

  • Status changed from Unscreened to 15

Does that happen every time?

Please attach Save Debug (System - Advanced)

#2 Updated by Andrew McCann over 1 year ago

  • File debug-freenas-20170620230022.tgz added
  • Private changed from No to Yes

Yes, can reproduce every time. Attached debug

Thanks

#3 Updated by Andrew McCann over 1 year ago

11513

Just tried creating a new smart test, same result.

#4 Updated by Andrew McCann over 1 year ago

11514

Does seem to create the new task though?

#5 Updated by William Grzybowski over 1 year ago

Andrew McCann wrote:

Just tried creating a new smart test, same result.

The error is about restarting smartd service, the task is created in the database.

Can you reproduce it by running in shell?

  1. midctl call service.restart smartd

#6 Updated by Andrew McCann over 1 year ago

Was this a typo, command not found?

[root@freenas ~]# midctl call service.restart smartd
bash: midctl: command not found
[root@freenas ~]#

#7 Updated by William Grzybowski over 1 year ago

Andrew McCann wrote:

Was this a typo, command not found?

[root@freenas ~]# midctl call service.restart smartd
bash: midctl: command not found
[root@freenas ~]#

Yes, "midclt" not "midctl", sorry.

#8 Updated by Andrew McCann over 1 year ago

11527

interesting, sat there for about 30-40 seconds and times out again.

[root@freenas ]# midclt call service.restart smartd
Call timeout
[root@freenas
]#

Not sure if this helps, but testing this on the services menu, it stops ok, and it does start, bet leaves the wait cursor running? (see screenshot)

#9 Updated by William Grzybowski over 1 year ago

Andrew McCann wrote:

interesting, sat there for about 30-40 seconds and times out again.

[root@freenas ]# midclt call service.restart smartd
Call timeout
[root@freenas
]#

Not sure if this helps, but testing this on the services menu, it stops ok, and it does start, bet leaves the wait cursor running? (see screenshot)

Would you be open for a TeamViewer session, since I can't reproduce that particular issue?

#10 Updated by Andrew McCann over 1 year ago

sure, based in AUS, so your morning or evening would be easier?

#11 Updated by Andrew McCann over 1 year ago

Hi William

Online for the next 2 hours, happy to do a teamviewer if your ok to.

Cheers

#12 Updated by William Grzybowski over 1 year ago

Andrew McCann wrote:

Hi William

Online for the next 2 hours, happy to do a teamviewer if your ok to.

Cheers

Yes, we can do it now, send me your teamviewer creds (here via private note checked or william at+ ixsystems .com

Thanks,

#13 Updated by Andrew McCann over 1 year ago

email sent

#14 Updated by William Grzybowski over 1 year ago

  • Priority changed from No priority to Important
  • Target version set to 11.0-U1

I have concluded smartd is taking too long to the initial disk scan before the daemon forks. I'll have to investigate it further and/or come up with a workaround.

#15 Updated by William Grzybowski over 1 year ago

  • Status changed from 15 to Screened

#16 Updated by Vaibhav Chauhan over 1 year ago

  • Target version changed from 11.0-U1 to 11.0-U2

#17 Updated by Vaibhav Chauhan over 1 year ago

  • Target version changed from 11.0-U2 to 11.0-U3

#18 Updated by William Grzybowski about 1 year ago

  • Target version changed from 11.0-U3 to 11.1

I wont be able to get this in time for 11.0-U3.

#19 Avatar?id=14398&size=24x24 Updated by Kris Moore about 1 year ago

  • Status changed from Screened to Unscreened
  • Assignee changed from William Grzybowski to Vladimir Vinogradenko

#20 Updated by Vladimir Vinogradenko about 1 year ago

  • Status changed from Unscreened to Screened

#21 Updated by Vladimir Vinogradenko about 1 year ago

  • Status changed from Screened to 15
  • Assignee changed from Vladimir Vinogradenko to William Grzybowski

From what I understand, the only solution we have now is to increase timeout for smartd start. I think this is reasonable because smartd does really important job here so we really should not hide any possible issues under the carpet.

This includes (all done for smartd operations only, but also can be useful in the future):
  • Stating «this operation can take longer than usual» in the UI
  • Increasing timeout in middlewared client(s)
  • Increasing timeout in middleware service plugin itself

#22 Updated by William Grzybowski about 1 year ago

  • Assignee changed from William Grzybowski to Vladimir Vinogradenko

Well, the behavior I have observed is that smartd is scanning all disks in foreground before forking, and each disk is taking a couple seconds, so it is not a constant time.

I was wondering, why does it do all the scanning in foreground? Maybe we could investigate why is it not entering into daemon mode earlier?

#23 Updated by Dru Lavigne about 1 year ago

  • Status changed from 15 to Unscreened

#24 Updated by Vladimir Vinogradenko about 1 year ago

  • Status changed from Unscreened to 15
  • Assignee changed from Vladimir Vinogradenko to William Grzybowski

smartd does all scanning in foreground before forking because it is programmed to do so https://github.com/mirror/smartmontools/blob/master/smartd.cpp#L5413

There is no config option to change that

I think this is done that way because smartd wants to make sure that it is configured correctly (i.e. can monitor all drives specified in config file) before exiting with success code.

Normally scanning is finished very quickly. I wonder why it takes long time for Andrew McCann: are his hard-drives not supported by smartd (then we can just try to exclude them in ix-smartd) or are they supported, just take some time to initialize (then we can increase startup timeout or write our own init script with /usr/sbin/daemon and --no-fork).

In the smartd manpage I see that this case is not unique for us:

onecheck - Start smartd in debug mode, then register devices,
then check device´s SMART status once, and then exit with zero
exit status if all of these steps worked correctly.

This last option is intended for ´distribution-writers´ who want
to create automated scripts to determine whether or not to
automatically start up smartd after installing smartmontools.
After starting smartd with this command-line option, the
distribution´s install scripts should wait a reasonable length
of time (say ten seconds).  If smartd has not exited with zero
status by that time, the script should send smartd a SIGTERM or
SIGKILL and assume that smartd will not operate correctly on the
host.  Conversely, if smartd exits with zero status, then it is
safe to run smartd in normal daemon mode. If smartd is unable to
monitor any devices or encounters other problems then it will
return with non-zero exit status.

#25 Updated by Andrew McCann about 1 year ago

I am happy for you to have another look at my server if you require more information, but i cant do this until 16th October as i am on holidays!! :)

#26 Updated by William Grzybowski about 1 year ago

  • Assignee changed from William Grzybowski to Vladimir Vinogradenko
Jun 19 22:30:43 freenas smartd[16960]: Configuration file /usr/local/etc/smartd.conf parsed.
Jun 19 22:30:43 freenas smartd[16960]: Device: /dev/da0, type changed from 'scsi' to 'sat'
...
Jun 19 22:31:45 freenas smartd[16960]: Device: /dev/da13 [SAT], initial Temperature is 30 Celsius (Min/Max ??/30)
Jun 19 22:31:46 freenas smartd[17068]: smartd has fork()ed into background mode. New PID=17068.

(you can see that in daemon.log in his debug output, disks are all supported)

In summary his smartd is taking a bit over a minute to fork.

My opinion is that taking that long to perform an UI operation (edit smart settings or smart tests) is not a good user experience.

Also I am not sure waiting for smartd to restart to perform the operation is what we want here.

I see a few possible solutions here:
- call smartd restart asynchronously and do not wait for the response (and in the future improve service monitoring to alert when a service dies)
- propose a fix to smartd to fork earlier
- write our own init script as you suggested

I am fine with either solution for now. Do you agree/disagree? What are your thoughts?

#27 Updated by Vladimir Vinogradenko about 1 year ago

- call smartd restart asynchronously and do not wait for the response (and in the future improve service monitoring to alert when a service dies)
- write our own init script as you suggested

Seem to yield the same result, but own init script option is cleaner, I suggest we do that.

#28 Updated by Vladimir Vinogradenko about 1 year ago

  • Assignee changed from Vladimir Vinogradenko to William Grzybowski

#29 Updated by William Grzybowski about 1 year ago

  • Status changed from 15 to Screened
  • Assignee changed from William Grzybowski to Vladimir Vinogradenko

Ok, then lets do that! Thanks!

#30 Updated by Vladimir Vinogradenko about 1 year ago

  • Status changed from Screened to Ready For Release

#31 Updated by Dru Lavigne about 1 year ago

  • File deleted (debug-freenas-20170620230022.tgz)

#32 Updated by Dru Lavigne about 1 year ago

  • Subject changed from Traceback for Long and Short Smart tests to Replace smartd rc script with smartd-daemon that forks immediately
  • Private changed from Yes to No

#33 Updated by Dru Lavigne about 1 year ago

  • Target version changed from 11.1 to 11.1-BETA1

#34 Updated by William Grzybowski 12 months ago

  • Is duplicate of Bug #26251: Got error when create pool in 11U3 added

#35 Updated by Dru Lavigne 12 months ago

  • Status changed from Ready For Release to Resolved

#36 Updated by Bonnie Follweiler 11 months ago

  • Needs QA changed from Yes to No
  • QA Status Test Passes FreeNAS added
  • QA Status deleted (Not Tested)

#37 Updated by Andrew McCann 11 months ago

Just to confirm, I have on the 11.1 nightly (FreeNAS-11-MASTER-201711140419) and can save the smart tests without error.

Thanks
Andrew

Also available in: Atom PDF