Bug #24731
Replace smartd rc script with smartd-daemon that forks immediately
Description
Hi, on a fresh FreeNAS 11 Release install, i tried editing the Smart tests received errors when trying to save both tests.
Thanks ======================================================================================
Request Method: POST
Request URL: http://169.191.1.1/admin/tasks/smarttest/edit/2/
Software Version: FreeNAS-11.0-RELEASE (a2dc21583)
Exception Type: CallTimeout
Exception Value:
Call timeout
Exception Location: /usr/local/lib/python3.6/site-packages/middlewared/client/client.py in call, line 208
Server time: Tue, 20 Jun 2017 11:34:05 +1000
Traceback
Environment:
Software Version: FreeNAS-11.0-RELEASE (a2dc21583)
Request Method: POST
Request URL: http://169.191.1.1/admin/tasks/smarttest/edit/2/
Traceback:
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py" in inner
39. response = get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _legacy_get_response
249. response = self._get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
178. response = middleware_method(request, callback, callback_args, callback_kwargs)
File "./freenasUI/freeadmin/middleware.py" in process_view
162. return login_required(view_func)(request, *view_args, **view_kwargs)
File "/usr/local/lib/python3.6/site-packages/django/contrib/auth/decorators.py" in _wrapped_view
23. return view_func(request, *args, **kwargs)
File "./freenasUI/freeadmin/options.py" in wrapper
208. return self._admin.admin_view(view)(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/django/utils/decorators.py" in _wrapped_view
149. response = view_func(request, *args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/django/views/decorators/cache.py" in _wrapped_view_func
57. response = view_func(request, *args, **kwargs)
File "./freenasUI/freeadmin/site.py" in inner
142. return view(request, *args, **kwargs)
File "./freenasUI/freeadmin/options.py" in edit
566. mf.save()
File "./freenasUI/tasks/forms.py" in save
453. notifier().restart("smartd")
File "./freenasUI/middleware/notifier.py" in restart
273. return c.call('service.restart', what, {'onetime': onetime}, **kwargs)
File "./freenasUI/middleware/notifier.py" in restart
273. return c.call('service.restart', what, {'onetime': onetime}, **kwargs)
File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py" in call
208. raise CallTimeout("Call timeout")
Exception Type: CallTimeout at /admin/tasks/smarttest/edit/2/
Exception Value: Call timeout
Request information
GET
No GET data
POST
Variable Value
all ''
smarttest_disks '{serial_lunid}63PZ419GS_5000039ff4cdb24e'
smarttest_type 'L'
smarttest_desc 'Long Self-Test'
smarttest_hour '00'
smarttest_daymonth '01'
smarttest_month '12'
smarttest_dayweek '7'
__form_id 'dialogForm_smarttest'
FILES
No FILES data
COOKIES
Variable Value
csrftoken 'wVb3TqPG1Vt57x71eFrzVmiTUHC0H3oDdfw5nd5aDAXlYSSIrJ9H1W7rLmFypUMI'
fntreeSaveStateCookie 'root%2Croot%2F75%2F156%2Croot%2F54%2F61%2F63%2Croot%2F242%2Croot%2F38%2F39%2Croot%2F38%2F49%2Croot%2F75%2F76%2F149%2Croot%2F75%2F160%2Croot%2F240%2F246%2F248%2Croot%2F240%2F260%2F262%2Croot%2F240%2F260%2Croot%2F75%2F76%2F77%2Croot%2F57%2F58%2F59%2Croot%2F134%2Croot%2F134%2F144%2Croot%2F39%2F46%2F48%2Croot%2F39%2F46%2F48%2F49%2Croot%2F213%2F219%2Croot%2F161%2F171%2Croot%2F218%2F224%2F226%2Croot%2F233%2F239%2Croot%2F213%2Croot%2F217%2Croot%2F245%2F267%2Croot%2F169%2Croot%2F203%2Croot%2F39%2F41%2Croot%2F202%2Croot%2F60%2F61%2F62%2Croot%2F221%2F222%2F227%2Croot%2F221%2F222%2F228%2Croot%2F221%2F222%2F230%2Croot%2F60%2F61%2Croot%2F221%2Croot%2F1%2F5%2Croot%2F27%2Croot%2F1%2F6%2Croot%2F1%2F7%2Croot%2F1%2F8%2Croot%2F229%2F240%2F242%2Croot%2F55%2F62%2F64%2Croot%2F55%2F62%2F64%2F65%2Croot%2F76%2F77%2Croot%2F229%2F235%2F237%2Croot%2F222%2Croot%2F224%2F230%2F232%2Croot%2F226%2Croot%2F229%2Croot%2F77%2F78%2Croot%2F77%2F78%2F79%2Croot%2F230%2Croot%2F230%2F241%2Croot%2F230%2F241%2F243%2Croot%2F77%2F78%2F79%2F128%2Croot%2F78%2F79%2F80%2Croot%2F231%2F242%2F244%2Croot%2F78%2F79%2F80%2F129%2Croot%2F41%2Croot%2F41%2F49%2Croot%2F41%2F52%2Croot%2F228'
sessionid 'lo8dbv4uwvi4adic0i4itk6q9zpp5xpi'
META
Variable Value
======================================================================================
Request Method: POST
Request URL: http://169.191.1.1/admin/tasks/smarttest/edit/1/
Software Version: FreeNAS-11.0-RELEASE (a2dc21583)
Exception Type: CallTimeout
Exception Value:
Call timeout
Exception Location: /usr/local/lib/python3.6/site-packages/middlewared/client/client.py in call, line 208
Server time: Tue, 20 Jun 2017 11:38:19 +1000
Traceback
Environment:
Software Version: FreeNAS-11.0-RELEASE (a2dc21583)
Request Method: POST
Request URL: http://169.191.1.1/admin/tasks/smarttest/edit/1/
Traceback:
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py" in inner
39. response = get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _legacy_get_response
249. response = self._get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
178. response = middleware_method(request, callback, callback_args, callback_kwargs)
File "./freenasUI/freeadmin/middleware.py" in process_view
162. return login_required(view_func)(request, *view_args, **view_kwargs)
File "/usr/local/lib/python3.6/site-packages/django/contrib/auth/decorators.py" in _wrapped_view
23. return view_func(request, *args, **kwargs)
File "./freenasUI/freeadmin/options.py" in wrapper
208. return self._admin.admin_view(view)(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/django/utils/decorators.py" in _wrapped_view
149. response = view_func(request, *args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/django/views/decorators/cache.py" in _wrapped_view_func
57. response = view_func(request, *args, **kwargs)
File "./freenasUI/freeadmin/site.py" in inner
142. return view(request, *args, **kwargs)
File "./freenasUI/freeadmin/options.py" in edit
566. mf.save()
File "./freenasUI/tasks/forms.py" in save
453. notifier().restart("smartd")
File "./freenasUI/middleware/notifier.py" in restart
273. return c.call('service.restart', what, {'onetime': onetime}, **kwargs)
File "./freenasUI/middleware/notifier.py" in restart
273. return c.call('service.restart', what, {'onetime': onetime}, **kwargs)
File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py" in call
208. raise CallTimeout("Call timeout")
Exception Type: CallTimeout at /admin/tasks/smarttest/edit/1/
Exception Value: Call timeout
Request information
GET
No GET data
POST
Variable Value
all ''
smarttest_disks '{serial_lunid}63PZ419GS_5000039ff4cdb24e'
smarttest_type 'S'
smarttest_desc 'Short Self-Test'
smarttest_hour '*'
smarttest_daymonth '*'
smarttest_month '12'
smarttest_dayweek '7'
__form_id 'dialogForm_smarttest'
FILES
No FILES data
COOKIES
Variable Value
csrftoken 'wVb3TqPG1Vt57x71eFrzVmiTUHC0H3oDdfw5nd5aDAXlYSSIrJ9H1W7rLmFypUMI'
fntreeSaveStateCookie 'root%2Croot%2F75%2F156%2Croot%2F54%2F61%2F63%2Croot%2F242%2Croot%2F38%2F39%2Croot%2F38%2F49%2Croot%2F75%2F76%2F149%2Croot%2F75%2F160%2Croot%2F240%2F246%2F248%2Croot%2F240%2F260%2F262%2Croot%2F240%2F260%2Croot%2F75%2F76%2F77%2Croot%2F57%2F58%2F59%2Croot%2F134%2Croot%2F134%2F144%2Croot%2F39%2F46%2F48%2Croot%2F39%2F46%2F48%2F49%2Croot%2F213%2F219%2Croot%2F161%2F171%2Croot%2F218%2F224%2F226%2Croot%2F233%2F239%2Croot%2F213%2Croot%2F217%2Croot%2F245%2F267%2Croot%2F169%2Croot%2F203%2Croot%2F39%2F41%2Croot%2F202%2Croot%2F60%2F61%2F62%2Croot%2F221%2F222%2F227%2Croot%2F221%2F222%2F228%2Croot%2F221%2F222%2F230%2Croot%2F60%2F61%2Croot%2F221%2Croot%2F1%2F5%2Croot%2F27%2Croot%2F1%2F6%2Croot%2F1%2F7%2Croot%2F1%2F8%2Croot%2F229%2F240%2F242%2Croot%2F55%2F62%2F64%2Croot%2F55%2F62%2F64%2F65%2Croot%2F76%2F77%2Croot%2F229%2F235%2F237%2Croot%2F222%2Croot%2F224%2F230%2F232%2Croot%2F226%2Croot%2F229%2Croot%2F77%2F78%2Croot%2F77%2F78%2F79%2Croot%2F230%2Croot%2F230%2F241%2Croot%2F230%2F241%2F243%2Croot%2F77%2F78%2F79%2F128%2Croot%2F78%2F79%2F80%2Croot%2F231%2F242%2F244%2Croot%2F78%2F79%2F80%2F129%2Croot%2F41%2Croot%2F41%2F49%2Croot%2F41%2F52%2Croot%2F228'
sessionid 'lo8dbv4uwvi4adic0i4itk6q9zpp5xpi'
META
Variable Value
Related issues
Associated revisions
fix(smartd): Replace smartd rc script with smartd-daemon that forks immediately
Ticket: #24731
fix(smartd): Replace smartd rc script with smartd-daemon that forks immediately
Ticket: #24731
History
#1
Updated by William Grzybowski over 1 year ago
- Status changed from Unscreened to 15
Does that happen every time?
Please attach Save Debug (System - Advanced)
#2
Updated by Andrew McCann over 1 year ago
- File debug-freenas-20170620230022.tgz added
- Private changed from No to Yes
Yes, can reproduce every time. Attached debug
Thanks
#3
Updated by Andrew McCann over 1 year ago
- File Smart Error.PNG Smart Error.PNG added
Just tried creating a new smart test, same result.
#4
Updated by Andrew McCann over 1 year ago
- File Smart2.PNG Smart2.PNG added
Does seem to create the new task though?
#5
Updated by William Grzybowski over 1 year ago
Andrew McCann wrote:
Just tried creating a new smart test, same result.
The error is about restarting smartd service, the task is created in the database.
Can you reproduce it by running in shell?
- midctl call service.restart smartd
#6
Updated by Andrew McCann over 1 year ago
Was this a typo, command not found?
[root@freenas ~]# midctl call service.restart smartd
bash: midctl: command not found
[root@freenas ~]#
#7
Updated by William Grzybowski over 1 year ago
Andrew McCann wrote:
Was this a typo, command not found?
[root@freenas ~]# midctl call service.restart smartd
bash: midctl: command not found
[root@freenas ~]#
Yes, "midclt" not "midctl", sorry.
#8
Updated by Andrew McCann over 1 year ago
- File Smart Services.PNG Smart Services.PNG added
interesting, sat there for about 30-40 seconds and times out again.
[root@freenas ]# midclt call service.restart smartd
Call timeout
[root@freenas]#
Not sure if this helps, but testing this on the services menu, it stops ok, and it does start, bet leaves the wait cursor running? (see screenshot)
#9
Updated by William Grzybowski over 1 year ago
Andrew McCann wrote:
interesting, sat there for about 30-40 seconds and times out again.
[root@freenas ]# midclt call service.restart smartd
Call timeout
[root@freenas]#Not sure if this helps, but testing this on the services menu, it stops ok, and it does start, bet leaves the wait cursor running? (see screenshot)
Would you be open for a TeamViewer session, since I can't reproduce that particular issue?
#10
Updated by Andrew McCann over 1 year ago
sure, based in AUS, so your morning or evening would be easier?
#11
Updated by Andrew McCann over 1 year ago
Hi William
Online for the next 2 hours, happy to do a teamviewer if your ok to.
Cheers
#12
Updated by William Grzybowski over 1 year ago
Andrew McCann wrote:
Hi William
Online for the next 2 hours, happy to do a teamviewer if your ok to.
Cheers
Yes, we can do it now, send me your teamviewer creds (here via private note checked or william at+ ixsystems .com
Thanks,
#13
Updated by Andrew McCann over 1 year ago
email sent
#14
Updated by William Grzybowski over 1 year ago
- Priority changed from No priority to Important
- Target version set to 11.0-U1
I have concluded smartd is taking too long to the initial disk scan before the daemon forks. I'll have to investigate it further and/or come up with a workaround.
#15
Updated by William Grzybowski over 1 year ago
- Status changed from 15 to Screened
#16
Updated by Vaibhav Chauhan over 1 year ago
- Target version changed from 11.0-U1 to 11.0-U2
#17
Updated by Vaibhav Chauhan over 1 year ago
- Target version changed from 11.0-U2 to 11.0-U3
#18
Updated by William Grzybowski over 1 year ago
- Target version changed from 11.0-U3 to 11.1
I wont be able to get this in time for 11.0-U3.
#19
Updated by Kris Moore over 1 year ago
- Status changed from Screened to Unscreened
- Assignee changed from William Grzybowski to Vladimir Vinogradenko
#20
Updated by Vladimir Vinogradenko over 1 year ago
- Status changed from Unscreened to Screened
#21
Updated by Vladimir Vinogradenko over 1 year ago
- Status changed from Screened to 15
- Assignee changed from Vladimir Vinogradenko to William Grzybowski
From what I understand, the only solution we have now is to increase timeout for smartd
start. I think this is reasonable because smartd
does really important job here so we really should not hide any possible issues under the carpet.
smartd
operations only, but also can be useful in the future):
- Stating «this operation can take longer than usual» in the UI
- Increasing timeout in
middlewared
client(s) - Increasing timeout in
middleware
service plugin itself
#22
Updated by William Grzybowski over 1 year ago
- Assignee changed from William Grzybowski to Vladimir Vinogradenko
Well, the behavior I have observed is that smartd is scanning all disks in foreground before forking, and each disk is taking a couple seconds, so it is not a constant time.
I was wondering, why does it do all the scanning in foreground? Maybe we could investigate why is it not entering into daemon mode earlier?
#23
Updated by Dru Lavigne over 1 year ago
- Status changed from 15 to Unscreened
#24
Updated by Vladimir Vinogradenko over 1 year ago
- Status changed from Unscreened to 15
- Assignee changed from Vladimir Vinogradenko to William Grzybowski
smartd
does all scanning in foreground before forking because it is programmed to do so https://github.com/mirror/smartmontools/blob/master/smartd.cpp#L5413
There is no config option to change that
I think this is done that way because smartd
wants to make sure that it is configured correctly (i.e. can monitor all drives specified in config file) before exiting with success code.
Normally scanning is finished very quickly. I wonder why it takes long time for Andrew McCann: are his hard-drives not supported by smartd
(then we can just try to exclude them in ix-smartd
) or are they supported, just take some time to initialize (then we can increase startup timeout or write our own init script with /usr/sbin/daemon
and --no-fork
).
In the smartd
manpage I see that this case is not unique for us:
onecheck - Start smartd in debug mode, then register devices, then check device´s SMART status once, and then exit with zero exit status if all of these steps worked correctly. This last option is intended for ´distribution-writers´ who want to create automated scripts to determine whether or not to automatically start up smartd after installing smartmontools. After starting smartd with this command-line option, the distribution´s install scripts should wait a reasonable length of time (say ten seconds). If smartd has not exited with zero status by that time, the script should send smartd a SIGTERM or SIGKILL and assume that smartd will not operate correctly on the host. Conversely, if smartd exits with zero status, then it is safe to run smartd in normal daemon mode. If smartd is unable to monitor any devices or encounters other problems then it will return with non-zero exit status.
#25
Updated by Andrew McCann over 1 year ago
I am happy for you to have another look at my server if you require more information, but i cant do this until 16th October as i am on holidays!! :)
#26
Updated by William Grzybowski over 1 year ago
- Assignee changed from William Grzybowski to Vladimir Vinogradenko
Jun 19 22:30:43 freenas smartd[16960]: Configuration file /usr/local/etc/smartd.conf parsed. Jun 19 22:30:43 freenas smartd[16960]: Device: /dev/da0, type changed from 'scsi' to 'sat' ... Jun 19 22:31:45 freenas smartd[16960]: Device: /dev/da13 [SAT], initial Temperature is 30 Celsius (Min/Max ??/30) Jun 19 22:31:46 freenas smartd[17068]: smartd has fork()ed into background mode. New PID=17068.
(you can see that in daemon.log in his debug output, disks are all supported)
In summary his smartd is taking a bit over a minute to fork.
My opinion is that taking that long to perform an UI operation (edit smart settings or smart tests) is not a good user experience.
Also I am not sure waiting for smartd to restart to perform the operation is what we want here.
I see a few possible solutions here:
- call smartd restart asynchronously and do not wait for the response (and in the future improve service monitoring to alert when a service dies)
- propose a fix to smartd to fork earlier
- write our own init script as you suggested
I am fine with either solution for now. Do you agree/disagree? What are your thoughts?
#27
Updated by Vladimir Vinogradenko over 1 year ago
- call smartd restart asynchronously and do not wait for the response (and in the future improve service monitoring to alert when a service dies)
- write our own init script as you suggested
Seem to yield the same result, but own init script option is cleaner, I suggest we do that.
#28
Updated by Vladimir Vinogradenko over 1 year ago
- Assignee changed from Vladimir Vinogradenko to William Grzybowski
#29
Updated by William Grzybowski over 1 year ago
- Status changed from 15 to Screened
- Assignee changed from William Grzybowski to Vladimir Vinogradenko
Ok, then lets do that! Thanks!
#30
Updated by Vladimir Vinogradenko over 1 year ago
- Status changed from Screened to Ready For Release
#31
Updated by Dru Lavigne over 1 year ago
- File deleted (
debug-freenas-20170620230022.tgz)
#32
Updated by Dru Lavigne over 1 year ago
- Subject changed from Traceback for Long and Short Smart tests to Replace smartd rc script with smartd-daemon that forks immediately
- Private changed from Yes to No
#33
Updated by Dru Lavigne over 1 year ago
- Target version changed from 11.1 to 11.1-BETA1
#34
Updated by William Grzybowski over 1 year ago
- Is duplicate of Bug #26251: Got error when create pool in 11U3 added
#35
Updated by Dru Lavigne over 1 year ago
- Status changed from Ready For Release to Resolved
#36
Updated by Bonnie Follweiler over 1 year ago
- Needs QA changed from Yes to No
- QA Status Test Passes FreeNAS added
- QA Status deleted (
Not Tested)
#37
Updated by Andrew McCann over 1 year ago
Just to confirm, I have on the 11.1 nightly (FreeNAS-11-MASTER-201711140419) and can save the smart tests without error.
Thanks
Andrew
fix(smartd): Replace smartd rc script with smartd-daemon that forks immediately
Ticket: #24731