Project

General

Profile

Bug #20260

Getting a traceback when cron started the last scrub

Added by Ovidiu Pacuraru over 4 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Critical
Assignee:
William Grzybowski
Category:
Middleware
Target version:
Seen in:
Severity:
New
Reason for Closing:
Reason for Blocked:
Needs QA:
Yes
Needs Doc:
Yes
Needs Merging:
Yes
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:

actually I saw this running FreeNAS-9.10.2 before applying U1.

Running on a HP N54L microserver with 16GB ECC RAM and 6x 3TB WD REDs

ChangeLog Required:
No

Description

I got this via email from my freenas and have no clue what it means. The scrub got started and is running though.

Subject:
Cron <root@freenas> PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 35 tank

Message:
Exception in thread WebSocketClient:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/
threading.py", line 801, in _bootstrap_inner
self.run()
File "/usr/local/lib/python2.7/threading.py", line 754, in run
self.
_target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.7/site-packages/ws4py/websocket.py", line 427, in run
if not self.once():
File "/usr/local/lib/python2.7/site-packages/ws4py/websocket.py", line 305, in once
if not self.process(b):
File "/usr/local/lib/python2.7/site-packages/ws4py/websocket.py", line 377, in process
self.received_message(s.message)
File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 32, in received_message
self.protocol.on_message(message.data.decode('utf8'))
File "/usr/local/lib/python2.7/site-packages/middlewared/client/protocol.py", line 26, in on_message
self.app.on_message(message)
File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 38, in on_message
self.client._recv(message)
File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 100, in _recv
self._connected.set()
AttributeError: 'Client' object has no attribute '_connected'

Traceback (most recent call last):
File "/usr/local/bin/midclt", line 9, in <module>
load_entry_point('middlewared==0.0.0', 'console_scripts', 'midclt')()
File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 211, in main
with Client(uri=args.uri) as c:
File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 83, in init
raise ClientException('Failed connection handshake')
middlewared.client.client.ClientException: Failed connection handshake
starting scrub of pool 'tank'


Related issues

Is duplicate of FreeNAS - Bug #21138: Strange Cron E-mail receivedClosed: Duplicate2017-02-13
Has duplicate FreeNAS - Bug #20052: Automatic Scrubs of boot pool are failingClosed: Duplicate2017-01-05
Has duplicate FreeNAS - Bug #21506: Python AttributeError during scheduled scrubClosed: Duplicate2017-02-26

Associated revisions

Revision 49e183a7 (diff)
Added by William Grzybowski over 4 years ago

fix(middlewared): race condition in client connect Ticket: #20260

Revision f5980b63 (diff)
Added by William Grzybowski over 4 years ago

fix(middlewared): race condition in client connect Ticket: #20260 (cherry picked from commit 49e183a7ca9d455851f8b243bf1dd7e9297cd50b)

Revision 2486fec0 (diff)
Added by William Grzybowski over 4 years ago

fix(middlewared): race condition in client connect Ticket: #20260 (cherry picked from commit 49e183a7ca9d455851f8b243bf1dd7e9297cd50b)

History

#1 Updated by Alexander Motin over 4 years ago

  • Project changed from 9 to FreeNAS
  • Category changed from 181 to 201
  • Assignee changed from Alexander Motin to William Grzybowski
  • Seen in changed from to 9.10.2

It seems to be about FreeNAS 9.10, not 10, and it is about middleware, not ZFS.

#2 Updated by Dave F over 4 years ago

Also saw the same for my scheduled scrub from yesterday. Running build FreeNAS-9.10.2-U1 (86c7ef5).

cron log:

Jan 22 00:00:00 tardis /usr/sbin/cron[42660]: (root) CMD (PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 14 tank)

Nothing in debug.log

Exception in thread WebSocketClient:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/site-packages/ws4py/websocket.py", line 427, in run
    if not self.once():
  File "/usr/local/lib/python2.7/site-packages/ws4py/websocket.py", line 305, in once
    if not self.process(b):
  File "/usr/local/lib/python2.7/site-packages/ws4py/websocket.py", line 377, in process
    self.received_message(s.message)
  File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 32, in received_message
    self.protocol.on_message(message.data.decode('utf8'))
  File "/usr/local/lib/python2.7/site-packages/middlewared/client/protocol.py", line 26, in on_message
    self.app.on_message(message)
  File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 38, in on_message
    self.client._recv(message)
  File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 100, in _recv
    self._connected.set()
AttributeError: 'Client' object has no attribute '_connected'

Traceback (most recent call last):
  File "/usr/local/bin/midclt", line 9, in <module>
    load_entry_point('middlewared==0.0.0', 'console_scripts', 'midclt')()
  File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 211, in main
    with Client(uri=args.uri) as c:
  File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 83, in __init__
    raise ClientException('Failed connection handshake')
middlewared.client.client.ClientException: Failed connection handshake

#3 Updated by William Grzybowski over 4 years ago

  • Status changed from Unscreened to Needs Developer Review
  • Priority changed from No priority to Critical
  • Target version set to 9.10.2-U2

#5 Avatar?id=14398&size=24x24 Updated by Kris Moore over 4 years ago

  • Assignee changed from William Grzybowski to Suraj Ravichandran

Suraj, can you review?

#6 Updated by Marcelo Araujo over 4 years ago

Kris Moore wrote:

Suraj, can you review?

I'm OK with that! I could not simulate the same race, but I got another one that has nothing to do with this fix.

Go ahead!

#7 Updated by Marcelo Araujo over 4 years ago

  • Status changed from Needs Developer Review to Reviewed
  • Assignee changed from Suraj Ravichandran to Vaibhav Chauhan

#8 Updated by Vaibhav Chauhan over 4 years ago

  • Status changed from Reviewed to Ready For Release

#9 Updated by Ovidiu Pacuraru over 4 years ago

This report was for the scrub scheduled by me. I now got an email for the built-in scrub for the boot partition:

Subject:
Cron <root@freenas> /usr/local/libexec/nas/scrub -t 28 freenas-boot

Message:
Traceback (most recent call last):
File "/usr/local/bin/midclt", line 9, in <module>
load_entry_point('middlewared==0.0.0', 'console_scripts', 'midclt')()
File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 211, in main
with Client(uri=args.uri) as c:
File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 78, in init
self._ws.connect()
File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 22, in connect
self.sock.settimeout(None)
AttributeError: 'NoneType' object has no attribute 'settimeout'

#10 Updated by William Grzybowski over 4 years ago

  • Is duplicate of Bug #21138: Strange Cron E-mail received added

#11 Avatar?id=14398&size=24x24 Updated by Kris Moore over 4 years ago

  • Has duplicate Bug #20052: Automatic Scrubs of boot pool are failing added

#12 Avatar?id=14398&size=24x24 Updated by Kris Moore over 4 years ago

  • Has duplicate Bug #21506: Python AttributeError during scheduled scrub added

#13 Updated by Vaibhav Chauhan over 4 years ago

  • Status changed from Ready For Release to Resolved

#14 Updated by AlexT - over 4 years ago

Not sure it is resolved...

FWIW Running FreeNAS-9.10.2-U2 (e1497f2) I am still seeing

Traceback (most recent call last):
File "/usr/local/bin/midclt", line 9, in <module>
load_entry_point('middlewared==0.0.0', 'console_scripts', 'midclt')()
File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 211, in main
with Client(uri=args.uri) as c:
File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 79, in __init__
self._ws.connect()
File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 21, in connect
rv = super(WSClient, self).connect()
File "/usr/local/lib/python2.7/site-packages/ws4py/client/__init__.py", line 216, in connect
bytes = self.sock.recv(128)
socket.timeout: timed out

#15 Updated by Dru Lavigne over 4 years ago

  • Status changed from Resolved to Unscreened
  • Assignee changed from Vaibhav Chauhan to Kris Moore

Kris, can you see if QA can still reproduce this?

#16 Avatar?id=14398&size=24x24 Updated by Kris Moore over 4 years ago

  • Assignee changed from Kris Moore to William Grzybowski

I'm not seeing this here, but that error looks like middlewared isn't running. Can you try "service middlewared start" or "middlewared -f" to run it in the foreground and see if its failing to load for some reason?

#17 Updated by William Grzybowski over 4 years ago

  • Status changed from Unscreened to 15

Also attach your "save debug" file.

#18 Updated by William Grzybowski over 4 years ago

  • Target version changed from 9.10.2-U2 to 9.10.3
  • Seen in changed from 9.10.2 to 9.10.2-U2

#19 Updated by Barry Grussling over 4 years ago

I hit this issue on one of my 9.10.2-U2 boxes last night. My middleware is running. I did have a blip in the middleware logfile at around the same time as the Failed connection handhake email:


[2017-03-17 02:41:35,921 main.py:161] (WARNING) Exception while calling notifier.sync_encrypted(*[])
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/middlewared/main.py", line 157, in call_method
    'result': self.middleware.call_method(self, message),
  File "/usr/local/lib/python2.7/site-packages/middlewared/main.py", line 389, in call_method
    return methodobj(*args)
  File "/usr/local/www/freenasUI/middleware/notifier.py", line 4836, in sync_encrypted
    zpool = self.zpool_parse(vol.vol_name)
  File "/usr/local/www/freenasUI/middleware/notifier.py", line 4570, in zpool_parse
    parse = zfs.parse_status(name, doc, res)
  File "/usr/local/www/freenasUI/middleware/zfs.py", line 766, in parse_status
    status = data.split('config:')[1]
IndexError: list index out of range
[2017-03-17 02:41:35,922 websocket.py:369] (DEBUG) Failed to write closing frame -> closing socket
[2017-03-17 02:41:35,922 websocket.py:371] (DEBUG) Closed WebSocket

#20 Updated by William Grzybowski over 4 years ago

Barry Grussling wrote:

I hit this issue on one of my 9.10.2-U2 boxes last night. My middleware is running. I did have a blip in the middleware logfile at around the same time as the Failed connection handhake email:
[...]

Thats a different issue.

#21 Avatar?id=14398&size=24x24 Updated by Kris Moore over 4 years ago

  • Status changed from 15 to Resolved

The original problem appears resolved here. If anybody runs into other issues, please open a new ticket.

#22 Avatar?id=14398&size=24x24 Updated by Kris Moore over 4 years ago

  • Target version changed from 9.10.3 to 11.0

#23 Updated by Vaibhav Chauhan about 4 years ago

  • Target version changed from 11.0 to 11.0-RC

#24 Updated by Ovidiu Pacuraru about 4 years ago

Got another cron job email related to the same scrub, might be related:

Subject: Cron <root@freenas> /usr/local/libexec/nas/scrub -t 28 freenas-boot
Content: Exception in thread WebSocketClient (most likely raised during interpreter shutdown):

#25 Updated by William Grzybowski about 4 years ago

Ovidiu Pacuraru wrote:

Got another cron job email related to the same scrub, might be related:

Subject: Cron <root@freenas> /usr/local/libexec/nas/scrub -t 28 freenas-boot
Content: Exception in thread WebSocketClient (most likely raised during interpreter shutdown):

What version are you running?

#26 Updated by Ovidiu Pacuraru about 4 years ago

9.10.2-U3

#27 Updated by William Grzybowski about 4 years ago

Ovidiu Pacuraru wrote:

9.10.2-U3

Thats why, it is resolved in 11.0-RC, not before.

#28 Updated by Ovidiu Pacuraru about 4 years ago

Sure thing, I just wanted to add more info in case it helped :-)

#29 Updated by David Erickson about 4 years ago

William Grzybowski wrote:

Ovidiu Pacuraru wrote:

9.10.2-U3

Thats why, it is resolved in 11.0-RC, not before.

William why is a fix not being ported back to the 9.10 train? I get one of the below in my inbox every single night now.

Traceback (most recent call last):
File "/usr/local/bin/midclt", line 9, in <module>
load_entry_point('middlewared==0.0.0', 'console_scripts', 'midclt')()
File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 211, in main
with Client(uri=args.uri) as c:
File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 79, in init
self._ws.connect()
File "/usr/local/lib/python2.7/site-packages/middlewared/client/client.py", line 21, in connect
rv = super(WSClient, self).connect()
File "/usr/local/lib/python2.7/site-packages/ws4py/client/__init__.py", line 216, in connect
bytes = self.sock.recv(128)
socket.timeout: timed out

Also available in: Atom PDF