Bug #29389

Show errors on boot pool attach or replace

Added by Tino Zidore about 1 year ago. Updated 10 months ago.

Status: Done
Priority: No priority
Assignee: William Grzybowski
Category: Middleware
Target version:
Seen in:
Severity: Medium
Reason for Closing:
Reason for Blocked:
Needs QA: No
Needs Doc: No
Needs Merging: No
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

Hi

I had a drive failure in my freenas-boot mirror. I shut down the server, replaced the disk, booted up again, and chose the new disk in "replace disk". It reported that the disk was replaced, but it was not. Then I tried clicking Detach but got this error message instead:
Request Method: POST
Request URL: https://vfxadmin.skofabrikken.dk/system/bootenv/pool/detach/6587456083977967384/
Software Version: FreeNAS-11.1-U2 (c636d1f4b)
Exception Type: ClientException
Exception Value:
[EINVAL] Failed to find vdev for 6587456083977967384
Exception Location: /usr/local/lib/python3.6/site-packages/middlewared/client/client.py in call, line 394
Server time: Fri, 9 Mar 2018 08:23:20 +0100
Traceback:
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py" in inner
42. response = get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _legacy_get_response
249. response = self._get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
178. response = middleware_method(request, callback, callback_args, callback_kwargs)
File "./freenasUI/freeadmin/middleware.py" in process_view
162. return login_required(view_func)(request, *view_args, **view_kwargs)
File "/usr/local/lib/python3.6/site-packages/django/contrib/auth/decorators.py" in _wrapped_view
23. return view_func(request, *args, **kwargs)
File "./freenasUI/system/views.py" in bootenv_pool_detach
541. c.call('boot.detach', label)
File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py" in call
394. raise ClientException(c.error, c.errno, c.trace, c.extra)

Exception Type: ClientException at /system/bootenv/pool/detach/6587456083977967384/
Exception Value: [EINVAL] Failed to find vdev for 6587456083977967384

Request information
GET
No GET data

POST
Variable Value
__form_id 'form_str'
FILES
No FILES data

COOKIES
Variable Value
fntreeSaveStateCookie 'root'
csrftoken '********'
sessionid 'qh66wm2ml1c51xknx5yw6p6lveujp1r5'
META
Variable Value

Is there a workaround? I don't like having a degraded system mirror ;-)

Looking forward to an answer.

replace_boot_error.jpg (117 KB), Michael Dexter, 03/13/2018 11:16 PM
mirror_boot_error.jpg (161 KB), Michael Dexter, 03/14/2018 01:47 PM
boot device replacement.png (125 KB), Joe Maloney, 04/17/2018 09:26 AM

Associated revisions

Revision a1053376 (diff)
Added by William Grzybowski 11 months ago

fix(gui): do not traceback if attaching smaller disk fails

Ticket: #29389

Revision 369cf71c (diff)
Added by William Grzybowski 11 months ago

fix(gui): show error if boot disk replace fails

Ticket: #29389

Revision 9dce1088 (diff)
Added by William Grzybowski 11 months ago

fix(gui): do not traceback if attaching smaller disk fails

Ticket: #29389

Revision 6d6d9d70 (diff)
Added by William Grzybowski 11 months ago

fix(gui): show error if boot disk replace fails

Ticket: #29389

History

#2 Updated by Tino Zidore about 1 year ago

I have tried to replace this way, but no luck:

vfxadmin# zpool status -x
  pool: freenas-boot
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0 days 00:00:23 with 0 errors on Tue Mar  6 03:45:26 2018
config:

    NAME                     STATE     READ WRITE CKSUM
    freenas-boot             DEGRADED     0     0     0
      mirror-0               DEGRADED     0     0     0
        6587456083977967384  UNAVAIL      0     0     0  was /dev/da0p2
        da1p2                ONLINE       0     0     0

errors: No known data errors
vfxadmin# zpool online test 6587456083977967384
cannot open 'test': no such pool
vfxadmin# zpool online freenas-boot 6587456083977967384
warning: device '6587456083977967384' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
vfxadmin# zpool replace freenas-boot 6587456083977967384
cannot open '6587456083977967384': no such GEOM provider
must be a full path or shorthand device name

#3 Updated by Tino Zidore about 1 year ago

I figured it out.
I detached the drive through the shell and was then able to attach the new drive through the web GUI. Everything seems to be back to normal.

It may still be a web GUI bug that I wasn't able to detach a boot pool disk through the web GUI.
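
A minimal sketch of the shell-side workaround described above, assuming a single-pool `zpool status` dump like the one in comment #2. The ticket does not record the exact detach command that was run, so passing the identifier shown in the status output to `zpool detach` is an assumption; the helper names are hypothetical, and the command is only built, never executed.

```python
# Sketch only: finds unavailable members in `zpool status` text and builds
# (but does not run) a `zpool detach` command line for each one.

UNHEALTHY_STATES = {"UNAVAIL", "FAULTED", "REMOVED", "OFFLINE"}


def find_unavailable_members(status_text):
    """Return (pool_name, [member identifiers]) for a single-pool status dump."""
    pool = None
    members = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            pool = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("config:"):
            in_config = True
        elif stripped.startswith("errors:"):
            in_config = False
        elif in_config and stripped:
            fields = stripped.split()
            # Config rows look like: NAME STATE READ WRITE CKSUM [note]
            if len(fields) >= 2 and fields[1] in UNHEALTHY_STATES:
                members.append(fields[0])
    return pool, members


def detach_command(pool, member):
    """Build the argv for `zpool detach <pool> <device>`; the caller decides whether to run it."""
    return ["zpool", "detach", pool, member]


SAMPLE_STATUS = """\
  pool: freenas-boot
 state: DEGRADED
config:

    NAME                     STATE     READ WRITE CKSUM
    freenas-boot             DEGRADED     0     0     0
      mirror-0               DEGRADED     0     0     0
        6587456083977967384  UNAVAIL      0     0     0  was /dev/da0p2
        da1p2                ONLINE       0     0     0

errors: No known data errors
"""

pool, members = find_unavailable_members(SAMPLE_STATUS)
for member in members:
    print(" ".join(detach_command(pool, member)))
```

With the sample above this prints `zpool detach freenas-boot 6587456083977967384`. A later commenter reports detaching by the original disk name instead, so either identifier may be needed in practice.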

#4 Updated by Dru Lavigne about 1 year ago

  • Assignee changed from Release Council to William Grzybowski

William: should the traceback for this one be fixed? If so, please set a target date and load balance.

#6 Updated by William Grzybowski about 1 year ago

  • Status changed from Unscreened to Blocked
  • Reason for Blocked set to Waiting for feedback

Have you rebooted the server since this happened?

If not, please attach the output of "midclt call core.get_jobs" and the file /var/log/middlewared.log

Thanks

#7 Updated by Tino Zidore about 1 year ago

  • File midclt call.txt added
  • File middlewared.log added

Hi

I shut down the computer while changing the drive.

Attached are the requested log file and command output.

#8 Updated by William Grzybowski about 1 year ago

  • Status changed from Blocked to Not Started
  • Assignee changed from William Grzybowski to Joe Maloney
  • Reason for Blocked deleted (Waiting for feedback)

John, can you guys try to reproduce this, please? In my test using master it replaced just fine.

#9 Updated by Michael Dexter about 1 year ago

I saw this in U2 today and have attached a partial traceback from a screenshot.

The failure appears to be that the "32GB" replacement USB device was smaller than the existing "32GB" device.

A "Replace" did not work, giving this error. Like the reporter, I did a detach at the command line and tried an attach. This threw a traceback that included a text error saying the replacement drive was not large enough.

Perhaps validate the size of replacement devices? I suspect you can reproduce this by trying to replace a boot module with a dramatically smaller one.
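
A rough sketch of the kind of size validation suggested above, not the check FreeNAS actually performs. It assumes both sizes are available as sector counts and that sectors are 512 bytes; the function name and message wording are hypothetical.

```python
# Sketch only: compare device sizes before an attach/replace is attempted.
# SECTOR_SIZE is an assumption (512-byte sectors); real devices may differ.

SECTOR_SIZE = 512
GIB = 1024 ** 3


def size_error(new_sectors, old_sectors, sector_size=SECTOR_SIZE):
    """Return None if the new device is large enough, else a readable message."""
    if new_sectors >= old_sectors:
        return None
    new_gib = new_sectors * sector_size / GIB
    old_gib = old_sectors * sector_size / GIB
    return (
        "Replacement device ({:.2f} GiB, {} sectors) is smaller than the "
        "existing device ({:.2f} GiB, {} sectors). Please use a larger device."
    ).format(new_gib, new_sectors, old_gib, old_sectors)
```

Comparing sector counts rather than the advertised capacity matters here because two devices sold as "32GB" can differ by a small number of sectors, which is enough for a mirror attach to be refused.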

#10 Updated by Dru Lavigne about 1 year ago

  • Assignee changed from Joe Maloney to William Grzybowski

#11 Updated by William Grzybowski about 1 year ago

  • Assignee changed from William Grzybowski to Joe Maloney

I want exact steps, not hearsay.

The screenshot is clearly of a detach operation, not a replace as stated.

#12 Updated by Michael Dexter about 1 year ago

Exact steps, as reproduced on my lab system:

1. Insert device that is smaller than the current boot device
2. System: Boot: Status: Attach (goal is to mirror the boot device)
3. Select the new, smaller device
4. If "Use all disk space" is checked, the operation pops up a progress bar and quietly fails, leaving the boot device in its original status.
5. Using the default of not checking "Use all disk space", this traceback is generated:

Request Method: POST
Request URL: http://10.0.0.185/system/bootenv/pool/attach/?label=ada0p2
Software Version: FreeNAS-11.1-U2 (c636d1f4b)
Exception Type: MiddlewareError
Exception Value:
[MiddlewareError: [EFAULT] The device called VendorCo ProductCode (7.50 GB, 15728640 sectors does not have enough space to mirror the old device INTEL SSDSC2CW060A3 (55.90 GB, 117231408 sectors). Please use a larger device.]
Exception Location: ./freenasUI/system/forms.py in done, line 298
Server time: Wed, 14 Mar 2018 13:34:44 -0700

Traceback:
File "./freenasUI/system/forms.py" in done
296. c.call('boot.attach', devname, {'expand': self.cleaned_data['expand']}, job=True)
File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py" in call
417. raise ClientException(job['error'], trace=job['exception'])
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py" in inner
42. response = get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _legacy_get_response
249. response = self._get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
178. response = middleware_method(request, callback, callback_args, callback_kwargs)
File "./freenasUI/freeadmin/middleware.py" in process_view
162. return login_required(view_func)(request, *view_args, **view_kwargs)
File "/usr/local/lib/python3.6/site-packages/django/contrib/auth/decorators.py" in _wrapped_view
23. return view_func(request, *args, **kwargs)
File "./freenasUI/system/views.py" in bootenv_pool_attach
497. form.done()
File "./freenasUI/system/forms.py" in done
298. raise MiddlewareError(str(e))

Exception Type: MiddlewareError at /system/bootenv/pool/attach/
Exception Value: [MiddlewareError: [EFAULT] The device called VendorCo ProductCode (7.50 GB, 15728640 sectors does not have enough space to mirror the old device INTEL SSDSC2CW060A3 (55.90 GB, 117231408 sectors). Please use a larger device.]

Request information

GET

Variable Value
label 'ada0p2'
POST

Variable Value
all ''
attach_disk 'da4'
__form_id 'form_BootEnvPoolAttachForm'
FILES

No FILES data
COOKIES

Variable Value
csrftoken '********'
fntreeSaveStateCookie 'root%2Croot%2F68%2F69%2Croot%2F141'
sessionid 'tba08oa1ssoqpswa4zea7t6unusut3zs'
META

Variable Value

Observation: "Exception Value: [MiddlewareError: [EFAULT] The device called VendorCo ProductCode (7.50 GB, 15728640 sectors does not have enough space to mirror the old device INTEL SSDSC2CW060A3 (55.90 GB, 117231408 sectors). Please use a larger device.]" should probably be shown in a user-oriented dialog box, regardless of whether "Use all disk space" is checked.
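
One way to read the observation above, as a sketch: catch the backend error at the call site and hand the UI a plain message instead of letting the exception reach the debug page. `BackendError`, `to_user_message`, and `attach_boot_device` are hypothetical names for illustration and are not the FreeNAS code; the `attach` callable stands in for whatever performs the real operation.

```python
import re


class BackendError(Exception):
    """Hypothetical stand-in for the exception raised by the backend call."""


# Matches a leading errno tag such as "[EFAULT] " or "[EUNKNOWN] ".
_ERRNO_PREFIX = re.compile(r"^\[E[A-Z0-9]+\]\s*")


def to_user_message(error_text):
    """Strip a leading '[EFAULT] '-style errno tag from an error string."""
    return _ERRNO_PREFIX.sub("", error_text.strip())


def attach_boot_device(attach, devname, expand):
    """Run attach(devname, expand); return (ok, message) instead of raising."""
    try:
        attach(devname, expand)
    except BackendError as exc:
        return False, to_user_message(str(exc))
    return True, "Device attached."


def failing_attach(devname, expand):
    # Simulates the too-small-device failure for demonstration purposes.
    raise BackendError("[EFAULT] device is too small")


print(attach_boot_device(failing_attach, "da4", False))
```

The same path is taken whether `expand` is true or false, so the user sees the same message in both cases rather than a silent failure in one and a traceback in the other.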

#13 Updated by William Grzybowski about 1 year ago

Michael Dexter wrote:

Observation: "Exception Value: [MiddlewareError: [EFAULT] The device called VendorCo ProductCode (7.50 GB, 15728640 sectors does not have enough space to mirror the old device INTEL SSDSC2CW060A3 (55.90 GB, 117231408 sectors). Please use a larger device.]" should probably be in a user-oriented dialog box regardless of if "Use all disk space" is checked or not.

Right, do you notice this is a completely different error?

#14 Updated by Dru Lavigne about 1 year ago

  • Target version set to 11.2-RC2

#15 Updated by Joe Maloney about 1 year ago

As shown in the screenshot, I attempted to replace a 16GB boot device (da0p2) with a smaller 8GB boot device (da2). After pressing Replace, the UI shows the message "Disk is being replaced". I do not get a traceback here, but when I come back a few minutes later, nothing seems to have changed in the pool according to zpool status:


[root@freenas ~]# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: resilvered 834M in 0 days 00:00:44 with 0 errors on Tue Apr 17 09:14:14 2018
config:

        NAME            STATE     READ WRITE CKSUM
        freenas-boot    ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            da0p2       ONLINE       0     0     0
            da1p2       ONLINE       0     0     0

errors: No known data errors
[root@freenas ~]#

I have a system set up with the issue reproduced. Ping me for credentials.

#16 Updated by Dru Lavigne about 1 year ago

  • Assignee changed from Joe Maloney to William Grzybowski

#17 Updated by Joe Maloney about 1 year ago

The latest nightly, FreeNAS-11-MASTER-201804170410, still has the same issue when replacing a disk: nothing ever happens. I am not getting the traceback when detaching or attaching a disk.

#18 Updated by Joe Maloney about 1 year ago

When detaching one of the 16GB disks and then attaching an 8GB disk, I get this traceback:

MiddlewareError: [EFAULT] Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/middlewared/plugins/zfs.py", line 128, in extend
    i['target'].attach(newvdev)
  File "libzfs.pyx", line 370, in libzfs.ZFS.__exit__
  File "/usr/local/lib/python3.6/site-packages/middlewared/plugins/zfs.py", line 128, in extend
    i['target'].attach(newvdev)
  File "libzfs.pyx", line 1292, in libzfs.ZFSVdev.attach
libzfs.ZFSException: device is too small

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/middlewared/job.py", line 325, in run
    await self.future
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/middlewared/job.py", line 353, in __run_body
    rv = await self.middleware.run_in_thread(self.method, *([self] + args))
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 875, in run_in_thread
    return await self.run_in_executor(self.__threadpool, method, *args, **kwargs)
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 872, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/site-packages/middlewared/schema.py", line 606, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/middlewared/plugins/zfs.py", line 131, in extend
    raise CallError(str(e), e.code)
middlewared.service_exception.CallError: [EUNKNOWN] device is too small
]

I would say there are 2 separate issues here. Attaching a smaller disk to a mirror throws a traceback, and disk replacement does not work.
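
For the first of those two issues, a small sketch of how a chained traceback like the one above could be reduced to its final error line before being shown to a user. This is an illustration only; `final_error_line` is a hypothetical helper, and it assumes the last non-empty line has the usual "ExceptionType: message" shape.

```python
def final_error_line(trace_text):
    """Return the message from the last 'ExceptionType: message' line of a traceback."""
    lines = [ln.strip() for ln in trace_text.splitlines()]
    # Drop blank lines and the stray closing bracket left by the error wrapper.
    lines = [ln for ln in lines if ln and ln != "]"]
    if not lines:
        return ""
    _type, sep, message = lines[-1].partition(": ")
    return message if sep else lines[-1]


SAMPLE_TRACE = """Traceback (most recent call last):
  File "zfs.py", line 131, in extend
    raise CallError(str(e), e.code)
middlewared.service_exception.CallError: [EUNKNOWN] device is too small
]
"""

print(final_error_line(SAMPLE_TRACE))
```

For the sample this prints `[EUNKNOWN] device is too small`, which is the only part of the traceback a user replacing a disk needs to see.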

#19 Updated by I m about 1 year ago

Tino Zidore wrote:

I figured it out.
I detached the drive through the shell and was then able to attach the new drive through the web GUI. Everything seems to be back to normal.

It may still be a web GUI bug that I wasn't able to detach a boot pool disk through the web GUI.

I am having the same issue. Could you please share how you were able to detach the bad drive from the shell? What is the command? Thank you.

#20 Updated by I m about 1 year ago

I did some research and found out how to detach the bad disk. I used the original disk name. Thank you.

#21 Updated by William Grzybowski 12 months ago

  • Severity set to Medium

#22 Updated by William Grzybowski 11 months ago

  • Status changed from Not Started to Ready for Testing
  • Target version changed from 11.2-RC2 to 11.2-BETA1
  • Needs Doc changed from Yes to No
  • Needs Merging changed from Yes to No

#23 Updated by Dru Lavigne 11 months ago

  • Subject changed from disk replacement on freenas-boot to Show errors on boot pool attach or replace

#24 Updated by Dru Lavigne 11 months ago

  • File deleted (midclt call.txt)

#25 Updated by Dru Lavigne 11 months ago

  • File deleted (middlewared.log)

#26 Updated by Zackary Welch 10 months ago

I am currently working to see if I can reproduce this.

#27 Updated by Zackary Welch 10 months ago

  • Status changed from Ready for Testing to Passed Testing
  • Needs QA changed from Yes to No

Clean error in the legacy UI, so this passes testing.

#28 Updated by Dru Lavigne 10 months ago

  • Status changed from Passed Testing to Done
