Bug #27381

Unclear if it's safe to proceed with replacing faulted disk in an encrypted pool.

Added by Mac Lemon over 2 years ago. Updated over 2 years ago.

Status:
Closed: Cannot reproduce
Priority:
No priority
Assignee:
William Grzybowski
Category:
Middleware
Target version:
Seen in:
Severity:
New
Reason for Closing:
Reason for Blocked:
Needs QA:
Yes
Needs Doc:
Yes
Needs Merging:
Yes
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:

SUPERMICRO X10DRI-T
Dual Intel Xeon E5-2620v4, 8-core/16-thread @2.1GHz
128GB DDR4 ECC RAM
12 * HGST 6TB SATA
LSI 9300-8i SAS-3 HBA (IT Mode)
FreeNAS 11.0-U4 installed on dual SATA-DOM 32GB. (ZFS Mirror)

ChangeLog Required:
No

Description

Summary:
Unclear if it's safe to proceed with replacing faulted disk in an encrypted pool.

Steps to Reproduce:
I have a disk that shows read errors and has been marked as FAULTED in the zpool.

The documentation on [u][url=http://doc.freenas.org/11/storage.html#replacing-an-encrypted-drive]8.1.10.1. Replacing an Encrypted Drive[/url][/u] explicitly states:
[QUOTE]First, make sure that a passphrase has been set using the instructions in Encryption before attempting to replace the failed drive. Then, follow the steps 1 and 2 as described above.[/QUOTE]

The volume does not have a passphrase set, since I want it to be available automatically after a system restart.

When I try to set a passphrase for the volume, I get a middleware error:

Environment:

Software Version: FreeNAS-11.0-U4 (54848d13b)
Request Method: POST
Request URL: https://127.0.0.1:443/storage/volume/1/create_passphrase/

Traceback:
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py" in inner
  39.             response = get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _legacy_get_response
  249.             response = self._get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
  178.             response = middleware_method(request, callback, callback_args, callback_kwargs)
File "./freenasUI/freeadmin/middleware.py" in process_view
  162.         return login_required(view_func)(request, *view_args, **view_kwargs)
File "/usr/local/lib/python3.6/site-packages/django/contrib/auth/decorators.py" in _wrapped_view
  23.                 return view_func(request, *args, **kwargs)
File "./freenasUI/storage/views.py" in volume_create_passphrase
  1116.             form.done(volume=volume)
File "./freenasUI/storage/forms.py" in done
  2458.         notifier().geli_passphrase(volume, passfile, rmrecovery=True)
File "./freenasUI/middleware/notifier.py" in geli_passphrase
  519.                 self.__geli_delkey(dev, GELI_RECOVERY_SLOT, force=True)
File "./freenasUI/middleware/notifier.py" in __geli_delkey
  494.             raise MiddlewareError("Unable to delete key %s on %s: %s" % (slot, dev, err))

Exception Type: MiddlewareError at /storage/volume/1/create_passphrase/
Exception Value: [MiddlewareError: b'Unable to delete key 1 on gptid/13d14fba-3ef8-11e7-b9b9-0cc47adf0494: geli: Cannot read metadata from gptid/13d14fba-3ef8-11e7-b9b9-0cc47adf0494 (error=6).\n']

So, due to the faulted drive, it's not possible to follow the documentation and actually set a passphrase for the volume.
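
As a reading aid for the error above, here is a minimal sketch that decodes the `(error=6)` suffix. It assumes the number geli prints is a standard errno value; the report itself does not confirm this. The helper name `decode_geli_error` is hypothetical and not part of FreeNAS.

```python
import errno
import os
import re

# Error text copied from the exception value in this report.
message = (
    "geli: Cannot read metadata from "
    "gptid/13d14fba-3ef8-11e7-b9b9-0cc47adf0494 (error=6)."
)


def decode_geli_error(text):
    """Return (code, symbolic name, description) for an '(error=N)' suffix, or None.

    Assumption: N is a plain errno value as defined by the operating system.
    """
    match = re.search(r"\(error=(\d+)\)", text)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numeric codes to names such as 'ENXIO'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # The description wording differs between operating systems.
    return code, name, os.strerror(code)


print(decode_geli_error(message))
```

If that assumption holds, code 6 is ENXIO, which would be consistent with a provider that no longer responds.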

The documentation continues:
[QUOTE]During step 3, you will be prompted to input and confirm the passphrase for the pool. Enter this information then click the Replace Disk button. Wait until the resilvering is complete.[/QUOTE]

Since I am apparently required to enter a passphrase that I can no longer set, it's unclear whether it is safe to continue with the drive replacement procedure as outlined in [u][url=http://doc.freenas.org/11/storage.html#replacing-a-failed-drive]8.1.10. Replacing a Failed Drive[/url][/u].

Given that it would be impossible to set a passphrase on a drive that physically no longer answers on the bus, I'd guess that it [i]should[/i] be safe. Given the error message, I'd also expect this to be a known and expected situation, which makes me less sure about this.

Expected Results:
The documentation requires steps that are likely to fail or be impossible to follow when an actual drive failure occurs. The GUI should not throw exceptions in that case but correctly handle the expected error case.
The documentation should explicitly state if it's safe to proceed and what to expect when skipping the step.
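
One way the requested error handling could look is sketched below. This is not the FreeNAS implementation: `MiddlewareError` here is a local stand-in for the class named in the traceback, and `delete_recovery_keys`, `fake_delete_key`, and the `gptid/...` device names are hypothetical. The idea is to attempt the key-slot deletion on every device and collect the failures, instead of aborting on the first unreachable disk.

```python
class MiddlewareError(Exception):
    """Local stand-in for the FreeNAS exception of the same name."""


def delete_recovery_keys(devices, slot, delete_key):
    """Try every device and return (succeeded, failed).

    `delete_key(dev, slot)` is a caller-supplied callable that raises
    MiddlewareError on failure. Failures are collected per device so that
    one unreachable disk does not abort the whole operation.
    """
    succeeded = []
    failed = {}
    for dev in devices:
        try:
            delete_key(dev, slot)
        except MiddlewareError as exc:
            failed[dev] = str(exc)
        else:
            succeeded.append(dev)
    return succeeded, failed


def fake_delete_key(dev, slot):
    """Hypothetical test double: pretends that 'gptid/faulted' is unreachable."""
    if dev == "gptid/faulted":
        raise MiddlewareError(
            "Unable to delete key %s on %s: device not reachable" % (slot, dev)
        )


ok, bad = delete_recovery_keys(
    ["gptid/healthy", "gptid/faulted"], 1, fake_delete_key
)
print("updated:", ok)
print("skipped:", bad)
```

With a result like this, the GUI could report which disks were skipped rather than showing a traceback. Whether skipping a faulted disk is actually safe for the pool is the question this ticket raises, and the sketch does not answer it.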

Actual Results:
Following the documentation is made impossible by an error thrown by the middleware.
Documentation is missing information on how to proceed when setting a passphrase on the volume is, as expected, not possible anymore.

Regression:
n/a

Notes:
I haven't found any other issues that explicitly match this problem.


Related issues

Copied to FreeNAS - Bug #27384: Update instructions on how to replace disk in encrypted pool (Done)

History

#1 Updated by Dru Lavigne over 2 years ago

  • Assignee changed from Release Council to William Grzybowski
  • Target version set to 11.2-BETA1

William: please load balance wrt the traceback.

#2 Updated by William Grzybowski over 2 years ago

  • Assignee changed from William Grzybowski to Dru Lavigne

Dru, the documentation states that a passphrase should be set; that information is wrong. I see no reason one would need to perform that operation. Should this ticket be split?

#3 Updated by Dru Lavigne over 2 years ago

Yes, there should be a separate ticket for the doc edits. We'll also need a brain dump from someone who knows how the encryption process works, as the last time we tried to update that section no one knew.

#4 Updated by Dru Lavigne over 2 years ago

  • Assignee changed from Dru Lavigne to William Grzybowski

#5 Updated by William Grzybowski over 2 years ago

  • Status changed from Unscreened to Screened

Can you create that ticket!?

#6 Updated by Dru Lavigne over 2 years ago

  • Copied to Bug #27384: Update instructions on how to replace disk in encrypted pool added

#7 Updated by William Grzybowski over 2 years ago

Thanks!

#8 Updated by William Grzybowski over 2 years ago

  • Status changed from Screened to 15

Did you fix/replace the disk yet?

I removed a disk and tried to create a passphrase, and it worked (even though you shouldn't need to do that). I wonder what is different in your setup.

#9 Updated by Dru Lavigne over 2 years ago

  • Status changed from 15 to Closed: Cannot reproduce
  • Target version changed from 11.2-BETA1 to N/A

Closing out.
