Bug #27384

Update instructions on how to replace disk in encrypted pool

Added by Dru Lavigne almost 3 years ago. Updated almost 2 years ago.

Status:
Done
Priority:
Important
Assignee:
Aaron St. John
Category:
Documentation
Target version:
Seen in:
Severity:
Low
Reason for Closing:
Reason for Blocked:
Needs QA:
No
Needs Doc:
No
Needs Merging:
No
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:

SUPERMICRO X10DRI-T
Dual Intel Xeon E5-2620v4, 8-core/16-thread @2.1GHz
128GB DDR4 ECC RAM
12 * HGST 6TB SATA
LSI 9300-8i SAS-3 HBA (IT Mode)
FreeNAS 11.0-U4 installed on dual SATA-DOM 32GB. (ZFS Mirror)

ChangeLog Required:
No

Description

Summary:
Unclear if it's safe to proceed with replacing faulted disk in an encrypted pool.

Steps to Reproduce:
I have a disk that shows read errors and has been marked as FAULTED in the zpool.

The documentation on [u][url=http://doc.freenas.org/11/storage.html#replacing-an-encrypted-drive]8.1.10.1. Replacing an Encrypted Drive[/url][/u] explicitly states:
[QUOTE]First, make sure that a passphrase has been set using the instructions in Encryption before attempting to replace the failed drive. Then, follow the steps 1 and 2 as described above.[/QUOTE]

The volume does not have a passphrase set, since I want it to be available automatically after a system restart.

When trying to set a passphrase for the volume, I get a middleware error:

Environment:

Software Version: FreeNAS-11.0-U4 (54848d13b)
Request Method: POST
Request URL: https://127.0.0.1:443/storage/volume/1/create_passphrase/

Traceback:
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py" in inner
  39.             response = get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _legacy_get_response
  249.             response = self._get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
  178.             response = middleware_method(request, callback, callback_args, callback_kwargs)
File "./freenasUI/freeadmin/middleware.py" in process_view
  162.         return login_required(view_func)(request, *view_args, **view_kwargs)
File "/usr/local/lib/python3.6/site-packages/django/contrib/auth/decorators.py" in _wrapped_view
  23.                 return view_func(request, *args, **kwargs)
File "./freenasUI/storage/views.py" in volume_create_passphrase
  1116.             form.done(volume=volume)
File "./freenasUI/storage/forms.py" in done
  2458.         notifier().geli_passphrase(volume, passfile, rmrecovery=True)
File "./freenasUI/middleware/notifier.py" in geli_passphrase
  519.                 self.__geli_delkey(dev, GELI_RECOVERY_SLOT, force=True)
File "./freenasUI/middleware/notifier.py" in __geli_delkey
  494.             raise MiddlewareError("Unable to delete key %s on %s: %s" % (slot, dev, err))

Exception Type: MiddlewareError at /storage/volume/1/create_passphrase/
Exception Value: [MiddlewareError: b'Unable to delete key 1 on gptid/13d14fba-3ef8-11e7-b9b9-0cc47adf0494: geli: Cannot read metadata from gptid/13d14fba-3ef8-11e7-b9b9-0cc47adf0494 (error=6).\n']

So, due to the faulted drive, it's not possible to follow the documentation and actually set a passphrase for the volume.
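
The `error=6` in the exception value above can be decoded if we assume it is a standard errno value, which this report does not confirm. A minimal Python sketch under that assumption (the helper name `describe_geli_error` is hypothetical and not part of FreeNAS):

```python
import errno
import os
import re


def describe_geli_error(message):
    """Extract the numeric code from a geli-style '(error=N)' message.

    Returns (code, symbolic_name, description), or None if no code is found.
    Assumption: N is a standard errno value.
    """
    match = re.search(r"\(error=(\d+)\)", message)
    if match is None:
        return None
    code = int(match.group(1))
    name = errno.errorcode.get(code, "UNKNOWN")
    # The description text comes from the local OS and varies by platform.
    return code, name, os.strerror(code)


msg = ("geli: Cannot read metadata from "
       "gptid/13d14fba-3ef8-11e7-b9b9-0cc47adf0494 (error=6).")
print(describe_geli_error(msg))
```

If the assumption holds, code 6 maps to `ENXIO`, which would be consistent with a disk that no longer responds on the bus.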

The documentation continues:
[QUOTE]During step 3, you will be prompted to input and confirm the passphrase for the pool. Enter this information then click the Replace Disk button. Wait until the resilvering is complete.[/QUOTE]

Since I will apparently be required to enter a passphrase that I can no longer set, it's unclear whether it is safe to continue with the drive replacement procedure as outlined in [u][url=http://doc.freenas.org/11/storage.html#replacing-a-failed-drive]8.1.10. Replacing a Failed Drive[/url][/u].

Given that it would be impossible to set a passphrase on a drive that physically no longer answers on the bus, I'd guess that it [i]should[/i] be safe. However, I'd also expect a failed drive to be a known and anticipated situation, so the error message makes me less sure about this.

Expected Results:
The documentation requires steps that are likely to fail or be impossible to follow when an actual drive failure occurs. The GUI should not throw exceptions in that case but correctly handle the expected error case.
The documentation should explicitly state if it's safe to proceed and what to expect when skipping the step.
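
One way to picture the requested behaviour is a loop that records per-device failures instead of aborting on the first unreachable disk. The sketch below is illustrative only: `DeviceError`, `apply_to_devices`, and `fake_delkey` are hypothetical names, not the FreeNAS middleware API, and whether skipping a faulted member is actually safe for key management is the open question of this ticket.

```python
class DeviceError(Exception):
    """Raised by the hypothetical per-device operation when a disk is unreachable."""


def apply_to_devices(devices, operation):
    """Run `operation` on each device, collecting failures instead of aborting.

    Returns (succeeded, failed), where `failed` maps device name to error text.
    """
    succeeded = []
    failed = {}
    for dev in devices:
        try:
            operation(dev)
        except DeviceError as exc:
            failed[dev] = str(exc)
        else:
            succeeded.append(dev)
    return succeeded, failed


def fake_delkey(dev):
    # Stand-in for the real key operation; only the faulted disk fails here.
    if dev == "gptid/faulted":
        raise DeviceError("Cannot read metadata (error=6)")


ok, bad = apply_to_devices(["gptid/a", "gptid/faulted", "gptid/b"], fake_delkey)
print("updated:", ok)
print("skipped:", bad)
```

A caller could then report the skipped devices to the user as a warning rather than surfacing an unhandled exception.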

Actual Results:
Following the documentation is made impossible by an error thrown by the middleware.
Documentation is missing information on how to proceed when setting a passphrase on the volume is, as expected, not possible anymore.

Regression:
n/a

Notes:
I haven't found any other issues that explicitly match this problem.


Related issues

Copied from FreeNAS - Bug #27381: Unclear if it's safe to proceed with replacing faulted disk in an encrypted pool. Closed: Cannot reproduce, 2017-12-22

Associated revisions

Revision 1b6cdfe9 (diff)
Added by Vladimir Vinogradenko over 2 years ago

fix(boot): Turn boot.attach into a job Ticket: #27384

Revision e9d4ff96 (diff)
Added by Vladimir Vinogradenko over 2 years ago

fix(boot): Turn boot.attach into a job Ticket: #27384

Revision b3c38b10 (diff)
Added by Vladimir Vinogradenko over 2 years ago

fix(boot): Turn boot.attach into a job Ticket: #27384

Revision 33e48a3c (diff)
Added by Vladimir Vinogradenko over 2 years ago

fix(boot): Turn boot.attach into a job Ticket: #27384

History

#1 Updated by Dru Lavigne almost 3 years ago

  • Copied from Bug #27381: Unclear if it's safe to proceed with replacing faulted disk in an encrypted pool. added

#2 Updated by Dru Lavigne almost 3 years ago

Note that the section on encryption also needs a review.

#3 Updated by Warren Block over 2 years ago

  • Status changed from Unscreened to Screened

#4 Updated by Dru Lavigne over 2 years ago

  • Status changed from Screened to Not Started
  • Target version changed from 11.2-BETA1 to 11.2-RC2

#5 Updated by Dru Lavigne over 2 years ago

  • Severity set to Low

#6 Updated by Dru Lavigne about 2 years ago

  • Target version changed from 11.2-RC2 to 11.2-BETA3

#7 Updated by Dru Lavigne about 2 years ago

  • Target version changed from 11.2-BETA3 to Backlog

#8 Updated by Warren Block about 2 years ago

  • Assignee changed from Warren Block to Aaron St. John

#9 Updated by Dru Lavigne about 2 years ago

  • Target version changed from Backlog to 11.2-RC1

#10 Updated by Aaron St. John about 2 years ago

  • Status changed from Not Started to Ready for Testing
  • Needs Doc changed from Yes to No

After speaking with Warren and William, I deleted the steps William finds to be unnecessary. That being said, Warren suggested that we be 100% sure this is the behaviour of replacing an encrypted disk. Thus, marking "Ready for Testing".

(docs) master branch PR: https://github.com/freenas/freenas-docs/pull/345
angulargui PR: https://github.com/freenas/freenas-docs/pull/380

#11 Updated by Dru Lavigne about 2 years ago

  • Status changed from Ready for Testing to In Progress

#13 Updated by Dru Lavigne almost 2 years ago

  • Status changed from In Progress to Ready for Testing
  • Needs Merging changed from Yes to No

#14 Updated by Timothy Moore II almost 2 years ago

  • Status changed from Ready for Testing to Passed Testing
  • Needs QA changed from Yes to No

Testing with a FreeNAS system installed with INTERNAL24:

Log in to legacy UI and confirm text changes are present in the embedded guide.
Log in to the new UI and confirm text changes are present in the embedded guide.

#15 Updated by Dru Lavigne almost 2 years ago

  • Status changed from Passed Testing to Done
