Add support for NVMe device replacement
NVMe needs support for both hot-plug and un-plug.
For hot-plug there are two potential issues: a) getting PCI to report that something happened, and b) having enough resources reserved to allocate from (which may be difficult).
For un-plug a clean teardown is obviously needed, and one of the problems is that the device no longer responds to accesses, since it is no longer there.
MFC r342557, r342559: Reimplement nvd(4) detach handling.
Previous code typically crashed in case of NVMe device unplug or even clean
detach while some I/Os are still in flight. To fix this the new code calls
disk_gone() and waits for confirmation of all references gone before calling
disk_destroy(), freeing other resources and allowing controller detach.
While there, fix disk lists locking and reimplement unit numbers assignment.
(cherry picked from commit bfb1323a075dc3535d422570b54cca62c5a54ffb)
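The ordering described in the commit message (mark the disk gone, wait until every outstanding reference is released, only then destroy it) can be illustrated with a small user-space sketch. This is not the nvd(4) code: the `sim_disk` type and all `sim_disk_*` functions are hypothetical names invented for illustration, and a pthread mutex and condition variable stand in for whatever synchronization the kernel driver actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a disk with in-flight I/O. */
struct sim_disk {
    pthread_mutex_t lock;
    pthread_cond_t  drained;  /* signalled when refs reaches zero */
    int             refs;     /* number of in-flight I/Os */
    bool            gone;     /* set once removal has started */
};

void sim_disk_init(struct sim_disk *d)
{
    pthread_mutex_init(&d->lock, NULL);
    pthread_cond_init(&d->drained, NULL);
    d->refs = 0;
    d->gone = false;
}

/* Begin an I/O. Returns false if the disk is already going away. */
bool sim_disk_io_start(struct sim_disk *d)
{
    bool ok;

    pthread_mutex_lock(&d->lock);
    ok = !d->gone;
    if (ok)
        d->refs++;
    pthread_mutex_unlock(&d->lock);
    return ok;
}

/* Finish an I/O and wake any waiter once the last reference drops. */
void sim_disk_io_done(struct sim_disk *d)
{
    pthread_mutex_lock(&d->lock);
    assert(d->refs > 0);
    if (--d->refs == 0)
        pthread_cond_broadcast(&d->drained);
    pthread_mutex_unlock(&d->lock);
}

/* Step 1 (analogue of the "gone" notification): refuse new I/O. */
void sim_disk_gone(struct sim_disk *d)
{
    pthread_mutex_lock(&d->lock);
    d->gone = true;
    pthread_mutex_unlock(&d->lock);
}

/* Step 2: block until all references are released, then tear down. */
void sim_disk_wait_and_destroy(struct sim_disk *d)
{
    pthread_mutex_lock(&d->lock);
    while (d->refs > 0)
        pthread_cond_wait(&d->drained, &d->lock);
    pthread_mutex_unlock(&d->lock);

    pthread_cond_destroy(&d->drained);
    pthread_mutex_destroy(&d->lock);
}
```

In this sketch a detach path would call `sim_disk_gone()` first and `sim_disk_wait_and_destroy()` second. Any I/O still in flight finishes through `sim_disk_io_done()`, and the resources are only freed after the last such call. Freeing them earlier is the kind of use-after-free that the commit message describes as a crash.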
- Status changed from Screened to In Progress
- Target version changed from Backlog to 11.2-U3
- Parent task deleted (
I've made a few fixes there, including r343447, which is already in the 11.3-stable branch. Unfortunately, the resource allocation problem on plug-in is complicated and may still require a reboot; a number of the other problems should be handled now.
- Status changed from In Progress to Ready for Testing
I've merged into 11.2-stable a change that should allow hot NVMe device replacement, even under load, at least when the device is disabled with `devctl disable nvmeX` before removal.
QE: A minimal test, doable on any NVMe hardware, is to run `devctl disable nvmeX`/`devctl enable nvmeX` under load (take care not to upset ZFS by removing a critical or only vdev). A maximal test would include real hot NVMe device replacement on the M50 platform (with an explicit `devctl disable` first). I haven't tested that after these changes, but there is a chance it works now, since the resources freed by the removed device should be enough for the inserted one.
- Subject changed from NVMe hot-plug and un-plug to Add support for NVMe device replacement
- Needs Doc changed from Yes to No
- Needs Merging changed from Yes to No
11.2 commits: https://github.com/freenas/os/commit/2c2d6d9915fc0a4f9b74540ec2c208abb92e79b5