Add support for NVMe device replacement
NVMe needs support for both hot-plug and un-plug.
For hot-plug there are two potential issues: a) making PCI report that something happened, and b) hoping there are enough resources reserved to allocate from (which may be difficult).
For un-plug, a clean teardown is obviously needed, and one of the problems is that the device no longer responds to accesses, since it is no longer there.
- Status changed from Screened to In Progress
- Target version changed from Backlog to 11.2-U3
- Parent task deleted (
I've made a few fixes there, including r343447, which is already in the 11.3-stable branch. Unfortunately, the resource allocation problem on plug-in is complicated and may still require a reboot, but a number of the other issues should be handled now.
#4 Updated by Alexander Motin about 2 months ago
- Status changed from In Progress to Ready for Testing
I've merged to 11.2-stable a change that should allow hot NVMe device replacement, at least when the device is disabled with `devctl disable nvmeX` before removal, even under load.
QE: A minimal test, doable on any NVMe hardware, is to run `devctl disable nvmeX`/`devctl enable nvmeX` under load (make sure not to upset ZFS by removing a critical or only vdev). A maximal test would include real hot NVMe device replacement on the M50 platform (with an explicit `devctl disable` first). I haven't tested that after these changes, but there is a chance it may work now, since the resources freed by the removed device should be enough for the inserted one.
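The minimal test could be scripted roughly as follows. This is only a sketch under assumptions not stated in this ticket: that controller `nvme0` backs the disk `nvd0`, that a plain `dd` read is an acceptable load, and that the disk is not a critical or only vdev. The names `CTRL`, `DISK`, `PAUSE`, `DRY_RUN`, `run`, `start_load`, and `cycle_under_load` are made up for illustration. By default it only prints the commands it would run.

```sh
#!/bin/sh
# Sketch: disable and re-enable an NVMe controller while it is under read load.
# Assumptions (hypothetical, adjust for the system under test):
#   CTRL - NVMe controller device name, e.g. nvme0
#   DISK - disk backed by that controller, e.g. nvd0
CTRL="${CTRL:-nvme0}"
DISK="${DISK:-nvd0}"
PAUSE="${PAUSE:-10}"      # seconds to wait between steps
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = really run them

# Run a command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start a sequential read of the disk in the background as load.
start_load() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ dd if=/dev/$DISK of=/dev/null bs=1m &"
        load_pid=""
    else
        dd if="/dev/$DISK" of=/dev/null bs=1m &
        load_pid=$!
    fi
}

cycle_under_load() {
    start_load
    run sleep "$PAUSE"
    run devctl disable "$CTRL"   # the read load is expected to fail here
    run sleep "$PAUSE"
    run devctl enable "$CTRL"
    # Clean up the load process if it is still around.
    if [ -n "$load_pid" ]; then
        kill "$load_pid" 2>/dev/null
        wait "$load_pid" 2>/dev/null
    fi
    return 0
}

cycle_under_load
```

To actually exercise the hardware, run it as root with `DRY_RUN=0` and the right `CTRL`/`DISK` values, then check that the disk reappears and that the pool is healthy afterwards.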
#5 Updated by Dru Lavigne about 2 months ago
- Subject changed from NVMe hot-plug and un-plug to Add support for NVMe device replacement
- Needs Doc changed from Yes to No
- Needs Merging changed from Yes to No
11.2 commits: https://github.com/freenas/os/commit/2c2d6d9915fc0a4f9b74540ec2c208abb92e79b5