Project

General

Profile

Bug #79860

Merge some ZFS-on-SSD performance optimizations from FreeBSD

Added by Alexander Motin over 2 years ago. Updated over 2 years ago.

Status:
Ready for Testing
Priority:
No priority
Assignee:
Alexander Motin
Category:
OS
Target version:
Seen in:
Severity:
Low
Reason for Closing:
Reason for Blocked:
Needs QA:
Yes
Needs Doc:
No
Needs Merging:
No
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

Sequential scrub increased I/O aggregation size in ZFS from 128KB to 1MB. It makes sense for HDDs, but for SSDs it is mostly a waste of time on additional memory copy operation to aggregate memory chunk, which GEOM will still split into 128KB pieces. Proposed patch introduces new limit of 128KB for non-rotating VDEVs and allows it to be tuned even lower later, if we see it beneficial.

Make vfs.zfs.metaslab_lba_weighting_enabled to be automatically ignored for SSD-based vdevs, since it has no sense there.

Associated revisions

Revision 2af932d3 (diff)
Added by Alexander Motin over 2 years ago

MFV/ZoL: Fix zfs_vdev_aggregation_limit bounds checking Update the bounds checking for zfs_vdev_aggregation_limit so that it has a floor of zero and a maximum value of the supported block size for the pool. Additionally add an early return when zfs_vdev_aggregation_limit equals zero to disable aggregation. For very fast solid state or memory devices it may be more expensive to perform the aggregation than to issue the IO immediately. Author: Brian Behlendorf <behlendorf1@llnl.gov> zfsonlinux/zfs@a58df6f53687ac6d1dee21f60de41b2552a43201 MFV/ZoL: Cap maximum aggregate IO size Commit 8542ef8 allowed optional IOs to be aggregated beyond the specified aggregation limit. Since the aggregation limit was also used to enforce the maximum block size, setting `zfs_vdev_aggregation_limit=16777216` could result in an attempt to allocate an ABD larger than 16M. Author: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6259 Closes #6270 zfsonlinux/zfs@2d678f779aba26a93314c8ee1142c3985fa25cb6 Ticket: #79860 (cherry picked from commit ac6361987efa85abf7e2a615747c0f62d065a57b)

Revision 202607d0 (diff)
Added by Alexander Motin over 2 years ago

Add separate aggregation limit for non-rotating media. Before sequential scrub patches ZFS never aggregated I/Os above 128KB. Sequential scrub bumped that to 1MB, which motivation I understand for spinning disks, since it should reduce number of head seeks. But for SSDs it makes much less sense to me, especially on FreeBSD, where due to MAXPHYS limitation device will likely still see bunch of 128KB I/Os instead of one large. Having more strict aggregation limit allows to avoid allocation of large memory buffer and memcpy to/from it, that is a serious problem when bandwidth reaches few GB/s. MFC after: 1 month Sponsored by: iXsystems, Inc. Ticket: #79860 (cherry picked from commit a8953beba847c9bb4b9e6e40cc7d588d66889a3c)

Revision c5f22b94 (diff)
Added by Alexander Motin over 2 years ago

MFV/ZoL: Disable LBA weighting on files and SSDs The LBA weighting makes sense on rotational media where the outer tracks have twice the bandwidth of the inner tracks. However, it is detrimental on nonrotational media such as solid state disks, where the only effect is to ensure that metaslabs enter the best-fit allocation behavior sooner, which is detrimental to performance. It also makes no sense on files where the underlying filesystem can arrange things however it wants. Author: Richard Yao <ryao@gentoo.org> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3712 zfsonlinux/zfs@fb40095f5f0853946f8150481ca22602d1334dfe To reduce code divergence this merge replaces equivalent but different FreeBSD code detecting non-rotating medium vdevs. MFC after: 1 month Ticket: #79860 (cherry picked from commit 3f2d6e8d7e0cda9a284316123f8c09204022227e)

Revision cfcdfa49 (diff)
Added by Alexander Motin over 2 years ago

MFV/ZoL: Fix zfs_vdev_aggregation_limit bounds checking Update the bounds checking for zfs_vdev_aggregation_limit so that it has a floor of zero and a maximum value of the supported block size for the pool. Additionally add an early return when zfs_vdev_aggregation_limit equals zero to disable aggregation. For very fast solid state or memory devices it may be more expensive to perform the aggregation than to issue the IO immediately. Author: Brian Behlendorf <behlendorf1@llnl.gov> zfsonlinux/zfs@a58df6f53687ac6d1dee21f60de41b2552a43201 MFV/ZoL: Cap maximum aggregate IO size Commit 8542ef8 allowed optional IOs to be aggregated beyond the specified aggregation limit. Since the aggregation limit was also used to enforce the maximum block size, setting `zfs_vdev_aggregation_limit=16777216` could result in an attempt to allocate an ABD larger than 16M. Author: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6259 Closes #6270 zfsonlinux/zfs@2d678f779aba26a93314c8ee1142c3985fa25cb6 Ticket: #79860 (cherry picked from commit ac6361987efa85abf7e2a615747c0f62d065a57b) (cherry picked from commit 2af932d35f4e8f59abd25bee39ac63ec0613b6b8)

Revision 3b0fefa5 (diff)
Added by Alexander Motin over 2 years ago

Add separate aggregation limit for non-rotating media. Before sequential scrub patches ZFS never aggregated I/Os above 128KB. Sequential scrub bumped that to 1MB, which motivation I understand for spinning disks, since it should reduce number of head seeks. But for SSDs it makes much less sense to me, especially on FreeBSD, where due to MAXPHYS limitation device will likely still see bunch of 128KB I/Os instead of one large. Having more strict aggregation limit allows to avoid allocation of large memory buffer and memcpy to/from it, that is a serious problem when bandwidth reaches few GB/s. MFC after: 1 month Sponsored by: iXsystems, Inc. Ticket: #79860 (cherry picked from commit a8953beba847c9bb4b9e6e40cc7d588d66889a3c) (cherry picked from commit 202607d0feb2c700fde741a2f5a03ba057315428)

Revision 1be164f7 (diff)
Added by Alexander Motin over 2 years ago

MFV/ZoL: Disable LBA weighting on files and SSDs The LBA weighting makes sense on rotational media where the outer tracks have twice the bandwidth of the inner tracks. However, it is detrimental on nonrotational media such as solid state disks, where the only effect is to ensure that metaslabs enter the best-fit allocation behavior sooner, which is detrimental to performance. It also makes no sense on files where the underlying filesystem can arrange things however it wants. Author: Richard Yao <ryao@gentoo.org> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3712 zfsonlinux/zfs@fb40095f5f0853946f8150481ca22602d1334dfe To reduce code divergence this merge replaces equivalent but different FreeBSD code detecting non-rotating medium vdevs. MFC after: 1 month Ticket: #79860 (cherry picked from commit 3f2d6e8d7e0cda9a284316123f8c09204022227e) (cherry picked from commit c5f22b9417414e6620cbe7bd87aaa0f115b9e7d1)

History

#1 Updated by Alexander Motin over 2 years ago

#2 Updated by Dru Lavigne over 2 years ago

  • Subject changed from Merge few ZFS-on-SSD performance optimization from FreeBSD to Merge some ZFS-on-SSD performance optimizations from FreeBSD
  • Target version changed from 11.2-U4 to 11.2-U3

#3 Updated by Alexander Motin over 2 years ago

  • Status changed from In Progress to Ready for Testing

PR for 11.2-stable: https://github.com/freenas/os/pull/189

QE: There should be no externally visible changes. Internally, profiler should show disappearance of one big memcpy() when writing to zvol/dataet with block size >= 128K on SSD-backed pool. I hope nap@ to test it.

#4 Updated by Dru Lavigne over 2 years ago

  • Needs Doc changed from Yes to No
  • Needs Merging changed from Yes to No

Also available in: Atom PDF