Project

General

Profile

Feature #18428

Speed up snapshots listing

Added by test test about 2 years ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Nice to have
Assignee:
William Grzybowski
Category:
OS
Target version:
Estimated time:
Severity:
New
Reason for Closing:
Reason for Blocked:
Needs QA:
No
Needs Doc:
Yes
Needs Merging:
Yes
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:

Description

Due to

/sbin/zfs list -t snapshot -H -o name

being used instead of

/sbin/zfs list -t snapshot -H -o name -s name

it takes more than five times as long to complete.

I don't know if the script autosnap.py or others that are involved depend on the order given without -s name or if sorting after name is sufficient.

Could someone please have a look? We are talking about improvements like 5s vs 0.7s.

Thanks!

busy time.png (12.1 KB) busy time.png Bill T, 04/24/2017 06:59 PM
10824

Associated revisions

Revision 08c3658d (diff)
Added by William Grzybowski about 2 years ago

feat(snap): sort by name

Ticket: #18428

Revision 1d78ea76 (diff)
Added by William Grzybowski about 2 years ago

Revert "feat(snap): sort by name"

This reverts commit 08c3658d0996387dce90ae2b6e8086dfa4a11b7c.

Ticket: #18428
(cherry picked from commit ff7c3c70560b973ced8d3147b1d8a572c2413063)

Revision 351f5a81 (diff)
Added by William Grzybowski about 2 years ago

Revert "feat(snap): sort by name"

This reverts commit 08c3658d0996387dce90ae2b6e8086dfa4a11b7c.

Ticket: #18428
(cherry picked from commit ff7c3c70560b973ced8d3147b1d8a572c2413063)

Revision 42ea2db1 (diff)
Added by William Grzybowski about 2 years ago

Revert "feat(snap): sort by name"

This reverts commit 08c3658d0996387dce90ae2b6e8086dfa4a11b7c.

Ticket: #18428
(cherry picked from commit ff7c3c70560b973ced8d3147b1d8a572c2413063)

Revision 584345ad (diff)
Added by William Grzybowski about 2 years ago

Revert "feat(snap): sort by name"

This reverts commit 08c3658d0996387dce90ae2b6e8086dfa4a11b7c.

Ticket: #18428

Revision 46c49403 (diff)
Added by William Grzybowski over 1 year ago

fix(snap): use zfs list -s name to speed up listing

Ticket: #18428

Revision a5d8cc66 (diff)
Added by William Grzybowski over 1 year ago

fix(snap): use zfs list -s name to speed up listing

Ticket: #18428

Revision 5d16dfdf (diff)
Added by Dru Lavigne about 1 year ago

Mention ZFS improvements.
Ticket: #13630
Ticket: #18428

History

#1 Updated by Bonnie Follweiler about 2 years ago

  • Assignee set to Kris Moore

#2 Avatar?id=14398&size=24x24 Updated by Kris Moore about 2 years ago

  • Assignee changed from Kris Moore to William Grzybowski
  • Priority changed from No priority to Nice to have
  • Target version set to 9.10.2

#3 Updated by William Grzybowski about 2 years ago

  • Status changed from Unscreened to Screened

For autosnap I dont foresee a problem, it would probably be for autorepl only.

#4 Updated by William Grzybowski about 2 years ago

  • Status changed from Screened to 19

#5 Updated by William Grzybowski about 2 years ago

  • Status changed from 19 to Needs Developer Review

#6 Updated by William Grzybowski about 2 years ago

  • Assignee changed from William Grzybowski to Josh Paetzel

#7 Updated by Josh Paetzel about 2 years ago

  • Status changed from Needs Developer Review to Reviewed
  • Assignee changed from Josh Paetzel to William Grzybowski

#8 Updated by William Grzybowski about 2 years ago

  • Status changed from Reviewed to Closed: Not To Be Fixed

Turns out this mucks with the deletion order of snapshots, which causes zfs destroy error (because recursive delete of parent)

#9 Updated by Bill T over 1 year ago

10824

It isn't just the runtime of zfs list that is improved with this change, on my raid-z with 3 low RPM WD Red drives, the busy time dropped from ~20% (week 13-14 in the attached graph) down to below 5% (week 15 onward).

The huge spike in week 15 is when I ran a scrub, which dropped from ~36 hours to ~14 hours.

I did this by changing autosnap.py on freenas-9.10.2-U2 to:

272: zfsproc = pipeopen("/sbin/zfs list -t snapshot -H -o name -s name", debug, logger=log)

That's a lot of busy time to be reclaimed if the delete case can be addressed

#10 Updated by William Grzybowski over 1 year ago

Bill T wrote:

It isn't just the runtime of zfs list that is improved with this change, on my raid-z with 3 low RPM WD Red drives, the busy time dropped from ~20% (week 13-14 in the attached graph) down to below 5% (week 15 onward).

The huge spike in week 15 is when I ran a scrub, which dropped from ~36 hours to ~14 hours.

I did this by changing autosnap.py on freenas-9.10.2-U2 to:

272: zfsproc = pipeopen("/sbin/zfs list -t snapshot -H -o name -s name", debug, logger=log)

That's a lot of busy time to be reclaimed if the delete case can be addressed

Yes, however the replication currently relies on that, so nothing can be done at this time regarding that.

#11 Updated by Bill T over 1 year ago

Is the only problem deleting parents before children? If yes, then we can make this faster and not break recursive delete by sorting in python after calling zfs list, i.e.:

zfsproc = pipeopen("/sbin/zfs list -t snapshot -H -o name -s name", debug, logger=log)
lines = zfsproc.communicate()[0].split('\n')
# Sort snapshots, deepest filesystem first.
lines = sorted(lines, cmp=lambda a, b: b.count('/')-a.count('/'))

#12 Updated by William Grzybowski over 1 year ago

  • Status changed from Closed: Not To Be Fixed to Screened
  • Target version changed from 9.10.2 to 11.1

I am unsure about that, i think there is more in the play here, e.g. txg transaction and the sorts.
Also its possible doing the sorting in python is gonna be slower than the built-in version in C.

#13 Updated by Bill T over 1 year ago

I'm fairly certain the cost of doing the sort in python will be negligible compared to the system time spent in 'zfs list' when not using '-s name'. On my system:

$ time zfs list -t snapshot -H -o name 2>&1 | grep total
zfs list -t snapshot -H -o name 2>&1  1.09s user 2.62s system 99% cpu 3.716 total

$ time zfs list -t snapshot -H -o name -s name 2>&1 | grep total
zfs list -t snapshot -H -o name -s name 2>&1  0.07s user 0.28s system 99% cpu 0.350 total

That's more than an order of magnitude slower when using the existing method. To see how fast python sorts:

$ zfs list -t snapshot -H -o name -s name > /tmp/snapshots
$ wc -l /tmp/snapshots
    7732 /tmp/snapshots
$ python -m timeit "lines = open('/tmp/snapshots').readlines()" "sorted(lines, cmp=lambda a, b: b.count('/')-a.count('/'))" 
100 loops, best of 3: 17.2 msec per loop

And more importantly, for a storage focused platform, you're not consuming nearly as much disk iops.

Is there some unit / integration test I could run to validate my proposed change? I'd be willing to make the sort as fancy as necessary to get whatever failed in comment #8 resolved.

#14 Updated by William Grzybowski over 1 year ago

Bill T wrote:

I'm fairly certain the cost of doing the sort in python will be negligible compared to the system time spent in 'zfs list' when not using '-s name'. On my system:

[...]

That's more than an order of magnitude slower when using the existing method. To see how fast python sorts:

[...]

And more importantly, for a storage focused platform, you're not consuming nearly as much disk iops.

Totally agree there but since last experiment failed I am skeptical about rushing in trying it again. Its been long enough again I don't remember exact details it failed, our fault in lacking documentation of the matter.

Is there some unit / integration test I could run to validate my proposed change? I'd be willing to make the sort as fancy as necessary to get whatever failed in comment #8 resolved.

Unfortunately not, QA team is working on integration test for this feature but this is still all manual.

#15 Updated by William Grzybowski over 1 year ago

  • Status changed from Screened to Ready For Release

#16 Updated by Dru Lavigne over 1 year ago

  • Subject changed from Make autosnap.py run faster to Speed up snapshots listing

#17 Updated by Dru Lavigne about 1 year ago

  • Target version changed from 11.1 to 11.1-BETA1

#18 Updated by Dru Lavigne about 1 year ago

  • Status changed from Ready For Release to Resolved

#19 Updated by Bonnie Follweiler about 1 year ago

  • Needs QA changed from Yes to No
  • QA Status Test Passes FreeNAS added
  • QA Status deleted (Not Tested)

Sorting nearly 2000 snapshots takes a few seconds

Also available in: Atom PDF