Project

General

Profile

Bug #25531

Add scrub sorting feature

Added by Sean Fagan about 1 year ago. Updated 12 months ago.

Status:
Resolved
Priority:
Blocks Until Resolved
Assignee:
Sean Fagan
Category:
OS
Target version:
Seen in:
Sprint:
Severity:
New
Backlog Priority:
Reason for Closing:
Reason for Blocked:
Needs QA:
Yes
Needs Doc:
Yes
Needs Merging:
Yes
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

I'm cheating, this is already in the branch sef-zfs-sorted-scrub, but needs review, tracking with the ZoL code (my changes are up to date so far), and possibly pushing to freebsd.

I've tested this on two real-world systems: a FN box at the office, and a MiniNAS at home. The FN box at the office had significant improvement -- from about 4h to about 2h -- while the system at home didn't have any improvement (and may have been a bit slower). The primary difference between them is that the system at home is almost entirely used as the receiving end of replication from another FN box; this means that most of the data would already have been written sequentially, meaning little to no gain (and, theoretically, possibly a slight penalty) by sorting the blocks in each txg as it's scrubbed.

This branch is dependent on the branch in #25530.


Related issues

Related to FreeNAS - Bug #25530: Add scrub pause & resumeResolved2017-08-10
Related to FreeNAS - Feature #25170: Add Resilver Priority to Storage tabResolved2017-07-17
Related to FreeNAS - Bug #26313: Merge in additional upstream scrub commitsResolved2017-10-20
Has duplicate FreeNAS - Feature #25687: Import sequential scrubs into our OSClosed: Duplicate2017-08-25

Associated revisions

Revision 91359bf7 (diff)
Added by Dru Lavigne 12 months ago

Mentioned improved scrubs.
Ticket: #25531

History

#1 Updated by Sean Fagan about 1 year ago

  • Has duplicate Feature #25687: Import sequential scrubs into our OS added

#2 Updated by Dru Lavigne about 1 year ago

  • Related to Bug #25530: Add scrub pause & resume added

#3 Updated by Alexander Motin about 1 year ago

I've rebased the patches on top of fresh freenas/11-stable branch and cleaned bunch of introduced space/tab divergence from upstream. There is still some more divergence left to look on. And the upstream still seems to be a moving target: they've rebased and merged their patches into one and added one more new, that needs a look and probably merge.

#4 Updated by Alexander Motin about 1 year ago

Forgot to mention, the result of my rebase/cleaning is at sef-zfs-sorted-scrub2 branch.

#5 Avatar?id=14398&size=24x24 Updated by Kris Moore about 1 year ago

  • Assignee changed from Alexander Motin to Sean Fagan

Sean, back over to you, please give it another run-through and ping back here if/when you have the branch to a state you are comfortable with.

Thanks!

#6 Updated by Sean Fagan about 1 year ago

Just for reference so I don't have to work to find it again: https://github.com/zfsonlinux/zfs/pull/6256/commits/127fcce9da5981d86acb78359ae1d90810419670 is the second set of changes. (Amusingly, I already made at least one of them.)

#7 Updated by Sean Fagan about 1 year ago

  • Status changed from Needs Developer Review to Fix In Progress

Only a couple of the patches were applicable. (Side note: why is our zpool_do_status() so different?) Building now to ensure I didn't have any errors.

#8 Updated by Sean Fagan about 1 year ago

  • Status changed from Fix In Progress to Needs Developer Review
  • Assignee changed from Sean Fagan to Alexander Motin

Back over to Alexander. I've got the patches integrated (in the sef-zfs-sorted-scrub2 branch), and have tested it.

Without it enabled:
<blockquote>scrub repaired 0 in 8h57m with 0 errors on Thu Oct 5 07:33:53 2017</blockquote>
With it enabled:
<blockquote>scrub repaired 0 in 5h44m with 0 errors on Wed Oct 4 19:27:04 2017</blockquote>

which is pretty nice.

#9 Updated by Alexander Motin about 1 year ago

Sean, I've made another pass comparing the original and your patches and fixed another set of divergences. Some of them were introduced just with your last commit. You should be more careful about spaces vs tabs and other formatting to not landmine further merges. You may wish some editor that highlights tabs in some way. I like mcedit for that feature, even though it is not exactly perfect there.

In process of review I found one lost code line in zpool_main.c, which I fixed, but also such code difference in dsl_scan.c (left is upstream):

│+               if (!scn->scn_suspending) {                       ││+               if (!scn->scn_suspending) {                       │
│+                       ASSERT0(avl_numnodes(&scn->scn_queue));   ││                                                                  │
│+                       scn->scn_done_txg = tx->tx_txg + 1;       ││+                       scn->scn_done_txg = tx->tx_txg + 1;       │
│+                       if (scn->scn_is_sorted) {                 ││+                       scn->scn_checkpointing = B_TRUE;          │
│+                               scn->scn_checkpointing = B_TRUE;  ││+                       scn->scn_clearing = B_TRUE;               │
│+                               scn->scn_clearing = B_TRUE;       ││+                       zfs_dbgmsg("scan complete txg %llu", tx->t│
│+                       }                                         ││                                                                  │
│+                       zfs_dbgmsg("scan complete txg %llu",      ││                                                                  │
│+                           (longlong_t)tx->tx_txg);              ││                                                                  │
│+               }                                                 ││+               }                                                 │

, for which I have no guesses. Was it your change or the upstream patch changed during rebase?

#10 Updated by Sean Fagan about 1 year ago

Blast it. When applying the changes manually, I use emacs, which gets the indentation looking right, but obviously uses its own set of spaces and tabs to do so.

As for that patch, I'm going to have to look at it again -- I did not see that there, so now I have to see if they added another patch that I missed. Blast it.

#11 Updated by Sean Fagan about 1 year ago

  • Status changed from Needs Developer Review to Fix In Progress
  • Assignee changed from Alexander Motin to Sean Fagan

#12 Updated by Sean Fagan about 1 year ago

Oh.

So some of the changes you made were to reformat comments from the proposed patch away from the current form -- even when the comments are the same.

I deliberately did not do that, and ignored those changes.

#13 Updated by Alexander Motin about 1 year ago

I think there are just two different strategies: minimize the changes to stay closer to current OpenZFS, or follow the upstream patch as close as possible. Unfortunately, without seeing the future of this patch, it is difficult to guess which is better. As soon as this patch is pretty big, skipping few chunks out of it would not really help the first strategy much, while may hurt the second. Though I don't insist too much about the comments.

#14 Updated by Sean Fagan about 1 year ago

My particular reasoning for the comments was:
  1. He changed it from the source, and did so arbitrarily. If there's a style standard, they are likely to object.
  2. It is not our KNF, which is potentially a problem.

#15 Updated by Sean Fagan about 1 year ago

  • Status changed from Fix In Progress to 15
  • Assignee changed from Sean Fagan to Alexander Motin

I can't find the patch you quote above, and I'm not quite sure how you generated that. Look at all the diffs for the sorted scan that I grabbed, I don't see that particular change, so I'm fairly confused.

#16 Updated by Sean Fagan about 1 year ago

Hm, the sorted scan branch is branched from the pausable/resumable scrub branch I did. That could account for that kind of difference.

#17 Updated by Alexander Motin about 1 year ago

Sean Fagan wrote:

I can't find the patch you quote above, and I'm not quite sure how you generated that. Look at all the diffs for the sorted scan that I grabbed, I don't see that particular change, so I'm fairly confused.

I generated that with Midnight Commander's compare files command, comparing upstream and our patches, but you may probably use other tools, or just compare the patched code:
a) https://github.com/zfsonlinux/zfs/pull/6256/commits/1654b9f23c25dd018277e233ea231b45fd2949bf#diff-6b81467dec952fdbf74e9e913d16775eR3239 dsl_scan.c:3239
b) https://github.com/freenas/os/commit/c7811d1991e32533160a8a7f3392534a8a9dcb27 dsl_scan.c:3190

#18 Updated by Alexander Motin about 1 year ago

Sean Fagan wrote:

Hm, the sorted scan branch is branched from the pausable/resumable scrub branch I did. That could account for that kind of difference.

sef-zfs-sorted-scrub2 is not any more, I've told above that I rebased it, since pause/resume patch is already integrated into FreeBSD 11.

#19 Avatar?id=14398&size=24x24 Updated by Kris Moore about 1 year ago

  • Priority changed from Important to Blocks Until Resolved

Bumping priority on this one

#20 Updated by Sean Fagan about 1 year ago

I think I got confused by something you said -- you mean you rebased sef-zfs-sorted-scrub2 on FreeBSD-11, rather than on my branch, yes?

As I said, that could account for some of the differences you cited.

But let me go over those files, and compare them to freebsd 11, and what I've got.

#21 Updated by Alexander Motin about 1 year ago

Sean Fagan wrote:

I think I got confused by something you said -- you mean you rebased sef-zfs-sorted-scrub2 on FreeBSD-11, rather than on my branch, yes?

I've rebased it on top of freenas/11-stable.

#22 Updated by Sean Fagan about 1 year ago

Okay. I went through all the diffs in the guy's branch (created them by cloning his tree, and then doing a git diff). I did in fact miss some.

Some notes on what's left:

zfs.h:
struct pool_scan_stat: moved new fields to end of structure, for binary compatibility.

arc.c: * Did not add
+int zfs_arc_min_long_lifespan = 0;
because surrounding variables do not exist in freebsd. (Linux/solaris tunables?)
• arc_buf_alloc_impl() change is encryption-dependent, so diffs are different
• No arc_tuning_update() function

dsl_scan.c:
• No EOVERFLOW code, so patches there not applicable
• dsl_scan_cancel_sync()
- Did not delete call to spa_event_notify(), it wasn't in the original or patched version?
• Circa line 1618, mapped to diff "-813,15 +1650,6":
- We don’t have dn_extra_slots, which complicated applying the diff.
• dsl_scan_visitbp(): Our dprintf_bp call is not commented out. (It was commented out in the zfsonlinux original, but not in our original. I left it uncommented.)

Also, in the patches was a call to a function pointer in a later version of vdev_ops_t; I added that extra function and the various changes necessary for it. I can pull it out and the original function (dsl_scan_need_resilver) can be changed to treat it as true always.

I've tested scrubbing, and I did a resilver (removed a disk physically, did a TM backup, put it back in and onlined it). I'm going to install the latest nightly build, and do a scrub from that, to make sure everything worked.

That'll take quite a few hours, but I'll update when it's done.

#23 Avatar?id=14398&size=24x24 Updated by Kris Moore about 1 year ago

  • Assignee changed from Alexander Motin to Sean Fagan
  • Target version changed from 11.1 to 11.1-BETA1

#24 Updated by Dru Lavigne about 1 year ago

  • Status changed from 15 to 47

#25 Avatar?id=14398&size=24x24 Updated by Kris Moore about 1 year ago

  • Status changed from 47 to Fix In Progress

Dru, this still isn't merged, but it is a blocker for BETA1.

#26 Updated by Sean Fagan about 1 year ago

  • Status changed from Fix In Progress to Needs Developer Review
  • Assignee changed from Sean Fagan to Alexander Motin

Back to Alexander; I missed the reassignment.

#27 Updated by Alexander Motin 12 months ago

  • Related to Feature #25170: Add Resilver Priority to Storage tab added

#28 Avatar?id=14398&size=24x24 Updated by Kris Moore 12 months ago

  • Status changed from Needs Developer Review to Ready For Release
  • Assignee changed from Alexander Motin to Sean Fagan

#29 Updated by Alexander Motin 12 months ago

  • Assignee changed from Sean Fagan to Alexander Motin

There are two more new patches upstream.

#30 Updated by Alexander Motin 12 months ago

  • Assignee changed from Alexander Motin to Sean Fagan

Hmm. I was not trying to take it. :)

#31 Updated by Dru Lavigne 12 months ago

  • Related to Bug #26313: Merge in additional upstream scrub commits added

#32 Updated by Dru Lavigne 12 months ago

  • Status changed from Ready For Release to Resolved

Also available in: Atom PDF