Umbrella #27076: Replication and automatic snapshotting features rewrite to the new middleware
Periodic snapshots - autosnap - generate large demand on ARC metadata.
I have configured periodic snapshot tasks to run Mon-Fri, between 9:00 and 18:00, and generate a snapshot once every two hours. Snapshots are deleted after 2 weeks.
We have a LOT of datasets under this rule, and currently have 42900 snapshots.
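As a rough cross-check of these numbers, the sketch below estimates how many datasets the rule covers. It assumes that every one of the 42900 snapshots comes from this single rule, that no run was skipped, and that the two-week retention holds exactly 10 weekdays of snapshots; the function name is made up for illustration.

```python
def snapshots_per_day(begin_hour, end_hour, interval_hours):
    """Count the scheduled slots between begin_hour and end_hour inclusive."""
    return len(range(begin_hour, end_hour + 1, interval_hours))

# Schedule from the report: 9:00 to 18:00, every 2 hours, Mon-Fri.
per_day = snapshots_per_day(9, 18, 2)        # slots at 9, 11, 13, 15, 17
retained_per_dataset = per_day * 5 * 2       # 5 weekdays, 2 weeks retention

# Assumption: all 42900 snapshots belong to this one rule.
total_snapshots = 42900
estimated_datasets = total_snapshots / retained_per_dataset

print(per_day, retained_per_dataset, estimated_datasets)
```

Under those assumptions the rule would cover several hundred datasets, which is consistent with "a LOT".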
However, I have noticed that during the period configured for automatic snapshots, the number of ARC requests for demand_metadata keeps increasing. This is observed from 9:00 to 18:00 Mon-Fri, and at no other time.
The increase happened after each snapshot was taken, but between snapshots the load on ARC requests stayed constant.
After two weeks the increases stopped, I presume because old snapshots were being deleted.
When looking at the graph with the shortest timespan, it's visible that the number of requests reaches about 175k, but the actual average reported is 261k/s.
I could not figure out why it was doing this, but then I had a look at debug.log, and the following is happening every minute:
Aug 18 16:39:00 hd02mlt autosnap.py: [tools.autosnap:66] Popen()ing: /sbin/zfs list -t snapshot -H -o name
Aug 18 16:39:10 hd02mlt autorepl.py: [tools.autorepl:233] Autosnap replication started
Aug 18 16:39:10 hd02mlt autorepl.py: [tools.autorepl:234] temp log file: /tmp/repl-97191
Aug 18 16:39:10 hd02mlt autorepl.py: [tools.autorepl:617] Autosnap replication finished
Aug 18 16:40:00 hd02mlt autosnap.py: [tools.autosnap:66] Popen()ing: /sbin/zfs list -t snapshot -H -o name
Aug 18 16:40:10 hd02mlt autorepl.py: [tools.autorepl:233] Autosnap replication started
Aug 18 16:40:10 hd02mlt autorepl.py: [tools.autorepl:234] temp log file: /tmp/repl-97390
Aug 18 16:40:10 hd02mlt autorepl.py: [tools.autorepl:617] Autosnap replication finished
So maybe this isn't much of a problem with only a few datasets, but with large numbers it's becoming an issue.
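To confirm the interval from a longer stretch of debug.log, something like the following sketch could be used. It only looks at the autosnap.py lines that run the snapshot listing and assumes all of them fall on the same day; the function name and the regular expression are my own, not part of autosnap.py.

```python
import re
from datetime import datetime

# Three lines copied from the excerpt above.
LOG = """\
Aug 18 16:39:00 hd02mlt autosnap.py: [tools.autosnap:66] Popen()ing: /sbin/zfs list -t snapshot -H -o name
Aug 18 16:39:10 hd02mlt autorepl.py: [tools.autorepl:233] Autosnap replication started
Aug 18 16:40:00 hd02mlt autosnap.py: [tools.autosnap:66] Popen()ing: /sbin/zfs list -t snapshot -H -o name
"""

# Capture the HH:MM:SS part of autosnap.py lines that list snapshots.
LISTING = re.compile(
    r"^\w{3} +\d+ (\d{2}:\d{2}:\d{2}) \S+ autosnap\.py: .*zfs list -t snapshot"
)

def listing_intervals(log_text):
    """Return the seconds between consecutive snapshot listings."""
    times = []
    for line in log_text.splitlines():
        match = LISTING.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]

intervals = listing_intervals(LOG)
print(intervals)
```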
#6 Updated by Wojciech Kruzel about 4 years ago
Vaibhav Chauhan wrote:
BRB: what is the issue. please elaborate.
The issue is 260k ARC requests per second, every minute from 9:00 to 18:00, while the actual snapshots are created only at 9:00, 11:00, 13:00, 15:00 and 17:00.
Autosnap.py and autorepl.py are run every minute within the period, but really they should run only at full hours.
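What I have in mind is roughly the check sketched below: decide from the clock alone whether a snapshot is due, and only then list the snapshots. This is not the actual autosnap.py logic; `snapshot_due` and its parameters are hypothetical and simply mirror the task settings described above.

```python
from datetime import datetime

def snapshot_due(now, begin_hour=9, end_hour=18, interval_hours=2,
                 weekdays=(0, 1, 2, 3, 4)):
    """Return True only at the minutes when the periodic task should fire.

    weekdays uses datetime.weekday() numbering (0 = Monday).
    """
    if now.weekday() not in weekdays:
        return False
    if now.minute != 0:
        return False
    if not begin_hour <= now.hour <= end_hour:
        return False
    return (now.hour - begin_hour) % interval_hours == 0

# 2024-01-01 is a Monday; only the 9:00 call is due.
print(snapshot_due(datetime(2024, 1, 1, 9, 0)))   # True
print(snapshot_due(datetime(2024, 1, 1, 9, 1)))   # False
print(snapshot_due(datetime(2024, 1, 1, 10, 0)))  # False
```

With a gate like this the expensive listing would run 5 times a day instead of once a minute throughout the window.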
#31 Updated by Alexander Motin over 2 years ago
- Category changed from OS to Middleware
- Assignee changed from Alexander Motin to William Grzybowski
I am not sure what could be done here on the OS side. Snapshot listing is known to be a heavy operation, requiring many disk head seeks if not cached in ARC (in which case I'd like to see some kind of prefetch there, but there is still none). If it is cached, as I suppose it is here, then the only way to optimize it is to not list the snapshots, or at least to make sure to list only names, not other properties, which may require additional data access.
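For the "only list names" case, a minimal sketch of how the middleware could consume a names-only listing is shown below. The command line is the one already visible in debug.log; the function names and the sample dataset names are made up for illustration, and this is not the existing middleware code.

```python
import subprocess
from collections import defaultdict

def group_snapshots(listing):
    """Group 'dataset@snapshot' lines by dataset name."""
    by_dataset = defaultdict(list)
    for line in listing.splitlines():
        dataset, sep, snapshot = line.strip().partition("@")
        if sep:  # skip blank or malformed lines
            by_dataset[dataset].append(snapshot)
    return dict(by_dataset)

def list_snapshot_names():
    """Run the names-only listing once and group the result."""
    result = subprocess.run(
        ["/sbin/zfs", "list", "-t", "snapshot", "-H", "-o", "name"],
        capture_output=True, text=True, check=True,
    )
    return group_snapshots(result.stdout)

# Hypothetical sample output, used here instead of calling zfs.
sample = "tank/a@auto-1\ntank/a@auto-2\ntank/b@auto-1\n"
grouped = group_snapshots(sample)
print(grouped)
```

Everything the caller needs here comes from the snapshot name itself, so no extra properties have to be requested.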