ATA disk write cache setting consideration
I am using https://github.com/rkojedzinszky/freebsd/commit/4ef12e909772b64d4642d324114271edc90adb7f in our FreeNAS builds; the idea comes from the Linux kernel. If a disk is configured with its write cache disabled, we should not issue SYNC_CACHE commands, which improves the transaction rate on SATA ports. I have run tests with an Intel S3700 SSD, which can handle 19k write IOPS. Without this modification, FreeBSD achieves around 5k IOPS on a SATA2 port, while with this patch and the write cache disabled around 9k IOPS can be achieved on a SATA2 port.
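To make the proposal concrete, here is a minimal sketch of the decision it describes. It is an illustrative Python model only, not the code from the commit above; the `Disk` class and both function names are hypothetical and exist just for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Disk:
    """Hypothetical stand-in for a disk and its configured cache setting."""
    write_cache_enabled: bool


def needs_cache_flush(disk: Disk) -> bool:
    # Proposed rule: a flush is only required while the write cache is on.
    # With the cache disabled, a completed write is assumed to be on stable
    # media already (this holds only if the drive honours the setting).
    return disk.write_cache_enabled


def commit_transaction(disk: Disk) -> List[str]:
    """Return the sequence of commands a synchronous commit would issue."""
    commands = ["WRITE"]
    if needs_cache_flush(disk):
        commands.append("SYNC_CACHE")
    return commands
```

With the write cache enabled, `commit_transaction` issues both the write and the flush, so nothing changes for that configuration; with it disabled, only the write is issued.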
#2 Updated by Alexander Motin almost 3 years ago
- Status changed from Unscreened to Screened
This can indeed be interesting from a performance point of view for loads with very small transactions (below 128KB) on very fast SLOG devices. For bigger transactions the flush overhead is not significant, while the penalty from disabling the cache can theoretically be more significant. On the other hand, it is not guaranteed that some cheap SATA device really respects cache disabling, or that "disabling" it won't lead to data loss/corruption if we skip explicit cache flushing.
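As a rough illustration of why the flush matters less for larger transactions, the arithmetic can be sketched in Python. This is a back-of-the-envelope model, not a measurement: it treats the gap between the 5k and 9k IOPS figures reported above as the per-transaction flush cost (an assumption, since the two runs also differ in cache setting), and it assumes about 300 MB/s of usable SATA2 bandwidth. All names are hypothetical.

```python
# Figures reported in this ticket for the Intel S3700 on a SATA2 port.
IOPS_WITH_FLUSH = 5_000
IOPS_WITHOUT_FLUSH = 9_000

# Assumption: usable SATA2 payload bandwidth (3 Gbit/s link, 8b/10b encoding).
SATA2_BYTES_PER_SEC = 300e6


def per_op_seconds(iops: float) -> float:
    """Average time per operation at a given IOPS rate."""
    return 1.0 / iops


# Assumption: the whole difference between the two runs is flush overhead.
FLUSH_COST = per_op_seconds(IOPS_WITH_FLUSH) - per_op_seconds(IOPS_WITHOUT_FLUSH)


def flush_share(size_bytes: int) -> float:
    """Fraction of total transaction time spent on the flush in this model."""
    transfer = size_bytes / SATA2_BYTES_PER_SEC
    without_flush = per_op_seconds(IOPS_WITHOUT_FLUSH) + transfer
    return FLUSH_COST / (FLUSH_COST + without_flush)


if __name__ == "__main__":
    for size in (4 * 1024, 128 * 1024, 1024 * 1024):
        print(f"{size:>8} bytes: flush is {flush_share(size):.1%} of the time")
```

Under these assumptions the flush share shrinks steadily as the transaction grows, because the transfer time comes to dominate the fixed flush cost.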
#3 Updated by Richard Kojedzinszky almost 3 years ago
Actually, the assumption is that when a drive has its write cache disabled, it should guarantee that data sent to it has been written out, whether the drive is a cheap one or not. I have run tests with the drive mentioned above, and since it has power-loss protection I assume it always uses its write cache, even when the cache is turned off. It still guarantees writes.
Regarding other drives I have no direct experience, but assuming they conform to the specifications, they should never lose data either way.
#4 Updated by Alexander Motin almost 3 years ago
Richard Kojedzinszky wrote:
but assuming they conform to specifications
I assume we are living in the same universe, right? ;) In my universe there are tons of USB sticks that often cannot report their size correctly, not to mention cache persistence. So far ATA disks have usually been better than USB, but this is too sensitive a matter to risk.
#5 Updated by Richard Kojedzinszky almost 3 years ago
I think devices that violate the specifications will fail either way, with the write cache plus SYNC_CACHE commands or without them. Unfortunately, I have experience with earlier consumer-grade Kingston and Samsung SSDs, which of course failed power-loss tests (https://github.com/rkojedzinszky/zfsziltest). The proposed patch changes behaviour only when the write cache is disabled, and in no other case. And again, I must say that the Linux community assumes this behaviour is right. I was reading the comments in Linux kernel 3.2's block/blk-flush.c, where the author says the same. That code is at least five years old; I have not checked earlier behaviour.
I think it is worth rethinking, as FreeBSD has always been "the power to serve", and with this change performance could be improved. Working with crappy hardware is always a risk.
Strangely enough, due to a firmware bug we had a Seagate SAS HDD (!!) which ignored the SYNC_CACHE command when the write cache was enabled, and thus it served around 1K random writes per second. Of course an HDD cannot achieve that performance; that was a bug. So I trust drives/SSDs only after they have passed my zfsziltest.
#6 Updated by Richard Kojedzinszky almost 3 years ago
So what is the final decision regarding this feature request? By the way, with the FreeBSD/FreeNAS defaults, write caches are enabled for drives, and in that case nothing would change. The only change would be when the write cache is disabled, in which case the cache flush commands would be skipped. Again, the Linux kernel has been working this way for years.
#7 Updated by Kris Moore almost 3 years ago
- Status changed from Screened to Closed: Third party to resolve
For now I think we won't have the ability to implement this and test all the possible ramifications of making the change. As Sasha indicated, this could blow up in our faces, with so many devices that don't correctly report their settings. The consequences of this could be bad. I think this should be taken up with upstream FreeBSD first.