On 2017-04-11 05:55, Adam Borowski wrote:
On Tue, Apr 11, 2017 at 06:01:19AM +0200, Kai Krakow wrote:
Yes, I know all this. But I don't see why you still want noatime or
relatime if you use lazytime, except for super-optimizing. Lazytime
gives you POSIX conformity for a problem that the other options only
tried to solve.
(Besides lazytime also working on mtime, and, technically, ctime.)
Nope, by definition it can't work on ctime: a ctime update means
something else in the inode changed, and that causes the inode to be
flushed to disk normally. lazytime only defers the flush as long as
nothing else in the inode is different, so it won't help much on stuff
like traditional log files, because their size changes regularly
(which updates the inode, which then causes it to get flushed).
First: atime, in any form, murders snapshots. On any filesystem that has
them, not just btrfs -- I've tested zfs and LVM snapshots, there's also
qcow2/vdi and so on. On all of them, every single read-everything operation
costs you 5% disk space. For a _read_ operation!
I've tested a /usr-y mix of files, for consistency with the guy who mentioned
this problem first. Your mileage will vary depending on whether you store
100GB disk images or a news spool.
Read-everything is quite rare, but most systems have at least one
stat-everything cronjob. That touches only diratime, but that's still
1-in-11 inodes (remarkably consistent: I've checked a few machines with
drastically different purposes, and somehow the min was 10, max 12).
And no, marking snapshots as ro doesn't help: reading the live version still
breaks CoW.
Second: atime murders media with limited write endurance. Modern SSDs can
cope well, but I for one work a lot with SD and eMMC. Every single SoC
image I've seen uses noatime for this reason.
Even on SSDs it's still an issue, especially if it's something like
ext4 which uses inode tables (updating one inode will usually require a
RMW of an erase block regardless, but using inode tables means that this
happens _all the time_).
Third: relatime/lazytime don't eliminate the performance cost. They fix
only frequently read files -- if you have a big filesystem where you read a
lot but individual files tend to be read rarely, relatime is as bad as
strictatime, and lazytime actually worse. Both will do an unnecessary write
of all inodes.
Fourth: why? Besides being POSIXLY_CORRECT, what do you actually gain from
atime? I can think only of:
* new mail notification with mbox. Just patch the mail reader to manually
futimens(..., {UTIME_NOW,UTIME_OMIT}), it has no extra cost on !noatime
mounts. I've personally done so for mutt; the updated version will ship
in Debian stretch; you can patch other mail readers although they tend
to be rarely used in conjunction with shell access (and thus they have
no need for atime at all).
* Debian's popcon's "vote" field. Use "inst", and there's no gain from
popcon for you personally.
* some intrusion detection forensics (broken by open(..., O_NOATIME))
On top of all that:
Five:
Handling of atime slows down stat and a handful of other things. If you
take a source tree the size of the Linux kernel, write a patch that
changes every file (even just one character), and then go to commit it
in Git (or SVN, or Bazaar, or Mercurial), you'll see a pretty serious
difference in the time it takes to commit because almost all VCS
software calls stat() on the entire tree. relatime won't help much here
because the check to determine whether or not to update the atime still
has to happen (in fact, it will hurt slightly, since strictatime eliminates
that check).
Six:
It doesn't behave how most users would inherently expect, partly because
there are ways to bypass it even if the FS is mounted with strictatime.
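
One such bypass is the O_NOATIME flag mentioned above. A small
Linux-only sketch, with read_without_atime() being a made-up name:
the open() only succeeds if the caller owns the file or has
CAP_FOWNER, and some filesystems may ignore the flag.

```c
#define _GNU_SOURCE     /* exposes O_NOATIME in <fcntl.h> on glibc */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes from path without updating its atime,
 * regardless of the atime mount option in effect.
 * Returns the number of bytes read, or -1 on failure. */
ssize_t read_without_atime(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY | O_NOATIME);
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}
```

After a call like this the file's atime is whatever it was before, so
anything that relies on atime to tell whether a file was read gets no
signal at all.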
Conclusion: death to atime!