On 1/30/19 5:38 PM, Hans van Kranenburg wrote:
> On 1/30/19 4:26 PM, Christoph Anton Mitterer wrote:
>> On Wed, 2019-01-30 at 07:58 -0500, Austin S. Hemmelgarn wrote:
>>> Running dm-integrity without a journal is roughly equivalent to using
>>> the nobarrier mount option (the journal is used to provide the same
>>> guarantees that barriers do). IOW, don't do this unless you are willing
>>> to lose the whole volume.
>>
>> That sounds a bit strange to me.
>>
>> My understanding was that the idea of being able to disable the journal
>> of dm-integrity was just to avoid any double work, if equivalent
>> guarantees are already given by higher levels.
>>
>> If btrfs is by itself already safe (by using barriers), then I'd have
>> expected that not transaction is committed, unless it got through all
>> lower layers... so either everything works well on the dm-integrity
>> base (and thus no journal is needed)... or it fails there... but then
>> btrfs would already safe by it's own means (barriers + CoW)?
>
> This. Exactly this.
>
> The reason that this journal of dm-integrity has to be used is because
> data and the checksum of that data gets written in two different places.
> The result of using it is that you'll always read back data with
> matching checksums, either the previous data, or the new data.
>
> https://arxiv.org/pdf/1807.00309.pdf
> See Section 4.4 "Recovery on Write Failure".
>
> "A device must provide atomic updating of both data and metadata. A
> situation in which one part is written to media while another part
> failed must not occur."
>
> Now, the great thing here is that btrfs does not overwrite disk data in
> place. It writes out new data, metadata and then the superblock. So,
> e.g. on power loss, I don't care about whatever happened to writes that
> are not visible because the superblock was never written? Btrfs will not
> read these disk sectors back, because it's unused space.

So, to reiterate from my first post, this means that I cannot use nocow
or directio, because they go around the CoW safety net.

Also, there is still a risk, which is of course writing the superblocks.
If all copies of the superblock on a single device are written, and all
of them lack the updated checksum, then I'll lose the fs, and will have
to either repair that manually, or restore from send/receive backups of
a few minutes ago.

> Also, it's not a write hole like in RAID56, because when "pulling the
> plug" between writing out data and metadata, the checksums of older
> existing data sectors are not corrupted, only new writes that were in
> flight... I think... But the pdf is still mentioning (also in 4.4)
> "Furthermore, metadata sectors are packed with tags for multiple
> sectors; thus, a write failure must not cause an integrity validation
> failure for other sectors". From the design, I can however not see how
> this could happen.
>
> I asked on dm-devel list a while ago about this, but the mailing list
> post never got any reply.

Hans
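
To make the reasoning above a bit more concrete, here is a toy model in Python. It is only a sketch under loud assumptions: `IntegrityDevice`, `cow_commit` and `mount_and_read` are hypothetical names, the "tag" is a plain CRC32, there is a single superblock copy, and nothing here reflects the real dm-integrity on-disk layout or btrfs internals. It only models "data and tag are written in two steps, and a crash can land in between".

```python
import zlib

SUPERBLOCK = 0  # hypothetical: sector 0 holds the pointer to the current root


class TornWrite(Exception):
    """Simulated power loss between the data write and the tag write."""


class IntegrityDevice:
    """Toy journal-less integrity device: data and tags live in separate places."""

    def __init__(self, sectors):
        self.data = [b""] * sectors
        self.tags = [zlib.crc32(b"")] * sectors

    def write(self, sector, payload, crash=False):
        self.data[sector] = payload              # step 1: data reaches the media
        if crash:
            raise TornWrite(sector)              # power loss before step 2
        self.tags[sector] = zlib.crc32(payload)  # step 2: tag reaches the media

    def read(self, sector):
        # A mismatch between data and tag is reported as an I/O error.
        if zlib.crc32(self.data[sector]) != self.tags[sector]:
            raise IOError(f"integrity error on sector {sector}")
        return self.data[sector]


def cow_commit(dev, free_sector, payload, crash_at=None):
    """CoW-style commit: write to unused space first, then update the superblock."""
    dev.write(free_sector, payload, crash=(crash_at == "data"))
    dev.write(SUPERBLOCK, str(free_sector).encode(), crash=(crash_at == "super"))


def mount_and_read(dev):
    """Follow the superblock pointer; only referenced sectors are ever read."""
    root = int(dev.read(SUPERBLOCK))
    return dev.read(root)
```

In this model, a crash with `crash_at="data"` leaves a torn sector that the old superblock never points at, so `mount_and_read` still returns the previous contents. A crash with `crash_at="super"` tears the superblock itself, and `mount_and_read` fails with an integrity error, which is the remaining risk described above. An in-place overwrite of an already referenced sector (the nocow or directio case) would correspond to calling `dev.write` directly on the live sector, with no old copy to fall back to.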
