On Thu, Feb 06, 2020 at 08:20:16AM +0000, Johannes Thumshirn wrote:
> >> @@ -3497,9 +3506,23 @@ static int write_dev_supers(struct btrfs_device *device,
> >>  		op_flags = REQ_SYNC | REQ_META | REQ_PRIO;
> >>  		if (i == 0 && !btrfs_test_opt(device->fs_info, NOBARRIER))
> >>  			op_flags |= REQ_FUA;
> >
> > Question on the existing code: why is it safe to not use FUA for the
> > subsequent superblocks?
> >
> >> +
> >> +		/*
> >> +		 * Directly use BIOs here instead of relying on the page-cache
> >> +		 * to do I/O, so we don't lose the ability to do integrity
> >> +		 * checking.
> >> +		 */
> >> +		bio = bio_alloc(gfp_mask, 1);
> >> +		bio_set_dev(bio, device->bdev);
> >> +		bio->bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
> >> +		bio->bi_private = device;
> >> +		bio->bi_end_io = btrfs_end_super_write;
> >> +		bio_add_page(bio, page, BTRFS_SUPER_INFO_SIZE,
> >> +			     offset_in_page(bytenr));
> >
> > Missing return value check.  But given that it is a single page and
> > can't error out please switch to __bio_add_page here.
>
> Good question, I guess it's safer to always FUA the SBs

That is a performance optimization IIRC: only the primary superblock is
written with FUA. The backup superblocks are not, as that would add 2
more flushes, which are considered expensive. The trade-off is
optimistic because the backup superblocks are almost never necessary.
In the common power-fail situation the primary will be there or not,
atomically; the non-FUA writes of the secondary superblocks will perhaps
be delayed a bit.

For the backups to matter, the primary sb would have to be unexpectedly
damaged in the short window between the primary FUA and the backup
writes reaching the device, so that the current version of the sb is
not available. Something like this:

	write primary sb
	1 FUA
	write backup copy 1
	other writes
	write backup copy 2
	other writes
	2 FUA (or equivalent flushing the copies to device)

The window is between 1 and 2, and if some divine force kills the
primary sb, the backup copies are not permanently stored yet.
That makes recovery of the last transaction tricky, but there are still
the backup superblocks with the previous intact version.

With FUA after each backup, the window would be shortened, with only 2
blocks written, making it possible to access the latest transaction, or
possibly the previous one too, depending on where exactly the write
sequence is interrupted.

The above describes a possible scenario, but I consider it quite rare
to hit in practice; it also depends on the device, which should not
just skip writes or FUAs. So the performance optimization is IMO
justified.
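
To illustrate the window, here is a minimal userspace sketch of the
sequence above. It is a toy model, not kernel code: struct sim_dev and
all sim_* helpers are hypothetical names made up for this example, and
it assumes the worst case where a power failure drops everything still
sitting in the device's volatile write cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_COPIES 3	/* primary + 2 backup copies, as in the sequence */

/* Hypothetical model of a device with a volatile write cache. */
struct sim_dev {
	uint64_t durable[NUM_COPIES];	/* generation on stable media */
	uint64_t cached[NUM_COPIES];	/* generation in the volatile cache */
	bool dirty[NUM_COPIES];		/* cached copy not yet on media */
};

/* A FUA write reaches stable media directly, others stay in the cache. */
static void sim_write(struct sim_dev *d, int copy, uint64_t gen, bool fua)
{
	assert(copy >= 0 && copy < NUM_COPIES);
	if (fua) {
		d->durable[copy] = gen;
		d->dirty[copy] = false;
	} else {
		d->cached[copy] = gen;
		d->dirty[copy] = true;
	}
}

/* Flush: everything in the cache becomes durable (point 2 above). */
static void sim_flush(struct sim_dev *d)
{
	for (int i = 0; i < NUM_COPIES; i++) {
		if (d->dirty[i]) {
			d->durable[i] = d->cached[i];
			d->dirty[i] = false;
		}
	}
}

/* Power failure, worst case: cached writes are lost. */
static void sim_crash(struct sim_dev *d)
{
	for (int i = 0; i < NUM_COPIES; i++)
		d->dirty[i] = false;
}

/*
 * Commit generation 'gen' but stop after 'steps' steps:
 *   1: primary sb (always FUA)   2: backup copy 1
 *   3: backup copy 2             4: flush
 */
static void sim_commit(struct sim_dev *d, uint64_t gen, int steps,
		       bool fua_backups)
{
	if (steps >= 1)
		sim_write(d, 0, gen, true);
	if (steps >= 2)
		sim_write(d, 1, gen, fua_backups);
	if (steps >= 3)
		sim_write(d, 2, gen, fua_backups);
	if (steps >= 4)
		sim_flush(d);
}

/* Newest generation still readable from a backup if the primary is lost. */
static uint64_t sim_best_backup(const struct sim_dev *d)
{
	uint64_t best = 0;

	for (int i = 1; i < NUM_COPIES; i++)
		if (d->durable[i] > best)
			best = d->durable[i];
	return best;
}
```

Committing generation 1 completely, then interrupting generation 2
after step 3 and calling sim_crash(), leaves the primary at generation
2 while sim_best_backup() still reports generation 1. With fua_backups
set, each backup is durable as soon as it is written, at the cost of
the extra FUA writes.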
