On 2019/3/11 8:37 PM, Nikolay Borisov wrote:
>
>
> On 11.03.19 at 14:35, Qu Wenruo wrote:
>>
>>
>> On 2019/3/11 8:26 PM, Nikolay Borisov wrote:
>>>
>>>
>>> On 11.03.19 at 3:17, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2019/3/11 7:09 AM, Chris Murphy wrote:
>>>>> In the case where superblock 0 at 65536 is valid but stale (older than
>>>>> the others):
>>>>
>>>> Then this means either the fs is fuzzed, or the FUA implementation of
>>>> the disk is completely screwed up.
>>>>
>>>> Btrfs kernel submit super blocks as the following sequence:
>>>> 1) wait all metadata write
>>>> 2) flush
>>>> 3) FUA the primary superblock
>>>
>>> SATA devices generally do not have FUA support. For example my evo 850
>>> ssds do not support it nor does my evo 860 PRO. IMO not having
>>> functioning FUA seems to be the norm rather than an exception.
>>
>> Kernel block layer will translate FUA to write + flush.
>
> Where exactly does this happen?

block/blk-flush.c

The comment at the beginning of that file says:

 * If the device has writeback cache and doesn't support FUA, REQ_PREFLUSH
 * is translated to PREFLUSH and REQ_FUA to POSTFLUSH.

I'd need to dig further to find exactly which line does this, but I think
that explains the workflow well enough.

Thanks,
Qu

>
>> So in that case we will do:
>>
>> 1) wait all metadata write
>> 2) flush
>> 3) write first sb, flush
>> 4) write backup sb
>>
>> For FUA -> write + flush, it's less atomic than native FUA, but it
>> should be good enough for pseudo-atomic.
>>
>> Thanks,
>> Qu
>>
>>>
>>>
>>>> 4) write the backup superblocks
>>>>
>>>> If backup is newer than primary, then the FUA write doesn't reach disk
>>>> before normal write.
>>>> This means any fs could be corrupted on that disk, not only btrfs.
>>>>
>>>>>
>>>>> 1. btrfs check doesn't complain, the stale super is used for the check
>>>>> 2. when mounting, super 0 is used, no complaints at mount time, fairly
>>>>> quickly the newer supers are overwritten
>>>>
>>>> The reason why kernel doesn't search backup roots is to avoid stale btrfs.
>>>> For case like mkfs.btrfs -> do btrfs write -> mkfs.xfs -> try mount as
>>>> btrfs again, this would cause problems.
>>>>
>>>> So IMHO always use the primary superblock is the designed behavior.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> Is this expected? In particular, in lieu of `btrfs rescue super`
>>>>> behavior which considers super 0 a bad super, and offers to fix it
>>>>> from the newer ones, and when I answer y, it replaces super 0 with
>>>>> newer information from the other supers.
>>>>>
>>>>> I think the `btrfs rescue` behavior is correct. I would expect that
>>>>> all the supers are read at mount time, and if there's discrepancy that
>>>>> either there's code to suspiciously sanity check the latest roots in
>>>>> the newest super, or it flat out fails to mount. Mounting based on
>>>>> stale super data seems risky doesn't it?
>>>>>
>>>>
>>
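
To make the ordering argument in this thread easier to follow, here is a
rough Python sketch of the four-step commit sequence (metadata, flush, FUA
primary superblock, backup superblock) against a toy disk with a volatile
write cache. It is not kernel code. The names Disk, commit, fua_broken and
lucky are hypothetical and exist only for this illustration; "lucky" stands
for cached writes that happen to reach the media before power is lost. The
model also collapses all metadata into one generation number, which is an
assumption made for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Toy disk with a volatile write-back cache (hypothetical model)."""
    supports_fua: bool
    fua_broken: bool = False                    # claims FUA but ignores it
    cache: dict = field(default_factory=dict)   # written, not yet durable
    media: dict = field(default_factory=dict)   # durable contents

    def flush(self):
        """Make every cached write durable."""
        self.media.update(self.cache)
        self.cache.clear()

    def write(self, key, value, fua=False):
        self.cache[key] = value
        if not fua:
            return
        if not self.supports_fua:
            # No FUA on the device: emulate it as write + post-flush.
            self.flush()
        elif not self.fua_broken:
            # Native FUA: only this write is forced to the media.
            self.media[key] = self.cache.pop(key)

    def crash(self, lucky=()):
        """Power loss: cached writes are lost, except the keys in `lucky`."""
        for key in lucky:
            if key in self.cache:
                self.media[key] = self.cache[key]
        self.cache.clear()
        return dict(self.media)


def commit(disk, generation, crash_after=None, lucky=()):
    """Replay the four-step sequence; optionally lose power after a step."""
    steps = [
        lambda: disk.write("metadata", generation),              # 1) metadata
        disk.flush,                                              # 2) flush
        lambda: disk.write("sb_primary", generation, fua=True),  # 3) primary sb
        lambda: disk.write("sb_backup", generation),             # 4) backup sb
    ]
    for number, step in enumerate(steps, start=1):
        step()
        if number == crash_after:
            return disk.crash(lucky)
    disk.flush()
    return dict(disk.media)
```

In this model, with native FUA or with the write + post-flush emulation, a
crash after any step leaves the primary superblock at a generation no older
than the backup. Only a disk that ignores FUA, for example
Disk(supports_fua=True, fua_broken=True) crashing after step 4 of generation
2 with lucky={"sb_backup"}, ends up with sb_primary at 1 and sb_backup at 2,
which is the stale-primary case described at the top of the thread.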
