On 2018-07-13 15:24, Anand Jain wrote:
>
>
> On 07/13/2018 01:39 PM, Qu Wenruo wrote:
>>
>>
>> On 2018-07-13 13:32, Anand Jain wrote:
>>>
>>>
>>>>> But if you are planning to record and start at transaction [14]
>>>>> then it's an overkill, because transactions [19] and [20] are
>>>>> already on the disk.
>>>>
>>>> Yes, I'm overdoing it.
>>>
>>> Ah. Ok.
>>>
>>>> But it's already much better than scrubbing all block groups (my
>>>> original plan).
>>>
>>> That's true. Which can be optimized later, but how? And scrub can't
>>> fix RAID1.
>>
>> How could scrub not fix RAID1?
>
> Because degraded RAID1 allocates and writes data to single chunks.

Isn't that what you're working on? Degraded RAID1 chunk allocation.

> There is no mirrored copy of that data, and it would remain as it is
> even after the scrub.
>
>> For metadata, or data with csum, a normal scrub works.
>
> Still need to fix the generation check for bg/parent transid
> verification across the trees/disks part, IMO.

Did you mean that, since scrub just reads out each copy and verifies
its metadata csum, an old metadata block could pass the csum check and
scrub couldn't detect it unless we also read the other copy?

That's indeed a problem. Unlike scrub, the normal tree read routine has
the transid/first_key/level checks, which can expose such a stale copy
(a rough sketch of those checks, and of the "use the other copy" idea
below, is appended after the quoted thread at the end of this mail).

>
>> For data without csum, we know which device is resilvering, so just
>> use the other copy.
>
> If it's a short-term fix then it's ok. But I think the approach is
> similar to Liubo's InSync patch. The problem with this is that we will
> fail to recover any data when the good disk throws media errors.

That's a trade-off in recovery granularity.

In fact, even with a written bitmap, if a copy fails to read during
RAID1 resilvering we can still hit the same problem.
Although with smaller granularity we are less likely to hit it.

The main point of my bg-based recovery is that we can reuse scrub, and
the block group is already a mid-level granularity in btrfs.

This makes me re-think the possibility of using a written bitmap for
each device extent.
Although it still takes a lot of space and can't fit into one tree leaf
(it needs 32K per 1G dev extent; the arithmetic is appended at the end
of this mail).

Thanks,
Qu

>
> Thanks, Anand
>
>> Thanks,
>> Qu
>>
>>>
>>> Thanks, Anand
>>>
>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> Thanks, Anand
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Qu
>>>>>>
>>>>>>> [3] https://patchwork.kernel.org/patch/10403311/
>>>>>>>
>>>>>>> Further, as we do a self-adapting chunk allocation in RAID1, it
>>>>>>> needs balance-convert to fix. IMO at some point we have to
>>>>>>> provide degraded raid1 chunk allocation and also modify the
>>>>>>> scrub to be chunk granular.
>>>>>>>
>>>>>>> Thanks, Anand
>>>>>>>
>>>>>>>> Any idea on this?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Qu
>>>>>>>>
>>>>>>>>> Unlock: btrfs_fs_info::chunk_mutex
>>>>>>>>> Unlock: btrfs_fs_devices::device_list_mutex
>>>>>>>>>
>>>>>>>>> -----------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> Thanks, Anand
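
As promised above, a rough sketch of the extra checks the normal tree
read path does on top of the csum, using what the parent node recorded
for the child block (the real checks live around verify_parent_transid()
and the level/first_key verification in disk-io.c, IIRC).  The accessors
are the usual fs/btrfs/ctree.h helpers, but the function itself is
illustrative only, not the actual read path:

#include "ctree.h"	/* btrfs_header_*, btrfs_*_key_to_cpu, btrfs_comp_cpu_keys */

/*
 * Hypothetical helper: a stale but self-consistent copy passes the csum
 * check, yet fails the generation/level/first-key comparison against
 * the expectations stored in its parent node.
 */
static int verify_child_eb(struct extent_buffer *eb,
			   u64 parent_transid,	/* generation from the parent's key ptr */
			   int expected_level,	/* parent level - 1 */
			   const struct btrfs_key *expected_first_key)
{
	struct btrfs_key found_key;

	/* An old copy still carries the generation it was written in. */
	if (btrfs_header_generation(eb) != parent_transid)
		return -EUCLEAN;

	/* The expected level is also known from the parent. */
	if (btrfs_header_level(eb) != expected_level)
		return -EUCLEAN;

	/* The first key must match the key the parent points at. */
	if (expected_level)
		btrfs_node_key_to_cpu(eb, &found_key, 0);
	else
		btrfs_item_key_to_cpu(eb, &found_key, 0);
	if (btrfs_comp_cpu_keys(expected_first_key, &found_key))
		return -EUCLEAN;

	return 0;
}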
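
And a tiny, purely hypothetical sketch of the "use the other copy" idea
for nodatasum data: prefer a mirror whose device is not being rebuilt.
device_is_resilvering() is an assumed helper/flag, not an existing btrfs
device state bit; the stripe layout is the existing struct map_lookup
from fs/btrfs/volumes.h:

/*
 * Hypothetical mirror selection for nodatasum data while one device of
 * a RAID1 chunk is resilvering: read from the copy that is known to be
 * fully written.
 */
static int pick_read_mirror(struct map_lookup *map)
{
	int i;

	for (i = 0; i < map->num_stripes; i++) {
		struct btrfs_device *dev = map->stripes[i].dev;

		/* device_is_resilvering() is an assumption, see above. */
		if (!device_is_resilvering(dev))
			return i;
	}
	/* All copies are being rebuilt; shouldn't happen for RAID1. */
	return 0;
}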
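
For reference, the space arithmetic behind the "32K per 1G dev extent"
number, assuming one bit per 4KiB sectorsize block:

  1GiB dev extent / 4KiB blocks = 262144 blocks -> 262144 bits
  262144 bits / 8               = 32768 bytes   = 32KiB per 1GiB
  default nodesize              = 16KiB, so even a single dev extent's
                                  bitmap doesn't fit in one leaf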
