On 2018年06月28日 21:26, Anand Jain wrote:
>
>
> On 06/28/2018 04:02 PM, Qu Wenruo wrote:
>> Also CC Anand Jain, since he is also working on various device related
>> work, and an expert on this.
>>
>> Especially I'm not pretty sure if such enhancement is already on his
>> agenda or there are already some unmerged patch for this.
>
> Right, some of the patches are already in the ML and probably it needs
> a revival. Unless there are new challenges, my target is to consolidate
> them and get them integrated by this year. I am trying harder.
>
> I think its a good idea to report the write-hole status to the
> user-land instead of failing the mount, so that a script can trigger
> the re-silver as required by the use case. I introduced
> fs_devices::volume_flags, which is under review as of now, and have
> plans to add the volume degraded state bit which can be accessed by
> the sysfs. So yes this is been taken care.
Glad to hear that!
However I found it pretty hard to trace your latest work, would you mind
to share the git repo and branch you're working on?
Maybe I could take some time to do some review.
Thanks,
Qu
>
>
> Thanks, Anand
>
>
>> Thanks,
>> Qu
>>
>> On 2018年06月28日 15:04, Qu Wenruo wrote:
>>> There is a reporter considering btrfs raid1 has a major design flaw
>>> which can't handle nodatasum files.
>>>
>>> Despite his incorrect expectation, btrfs indeed doesn't handle device
>>> generation mismatch well.
>>>
>>> This means if one devices missed and re-appeared, even its generation
>>> no longer matches with the rest device pool, btrfs does nothing to it,
>>> but treat it as normal good device.
>>>
>>> At least let's detect such generation mismatch and avoid mounting the
>>> fs.
>>> Currently there is no automatic rebuild yet, which means if users find
>>> device generation mismatch error message, they can only mount the fs
>>> using "device" and "degraded" mount option (if possible), then replace
>>> the offending device to manually "rebuild" the fs.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>>> ---
>>> I totally understand that, generation based solution can't handle
>>> split-brain case (where 2 RAID1 devices get mounted degraded separately)
>>> at all, but at least let's handle what we can do.
>>>
>>> The best way to solve the problem is to make btrfs treat such lower gen
>>> devices as some kind of missing device, and queue an automatic scrub for
>>> that device.
>>> But that's a lot of extra work, at least let's start from detecting such
>>> problem first.
>>> ---
>>> fs/btrfs/volumes.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 50 insertions(+)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index e034ad9e23b4..80a7c44993bc 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -6467,6 +6467,49 @@ static void btrfs_report_missing_device(struct
>>> btrfs_fs_info *fs_info,
>>> devid, uuid);
>>> }
>>> +static int verify_devices_generation(struct btrfs_fs_info *fs_info,
>>> + struct btrfs_device *dev)
>>> +{
>>> + struct btrfs_fs_devices *fs_devices = dev->fs_devices;
>>> + struct btrfs_device *cur;
>>> + bool warn_only = false;
>>> + int ret = 0;
>>> +
>>> + if (!fs_devices || fs_devices->seeding || !dev->generation)
>>> + return 0;
>>> +
>>> + /*
>>> + * If we're not replaying log, we're completely safe to allow
>>> + * generation mismatch as it won't write anything to disks, nor
>>> + * remount to rw.
>>> + */
>>> + if (btrfs_test_opt(fs_info, NOLOGREPLAY))
>>> + warn_only = true;
>>> +
>>> + rcu_read_lock();
>>> + list_for_each_entry_rcu(cur, &fs_devices->devices, dev_list) {
>>> + if (cur->generation && cur->generation != dev->generation) {
>>> + if (warn_only) {
>>> + btrfs_warn_rl_in_rcu(fs_info,
>>> + "devid %llu has unexpected generation, has %llu expected %llu",
>>> + dev->devid,
>>> + dev->generation,
>>> + cur->generation);
>>> + } else {
>>> + btrfs_err_rl_in_rcu(fs_info,
>>> + "devid %llu has unexpected generation, has %llu expected %llu",
>>> + dev->devid,
>>> + dev->generation,
>>> + cur->generation);
>>> + ret = -EINVAL;
>
>
>
>
>>> + break;
>>> + }
>>> + }
>>> + }
>>> + rcu_read_unlock();
>>> + return ret;
>>> +}
>>> +
>>> static int read_one_chunk(struct btrfs_fs_info *fs_info, struct
>>> btrfs_key *key,
>>> struct extent_buffer *leaf,
>>> struct btrfs_chunk *chunk)
>>> @@ -6552,6 +6595,13 @@ static int read_one_chunk(struct btrfs_fs_info
>>> *fs_info, struct btrfs_key *key,
>>> return PTR_ERR(map->stripes[i].dev);
>>> }
>>> btrfs_report_missing_device(fs_info, devid, uuid, false);
>>> + } else {
>>> + ret = verify_devices_generation(fs_info,
>>> + map->stripes[i].dev);
>>> + if (ret < 0) {
>>> + free_extent_map(em);
>>> + return ret;
>>> + }
>>> }
>>> set_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
>>> &(map->stripes[i].dev->dev_state));
>>>
>>
Attachment:
signature.asc
Description: OpenPGP digital signature
