Re: [PATCH] btrfs: raid56: data corruption on a device removal

On 12/12/2018 01:25, Dmitriy Gorokh wrote:
> I found that a RAID5 or RAID6 filesystem might get corrupted in the following scenario:
> 
> 1. Create a 4-disk RAID6 filesystem
> 2. Preallocate 16 10GB files
> 3. Run fio: 'fio --name=testload --directory=./ --size=10G --numjobs=16 --bs=64k --iodepth=64 --rw=randrw --verify=sha256 --time_based --runtime=3600'
> 4. After a few minutes, pull out two drives: 'echo 1 > /sys/block/sdc/device/delete ;  echo 1 > /sys/block/sdd/device/delete'
> 
> In about 5 of 10 runs, the test led to silent data corruption of a random extent, resulting in ‘IO Error’ and ‘csum failed’ messages when trying to read the affected file. It usually affects only a small portion of the files and only one underlying extent of a file. When I converted the logical address of the damaged extent to a physical address and dumped a stripe directly from the drives, I saw a specific pattern, always the same when the issue occurs.
> 
> I found that a few bios that were being processed right as the drives were removed contained a non-zero bio->bi_iter.bi_done field despite an EIO bi_status. The bi_sector field was also increased from its original value by that 'bi_done' value. This looks like a quite rare condition. Subsequently, in the raid_rmw_end_io handler, that failed bio can be translated to a wrong stripe number and fail the wrong rbio.
> 
> 

Please wrap the lines in your commit message at 75 characters and provide a
Signed-off-by line, see [1].

[1]
https://www.kernel.org/doc/html/latest/process/submitting-patches.html?highlight=signed%20off#sign-your-work-the-developer-s-certificate-of-origin

[...]

>         physical <<= 9;
> +       // Since the failed bio can return partial data, bi_sector might be incremented
> +       // by that value. We need to revert it back to the state before the bio was submitted.
> +       physical -= bio->bi_iter.bi_done;

Please no C++-style comments.
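
For illustration only, here is a minimal userspace sketch of the same
adjustment written with a kernel-style block comment. The names
struct model_iter and model_bio_start() are made up for this example and
are not kernel APIs; only the bi_sector and bi_done fields and the
shift-then-subtract arithmetic are taken from the quoted hunk. The
assert() is an extra guard added for the sketch, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the iterator fields used above. */
struct model_iter {
	uint64_t bi_sector;	/* current position, in 512-byte sectors */
	uint32_t bi_done;	/* bytes already completed */
};

/*
 * Return the byte offset at which the bio was originally submitted.
 *
 * A failed bio can return partial data, in which case bi_sector has
 * been advanced. Subtract bi_done to get back to the state before the
 * bio was submitted.
 */
static uint64_t model_bio_start(const struct model_iter *iter)
{
	uint64_t physical = iter->bi_sector;

	physical <<= 9;
	/* Guard added for this sketch: never underflow the offset. */
	assert(physical >= iter->bi_done);
	physical -= iter->bi_done;

	return physical;
}
```

With bi_sector = 2056 and bi_done = 4096, the sketch returns the byte
offset of sector 2048, which is where such a bio would have started.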

Otherwise,
Reviewed-by: Johannes Thumshirn <jthumshirn@xxxxxxx>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@xxxxxxx                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


