Re: [PATCH v2] btrfs: raid56: data corruption on a device removal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 14, 2018 at 08:48:50PM +0300, Dmitriy Gorokh wrote:
> RAID5 or RAID6 filesystem might get corrupted in the following scenario:
> 
> 1. Create 4 disks RAID6 filesystem
> 2. Preallocate 16 10Gb files
> 3. Run fio: 'fio --name=testload --directory=./ --size=10G
> --numjobs=16 --bs=64k --iodepth=64 --rw=randrw --verify=sha256
> --time_based --runtime=3600’
> 4. After few minutes pull out two drives: 'echo 1 >
> /sys/block/sdc/device/delete ;  echo 1 > /sys/block/sdd/device/delete’
> 
> About 5 of 10 times the test is run, it led to a silent data
> corruption of a random stripe, resulting in ‘IO Error’ and ‘csum
> failed’ messages while trying to read the affected file. It usually
> affects only small portion of the files.
> 
> It is possible that few bios which were being processed during the
> drives removal, contained non zero bio->bi_iter.bi_done field despite
> of EIO bi_status. bi_sector field was also increased from original one
> by that 'bi_done' value. Looks like this is a quite rare condition.
> Subsequently, in the raid_rmw_end_io handler that failed bio can be
> translated to a wrong stripe number and fail wrong rbio.
> 
> Reviewed-by: Johannes Thumshirn <jthumshirn@xxxxxxx>
> Signed-off-by: Dmitriy Gorokh <dmitriy.gorokh@xxxxxxx>
> ---
>  fs/btrfs/raid56.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 3c8093757497..cd2038315feb 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -1451,6 +1451,12 @@ static int find_bio_stripe(struct btrfs_raid_bio *rbio,
>   struct btrfs_bio_stripe *stripe;
> 
>   physical <<= 9;
> + /*
> +  * Since the failed bio can return partial data, bi_sector might be
> +  * incremented by that value. We need to revert it back to the
> +  * state before the bio was submitted.
> +  */
> + physical -= bio->bi_iter.bi_done;

The bi_done member has been removed in recent block layer changes
commit 7759eb23fd9808a2e4498cf36a798ed65cde78ae ("block: remove
bio_rewind_iter()"). I wonder what kind of block-magic do we need to do
as the iterators seem to be local and there's nothing available in the
call chain leading to find_bio_stripe. Johannes, any ideas?



[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux