Re: [PATCH 3/6] Btrfs: fix race setting block group readonly during device replace

On Fri, May 20, 2016 at 12:44 AM,  <fdmanana@xxxxxxxxxx> wrote:
> From: Filipe Manana <fdmanana@xxxxxxxx>
>
> When we do a device replace, for each device extent we find on the
> source device, we set the corresponding block group to readonly mode to
> prevent writes into it while we are copying the device extent from the
> source to the target device. However, just before we set the block
> group to readonly mode, some concurrent task might have already
> allocated an extent from it or decided it could perform a nocow write
> into one of its extents. This can make the device replace process miss
> copying an extent, because it uses the extent tree's commit root to
> search for extents, and only once it finishes searching for all extents
> belonging to the block group does it set the left cursor to the logical
> end address of the block group. This is a problem if the respective
> ordered extents finish while we are searching for extents using the
> extent tree's commit root and no transaction commit happens while we
> are iterating the tree, since it is the delayed references created by
> the ordered extents (when they complete) that insert the extent items
> into the extent tree (using the non-commit root, of course).
> Example:
>
>           CPU 1                                            CPU 2
>
>  btrfs_dev_replace_start()
>    btrfs_scrub_dev()
>      scrub_enumerate_chunks()
>        --> finds device extent belonging
>            to block group X
>
>                                <transaction N starts>
>
>                                                       starts buffered write
>                                                       against some inode
>
>                                                       writepages is run against
>                                                       that inode forcing delalloc
>                                                       to run
>
>                                                       btrfs_writepages()
>                                                         extent_writepages()
>                                                           extent_write_cache_pages()
>                                                             __extent_writepage()
>                                                               writepage_delalloc()
>                                                                 run_delalloc_range()
>                                                                   cow_file_range()
>                                                                     btrfs_reserve_extent()
>                                                                       --> allocates an extent
>                                                                           from block group X
>                                                                           (which is not yet
>                                                                            in RO mode)
>                                                                     btrfs_add_ordered_extent()
>                                                                       --> creates ordered extent Y
>                                                         flush_epd_write_bio()
>                                                           --> bio against the extent from
>                                                               block group X is submitted
>
>        btrfs_inc_block_group_ro(bg X)
>          --> sets block group X to readonly
>
>        scrub_chunk(bg X)
>          scrub_stripe(device extent from srcdev)
>            --> keeps searching for extent items
>                belonging to the block group using
>                the extent tree's commit root
>            --> it never blocks due to
>                fs_info->scrub_pause_req as no
>                one tries to commit transaction N
>            --> copies all extents found from the
>                source device into the target device
>            --> finishes search loop
>
>                                                         bio completes
>
>                                                         ordered extent Y completes
>                                                         and creates delayed data
>                                                         reference which will add an
>                                                         extent item to the extent
>                                                         tree when run (typically
>                                                         at transaction commit time)
>
>                                                           --> so the task doing the
>                                                               scrub/device replace
>                                                               at CPU 1 misses this
>                                                               and does not copy this
>                                                               extent into the new/target
>                                                               device
>
>        btrfs_dec_block_group_ro(bg X)
>          --> turns block group X back to RW mode
>
>        dev_replace->cursor_left is set to the
>        logical end offset of block group X
>
> So fix this by waiting for all cow and nocow writes after setting a block
> group to readonly mode.
>
> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
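
For readers following the race above, here is a minimal user-space
sketch of it and of the fix as described in the commit message. It is a
toy model, not the patch: every toy_* name is hypothetical and none of
them is a kernel API. It also assumes that, once the pending writes have
been waited for, their extent items become visible through the commit
root before the scan starts; the model represents that with an explicit
commit step.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical types and helpers, not btrfs code. */
struct toy_block_group {
	bool read_only;
	int committed_extents;	/* extents visible through the commit root */
	int pending_writes;	/* in-flight cow/nocow writes (ordered extents) */
	int delayed_refs;	/* completed writes not yet in the commit root */
};

/* A writer reserves an extent; refused once the group is read-only. */
static bool toy_reserve_extent(struct toy_block_group *bg)
{
	if (bg->read_only)
		return false;
	bg->pending_writes++;
	return true;
}

/* The bio completes: the ordered extent finishes and queues a delayed ref. */
static void toy_complete_write(struct toy_block_group *bg)
{
	assert(bg->pending_writes > 0);
	bg->pending_writes--;
	bg->delayed_refs++;
}

/* Assumed commit step: delayed refs run and reach the commit root. */
static void toy_commit_transaction(struct toy_block_group *bg)
{
	bg->committed_extents += bg->delayed_refs;
	bg->delayed_refs = 0;
}

/*
 * Copy one block group to the target device and return how many extents
 * were copied. The scan only sees what the commit root shows.
 */
static int toy_replace_block_group(struct toy_block_group *bg,
				   bool wait_for_writes)
{
	int copied;

	bg->read_only = true;
	if (wait_for_writes) {
		/* The fix: wait for writes that raced with the RO switch. */
		while (bg->pending_writes > 0)
			toy_complete_write(bg);
		toy_commit_transaction(bg);
	}
	copied = bg->committed_extents;
	bg->read_only = false;
	return copied;
}
```

In this model, a write reserved before the read-only switch is missed
when wait_for_writes is false and is copied when it is true.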

Reviewed-by: Josef Bacik <jbacik@xxxxxx>

Thanks!

Josef