At 03/17/2017 12:44 PM, Liu Bo wrote:
On Fri, Feb 03, 2017 at 04:20:22PM +0800, Qu Wenruo wrote:Before this patch, btrfs raid56 will keep raid56 rbio even all its IO is done. This may save some time allocating rbio, but it can cause deadly use-after-free bug, for the following case: Original fs: 4 devices RAID5 Process A | Process B -------------------------------------------------------------------------- | Start device replace | Now the fs has 5 devices | devid 0: replace device | devid 1~4: old devices btrfs_map_bio() | |- __btrfs_map_block() | | bbio has 5 stripes | | including devid 0 | |- raid56_parity_write() | | raid_write_end_io() | |- rbio_orig_end_io() | |- unlock_stripe() | Keeps the old rbio for | later steal, which has | stripe for devid 0 | | Cancel device replace | Now the fs has 4 devices | devid 0 is freed Some IO happens | raid_write_end_io() | |- rbio_orig_end_io() | |- unlock_stripe() | |- steal_rbio() | Use old rbio, whose | bbio has freed devid 0| stripe | Any access to rbio->bbio will | cause general protection or NULL | pointer dereference | Such bug can already be triggered by fstests btrfs/069 for RAID5/6 profiles. Fix it by not keeping old rbio in unlock_stripe(), so we just free the finished rbio and rbio->bbio, so above problem wont' happen.I don't think this is acceptable, keeping a cache is important for raid56 write performance, could you please fix it by checking if the device is missing?
Not possible, as it's keeping the btrfs_device pointer and never release it, the stolen rbio can be kept forever until umount.
And I think the logical is very strange, if RAID5/6 is unstable, there is no meaning to keep it fast.
Keep it stable first, and then consider the performance. Thanks, Qu
Thanks, -liuboSigned-off-by: Qu Wenruo <quwenruo@xxxxxxxxxxxxxx> --- fs/btrfs/raid56.c | 18 +----------------- 1 file changed, 1 insertion(+), 17 deletions(-) diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 453eefdcb591..aba82b95ec73 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -776,7 +776,6 @@ static noinline void unlock_stripe(struct btrfs_raid_bio *rbio) int bucket; struct btrfs_stripe_hash *h; unsigned long flags; - int keep_cache = 0; bucket = rbio_bucket(rbio); h = rbio->fs_info->stripe_hash_table->table + bucket; @@ -788,19 +787,6 @@ static noinline void unlock_stripe(struct btrfs_raid_bio *rbio) spin_lock(&rbio->bio_list_lock); if (!list_empty(&rbio->hash_list)) { - /* - * if we're still cached and there is no other IO - * to perform, just leave this rbio here for others - * to steal from later - */ - if (list_empty(&rbio->plug_list) && - test_bit(RBIO_CACHE_BIT, &rbio->flags)) { - keep_cache = 1; - clear_bit(RBIO_RMW_LOCKED_BIT, &rbio->flags); - BUG_ON(!bio_list_empty(&rbio->bio_list)); - goto done; - } - list_del_init(&rbio->hash_list); atomic_dec(&rbio->refs); @@ -848,13 +834,11 @@ static noinline void unlock_stripe(struct btrfs_raid_bio *rbio) goto done_nolock; } } -done: spin_unlock(&rbio->bio_list_lock); spin_unlock_irqrestore(&h->lock, flags); done_nolock: - if (!keep_cache) - remove_rbio_from_cache(rbio); + remove_rbio_from_cache(rbio); } static void __free_raid_bio(struct btrfs_raid_bio *rbio) -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html
-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html
