On Fri, May 20, 2016 at 12:44 AM, <fdmanana@xxxxxxxxxx> wrote:
> From: Filipe Manana <fdmanana@xxxxxxxx>
>
> When it's finishing, the device replace code iterates all extent maps
> representing block group and for each one that has a stripe that refers
> to the source device, it replaces its device with the target device.
> However when it replaces the source device with the target device it,
> the target device still has an ID of 0ULL (BTRFS_DEV_REPLACE_DEVID),
> only after its ID is changed to match the one from the source device.
> This leads to races with the chunk removal code that can temporarly see
> a device with an ID of 0ULL and then attempt to use that ID to remove
> items from the device tree and fail, causing a transaction abort:
>
> [ 9238.594364] BTRFS info (device sdf): dev_replace from /dev/sdf (devid 3) to /dev/sde finished
> [ 9238.594377] ------------[ cut here ]------------
> [ 9238.594402] WARNING: CPU: 14 PID: 21566 at fs/btrfs/volumes.c:2771 btrfs_remove_chunk+0x2e5/0x793 [btrfs]
> [ 9238.594403] BTRFS: Transaction aborted (error 1)
> [ 9238.594416] Modules linked in: btrfs crc32c_generic acpi_cpufreq xor tpm_tis tpm raid6_pq ppdev parport_pc processor psmouse parport i2c_piix4 evdev sg i2c_core se
> rio_raw pcspkr button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod fl
> oppy [last unloaded: btrfs]
> [ 9238.594418] CPU: 14 PID: 21566 Comm: btrfs-cleaner Not tainted 4.6.0-rc7-btrfs-next-29+ #1
> [ 9238.594419] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
> [ 9238.594421] 0000000000000000 ffff88017f1dbc60 ffffffff8126b42c ffff88017f1dbcb0
> [ 9238.594422] 0000000000000000 ffff88017f1dbca0 ffffffff81052b14 00000ad37f1dbd18
> [ 9238.594423] 0000000000000001 ffff88018068a558 ffff88005c4b9c00 ffff880233f60db0
> [ 9238.594424] Call Trace:
> [ 9238.594428] [<ffffffff8126b42c>] dump_stack+0x67/0x90
> [ 9238.594430] [<ffffffff81052b14>] __warn+0xc2/0xdd
> [ 9238.594432] [<ffffffff81052b7a>] warn_slowpath_fmt+0x4b/0x53
> [ 9238.594434] [<ffffffff8116c311>] ? kmem_cache_free+0x128/0x188
> [ 9238.594450] [<ffffffffa04d43f5>] btrfs_remove_chunk+0x2e5/0x793 [btrfs]
> [ 9238.594452] [<ffffffff8108e456>] ? arch_local_irq_save+0x9/0xc
> [ 9238.594464] [<ffffffffa04a26fa>] btrfs_delete_unused_bgs+0x317/0x382 [btrfs]
> [ 9238.594476] [<ffffffffa04a961d>] cleaner_kthread+0x1ad/0x1c7 [btrfs]
> [ 9238.594489] [<ffffffffa04a9470>] ? btree_invalidatepage+0x8e/0x8e [btrfs]
> [ 9238.594490] [<ffffffff8106f403>] kthread+0xd4/0xdc
> [ 9238.594494] [<ffffffff8149e242>] ret_from_fork+0x22/0x40
> [ 9238.594495] [<ffffffff8106f32f>] ? kthread_stop+0x286/0x286
> [ 9238.594496] ---[ end trace 183efbe50275f059 ]---
>
> The sequence of steps leading to this is like the following:
>
> CPU 1 CPU 2
>
> btrfs_dev_replace_finishing()
>
> at this point
> dev_replace->tgtdev->devid ==
> BTRFS_DEV_REPLACE_DEVID (0ULL)
>
> ...
>
> btrfs_start_transaction()
> btrfs_commit_transaction()
>
> btrfs_delete_unused_bgs()
> btrfs_remove_chunk()
>
> looks up for the extent map
> corresponding to the chunk
>
> lock_chunks() (chunk_mutex)
> check_system_chunk()
> unlock_chunks() (chunk_mutex)
>
> locks fs_info->chunk_mutex
>
> btrfs_dev_replace_update_device_in_mapping_tree()
> --> iterates fs_info->mapping_tree and
> replaces the device in every extent
> map's map->stripes[] with
> dev_replace->tgtdev, which still has
> an id of 0ULL (BTRFS_DEV_REPLACE_DEVID)
>
> iterates over all stripes from
> the extent map
>
> --> calls btrfs_free_dev_extent()
> passing it the target device
> that still has an ID of 0ULL
>
> --> btrfs_free_dev_extent() fails
> --> aborts current transaction
>
> finishes setting up the target device,
> namely it sets tgtdev->devid to the value
> of srcdev->devid (which is necessarily > 0)
>
> frees the srcdev
>
> unlocks fs_info->chunk_mutex
>
> So fix this by taking the device list mutex while processing the stripes
> for the chunk's extent map. This is similar to the race between device
> replace and block group creation that was fixed by commit 50460e37186a
> ("Btrfs: fix race when finishing dev replace leading to transaction abort").
>
> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
Reviewed-by: Josef Bacik <jbacik@xxxxxx>
Thanks,
Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html