On 01/10/2019 03:48 AM, Filipe Manana wrote:
On Wed, Jan 9, 2019 at 6:27 PM David Sterba <dsterba@xxxxxxx> wrote:
On Thu, Dec 13, 2018 at 09:17:25PM +0000, fdmanana@xxxxxxxxxx wrote:
- if (list_empty(&fs_devices->resized_devices))
- return;
-
- mutex_lock(&fs_devices->device_list_mutex);
mutex_lock(&fs_info->chunk_mutex);
list_for_each_entry_safe(curr, next, &fs_devices->resized_devices,
resized_list) {
@@ -7309,7 +7306,6 @@ void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info)
curr->commit_total_bytes = curr->disk_total_bytes;
I'm not sure about removing the device_list_mutex that's said to protect
the commit_total_bytes (comment in struct btrfs_device).
Otherwise the logic is ok, the double lock could happen as you describe.
btrfs_update_commit_device_size is called from btrfs_commit_transaction,
at the same time as commit_bytes_used. The latter is handled in a
similar way in btrfs_update_commit_device_bytes_used, but does not take
the device_list_mutex.
commit_total_bytes is checked several times (eg. in write_dev_supers) to
see if writing the superblock copy is still within the device range.
So, without the protected change, it's theoretically possible that a
stale value is used for the test and the superblock is either written
though it should not, and the other way around.
This would require a resize racing at the time of the check. Grow and
shrink seem to take chunk_mutex while adjusting all the total/size
values, but it's not actually easy to follow as sometimes there are
helpers like btrfs_device_set_total_bytes used and sometimes it's direct
access.
That the device_list_mutex can be safely dropped probably follows from
the simple fact that btrfs_update_commit_device_bytes_used is called
before write_dev_supers in the same context.
But this sounds too simple, given that there are locks taken and
released and btrfs_write_and_wait_transaction called between.
Regardless of all that (and honestly I haven't double checked and
skimmed only through what you said),
there's a more important aspect I missed before: write_all_supers()
takes (and needs) the device list mutex,
therefore this change won't fix the deadlock because of that.
Though this won't fix the problem, this patch is still ok
as its drops the unnecessary device_list_mutex in
btrfs_update_commit_device_size(). So for that if the change log
updated,
Reviewed-by: Anand Jain <anand.jain@xxxxxxxxxx>
To address the actual problem.
Functions which call btrfs_alloc_device() are..
device_list_add()
close_fs_devices()
btrfs_init_dev_replace_tgtdev()
clone_fs_devices()
btrfs_init_new_device()
Now among the above following holds the device_list_mutex when calling
btrfs_alloc_device()
device_list_add()
close_fs_devices()
clone_fs_devices()
Now among above three, the lock at device_list_add() and
clone_fs_devices() can be ignored, because the reclaim or flush IO can't
take place on these FSID:devices as device_list_add() is called when FS
is not-mounted or in the mounting context, and clone_fs_devices() is
called when the SEED device is still a read-only FS.
And so we have to only worry about close_fs_devices().
close_fs_devices() - I didn't like the way it does, that is allocate a
new struct btrfs_device instead of just zero-ing the struct btrfs_device
during unmount. I guess it was done to avoid RCU warning (not sure) and
if it isn't real issues I am happy to see btrfs_device_alloc() is
dropped in close_fs_devices(). Which means it also fixes the problem
that - you need more memory to unmount an ideal FS.
Thanks.
thanks
Referencing this code:
2201 btrfs_update_commit_device_size(fs_info);
2202 btrfs_update_commit_device_bytes_used(cur_trans);
2203
2204 clear_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags);
2205 clear_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags);
2206
2207 btrfs_trans_release_chunk_metadata(trans);
2208
2209 spin_lock(&fs_info->trans_lock);
2210 cur_trans->state = TRANS_STATE_UNBLOCKED;
2211 fs_info->running_transaction = NULL;
2212 spin_unlock(&fs_info->trans_lock);
2213 mutex_unlock(&fs_info->reloc_mutex);
2214
2215 wake_up(&fs_info->transaction_wait);
2216
2217 ret = btrfs_write_and_wait_transaction(trans);
2218 if (ret) {
2219 btrfs_handle_fs_error(fs_info, ret,
2220 "Error while writing out transaction");
2221 mutex_unlock(&fs_info->tree_log_mutex);
2222 goto scrub_continue;
2223 }
2224
2225 ret = write_all_supers(fs_info, 0);