On Thu, Sep 3, 2015 at 2:05 AM, Justin Maggard <jmaggard10@xxxxxxxxx> wrote:
> v2: Fix stupid error while making formatting changes...
>
> I was hitting a consistent NULL pointer dereference during shutdown that
> showed the trace running through end_workqueue_bio(). I traced it back to
> the endio_meta_workers workqueue being poked after it had already been
> destroyed.
>
> Eventually I found that the root cause was a qgroup rescan that was still
> in progress while we were stopping all the btrfs workers.
>
> Currently we explicitly pause balance and scrub operations in
> close_ctree(), but we do nothing to stop the qgroup rescan. We should
> probably be doing the same for qgroup rescan, but that's a much larger
> change. This small change is good enough to allow me to unmount without
> crashing.
>
> Signed-off-by: Justin Maggard <jmaggard@xxxxxxxxxxx>
> ---
> fs/btrfs/qgroup.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index d904ee1..5bfcee9 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2278,7 +2278,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  		goto out;
>
>  	err = 0;
> -	while (!err) {
> +	while (!err && !btrfs_fs_closing(fs_info)) {
>  		trans = btrfs_start_transaction(fs_info->fs_root, 0);
>  		if (IS_ERR(trans)) {
>  			err = PTR_ERR(trans);
> @@ -2301,7 +2301,8 @@ out:
>  	btrfs_free_path(path);
>
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	if (!btrfs_fs_closing(fs_info))
> +		fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>
>  	if (err > 0 &&
>  	    fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) {
> @@ -2330,7 +2331,9 @@ out:
>  	}
>  	btrfs_end_transaction(trans, fs_info->quota_root);
>
> -	if (err >= 0) {
> +	if (btrfs_fs_closing(fs_info)) {
> +		btrfs_info(fs_info, "qgroup scan paused");
> +	} else if (err >= 0) {
>  		btrfs_info(fs_info, "qgroup scan completed%s",
>  			err > 0 ? " (inconsistency flag cleared)" : "");
>  	} else {
Justin, this is still racy (though much less racy than before).
Once we leave the loop because btrfs_fs_closing(fs_info) is true, we
still start a transaction and do a write operation on the quota btree.
While or before we do that write, close_ctree() might have completed,
or be at a point where the write results in another NULL pointer
dereference, an access to a dangling pointer, or a leaked transaction
that never gets committed (because close_ctree() already stopped the
transaction kthread), etc.
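So in addition to what you did, you need to call
btrfs_qgroup_wait_for_completion(fs_info) in close_ctree() (disk-io.c),
right after setting fs_info->closing to 1. That way the rescan worker
has finished its final transaction before anything it depends on is
torn down.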
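To illustrate the ordering I mean, here is a rough userspace model using
pthreads. It is only a sketch, not kernel code: struct model_fs and every
model_* function are made-up stand-ins for fs_info, the rescan worker,
btrfs_qgroup_wait_for_completion() and close_ctree(), and the "late
write" counter is just my way of making the race visible.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the parts of fs_info involved. */
struct model_fs {
	atomic_bool closing;       /* models fs_info->closing */
	atomic_bool workers_alive; /* models workqueues + transaction kthread */
	pthread_mutex_t lock;      /* models qgroup_rescan_lock */
	pthread_cond_t done;       /* models "rescan has finished" */
	bool rescan_running;       /* protected by lock */
	atomic_int late_writes;    /* epilogue writes issued after teardown */
};

/* Models the rescan worker with the patch applied. */
static void *model_rescan_worker(void *arg)
{
	struct model_fs *fs = arg;

	/* Main loop: bail out once shutdown has been flagged. */
	while (!atomic_load(&fs->closing))
		sched_yield(); /* stands in for one rescan step */

	/* Epilogue: the final status update still needs live workers. */
	if (!atomic_load(&fs->workers_alive))
		atomic_fetch_add(&fs->late_writes, 1);

	pthread_mutex_lock(&fs->lock);
	fs->rescan_running = false;
	pthread_cond_broadcast(&fs->done);
	pthread_mutex_unlock(&fs->lock);
	return NULL;
}

/* Models waiting for the rescan worker to finish completely. */
static void model_wait_for_completion(struct model_fs *fs)
{
	pthread_mutex_lock(&fs->lock);
	while (fs->rescan_running)
		pthread_cond_wait(&fs->done, &fs->lock);
	pthread_mutex_unlock(&fs->lock);
}

/* Models the ordering suggested for close_ctree(). */
static void model_close(struct model_fs *fs)
{
	atomic_store(&fs->closing, true);        /* closing = 1 */
	model_wait_for_completion(fs);           /* wait for the rescan */
	atomic_store(&fs->workers_alive, false); /* tear down workers last */
}

/* Runs one rescan + unmount cycle; returns the number of late writes. */
static int model_run(void)
{
	struct model_fs fs;
	pthread_t worker;
	int late;

	atomic_init(&fs.closing, false);
	atomic_init(&fs.workers_alive, true);
	atomic_init(&fs.late_writes, 0);
	pthread_mutex_init(&fs.lock, NULL);
	pthread_cond_init(&fs.done, NULL);
	fs.rescan_running = true;

	if (pthread_create(&worker, NULL, model_rescan_worker, &fs) != 0)
		return -1;
	model_close(&fs);
	pthread_join(worker, NULL);
	assert(!fs.rescan_running);

	late = atomic_load(&fs.late_writes);
	pthread_cond_destroy(&fs.done);
	pthread_mutex_destroy(&fs.lock);
	return late;
}
```

Built with -pthread, model_run() should always return 0, because the
worker's epilogue is ordered before the teardown. If you drop the
model_wait_for_completion() call from model_close(), the epilogue can run
after workers_alive is cleared, which is the window I am describing above.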
Otherwise it looks good.
Thanks.
> --
> 2.5.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Filipe David Manana,
"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."