On Mon, Jan 14, 2019 at 04:21:43PM +0800, Anand Jain wrote:
>
>
> On 01/12/2019 01:17 AM, fdmanana@xxxxxxxxxx wrote:
> > From: Filipe Manana <fdmanana@xxxxxxxx>
> >
> > In a few places we are allocating a device using the GFP_KERNEL flag when
> > it is not safe to do so, because if reclaim is triggered it can cause a
> > transaction commit while we are holding the device list mutex. This mutex
> > is required in the transaction commit path (at write_all_supers() and
> > btrfs_update_commit_device_size()).
> >
> > So fix this by setting up a nofs memory allocation context in those cases.
> >
> > Fixes: 78f2c9e6dbb14 ("btrfs: device add and remove: use GFP_KERNEL")
> > Fixes: e0ae999414238 ("btrfs: preallocate device flush bio")
> > Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
> > ---
> >
> > V2: Change the approach to fix the problem by setting up nofs contexts
> > where needed.
> >
> > fs/btrfs/volumes.c | 33 ++++++++++++++++++++++++++++++---
> > 1 file changed, 30 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 2576b1a379c9..663566baae78 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -14,6 +14,7 @@
> > #include <linux/semaphore.h>
> > #include <linux/uuid.h>
> > #include <linux/list_sort.h>
> > +#include <linux/sched/mm.h>
> > #include "ctree.h"
> > #include "extent_map.h"
> > #include "disk-io.h"
> > @@ -988,20 +989,29 @@ static noinline struct btrfs_device *device_list_add(const char *path,
> > }
> >
> > if (!device) {
> > + unsigned int nofs_flag;
> > +
> > if (fs_devices->opened) {
> > mutex_unlock(&fs_devices->device_list_mutex);
> > return ERR_PTR(-EBUSY);
> > }
> >
> > + /*
> > + * Setup nofs context because we are holding the device list
> > + * mutex, which is required for a transaction commit.
> > + */
>
> I wonder if there is actually a bug due to GFP_KERNEL in
> device_list_add(), as device_list_add() can only be called when the
> FSID is not yet mounted. Or is it done for the sake of consistency
> when calling btrfs_alloc_device()?
It could still be called, but a new device will not be allocated;
everything is done either via scan or during mount. A missing device has
an entry in fs_devices.
We can keep the NOFS protection around that to make it future-proof, as
it's not trivial to see whether this is always called from a safe
context or not.
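
For anyone following along, the scoped save/restore pattern can be
modelled in userspace roughly as below. This is only a sketch under
stated assumptions: every model_* function and MODEL_* constant is made
up for illustration and is not kernel API, and the flag values are
arbitrary. The real helpers are memalloc_nofs_save() and
memalloc_nofs_restore() from <linux/sched/mm.h>. The point it tries to
show is that a GFP_KERNEL-style request made inside the scope is treated
as if it could not recurse into the filesystem, and that scopes nest
safely because the previous state is saved and put back.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: hypothetical names and values, not kernel API. */
#define MODEL_PF_NOFS    0x1u  /* models the per-task "no FS reclaim" flag */
#define MODEL_GFP_FS     0x2u  /* models "reclaim may call into the FS" */
#define MODEL_GFP_KERNEL (MODEL_GFP_FS | 0x4u)

static unsigned int task_flags; /* models the current task's flags */

/* Enter a nofs scope and return the previous state of the flag. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = task_flags & MODEL_PF_NOFS;

	task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave a nofs scope, putting back whatever state was saved. */
static void model_nofs_restore(unsigned int old)
{
	assert((old & ~MODEL_PF_NOFS) == 0);
	task_flags = (task_flags & ~MODEL_PF_NOFS) | old;
}

/* The mask the allocator would effectively use for this request. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/* True if reclaim for this request could recurse into the filesystem. */
static bool model_may_enter_fs(unsigned int gfp)
{
	return (model_effective_gfp(gfp) & MODEL_GFP_FS) != 0;
}

/*
 * Models an allocation made while holding the device list mutex:
 * the request is GFP_KERNEL-like, but it is wrapped in a nofs scope.
 */
static bool model_alloc_under_mutex(void)
{
	unsigned int nofs_flag = model_nofs_save();
	bool may_fs = model_may_enter_fs(MODEL_GFP_KERNEL);

	model_nofs_restore(nofs_flag);
	return may_fs;
}
```

In this model the callee does not need to know whether it runs in a
safe context: wrapping the call is harmless when the caller is already
inside a nofs scope, which is what makes keeping the protection cheap.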