Re: [PATCH v2 03/15] btrfs: fix double __endio_write_update_ordered in direct I/O

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 21, 2020 at 01:44:25PM +0300, Nikolay Borisov wrote:
> 
> 
> On 17.04.20 г. 0:46 ч., Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@xxxxxx>
> > 
> > In btrfs_submit_direct(), if we fail to allocate the btrfs_dio_private,
> > we complete the ordered extent range. However, we don't mark that the
> > range doesn't need to be cleaned up from btrfs_direct_IO() until later.
> > Therefore, if we fail to allocate the btrfs_dio_private, we complete the
> > ordered extent range twice. We could fix this by updating
> > unsubmitted_oe_range earlier, but it's cleaner to reorganize the code so
> > that creating the btrfs_dio_private and submitting the bios are
> > separate, and once the btrfs_dio_private is created, cleanup always
> > happens through the btrfs_dio_private.
> > 
> > Fixes: f28a49287817 ("Btrfs: fix leaking of ordered extents after direct IO write error")
> > Signed-off-by: Omar Sandoval <osandov@xxxxxx>
> 
> Generally looks ok, so :
> 
> Reviewed-by: Nikolay Borisov <nborisov@xxxxxxxx>, however please see
> below for some remarks on the logic around unsubmitted_oe_range_start/end
> 
> > ---
> >  fs/btrfs/inode.c | 174 ++++++++++++++++++-----------------------------
> >  1 file changed, 66 insertions(+), 108 deletions(-)
> > 
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index b628c319a5b6..f6ce9749adb6 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -7903,14 +7903,60 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
> >  	return ret;
> >  }
> >  
> > -static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
> > +/*
> > + * If this succeeds, the btrfs_dio_private is responsible for cleaning up locked
> > + * or ordered extents whether or not we submit any bios.
> > + */
> > +static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
> > +							  struct inode *inode,
> > +							  loff_t file_offset)
> >  {
> > -	struct inode *inode = dip->inode;
> > +	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
> > +	struct btrfs_dio_private *dip;
> > +	struct bio *bio;
> > +
> > +	dip = kzalloc(sizeof(*dip), GFP_NOFS);
> > +	if (!dip)
> > +		return NULL;
> > +
> > +	bio = btrfs_bio_clone(dio_bio);
> > +	bio->bi_private = dip;
> > +	btrfs_io_bio(bio)->logical = file_offset;
> > +
> > +	dip->private = dio_bio->bi_private;
> > +	dip->inode = inode;
> > +	dip->logical_offset = file_offset;
> > +	dip->bytes = dio_bio->bi_iter.bi_size;
> > +	dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9;
> > +	dip->orig_bio = bio;
> > +	dip->dio_bio = dio_bio;
> > +	atomic_set(&dip->pending_bios, 1);
> > +
> > +	if (write) {
> > +		struct btrfs_dio_data *dio_data = current->journal_info;
> > +
> > +		dio_data->unsubmitted_oe_range_end = dip->logical_offset +
> > +			dip->bytes;
> > +		dio_data->unsubmitted_oe_range_start =
> > +			dio_data->unsubmitted_oe_range_end;
> 
> The logic around those 2 members is really subtle. We really have the
> following:
> 
> 1. btrfs_direct_IO sets those two to the same value.
> 
> 2. When we call __blockdev_direct_IO unless
> btrfs_get_blocks_direct->btrfs_get_blocks_direct_write is called to
> modify unsubmitted_oe_range_start so that start < end. Cleanup won't
> happen.
> 
> 3. We come into btrfs_submit_direct - if it dip allocation fails we'd
> return with oe_range_end now modified so cleanup will happen.
> 
> 4. If we manage to allocate the dip we reset the unsubmitted range
> members to be equal so that cleanup happens from btrfs_endio_direct_write.
> 
> This 4-step logic is not really obvious, especially given it's scattered
> across 3 functions. Perhaps a comment saying that having the 2 members
> being equal means cleanup in btrfs_DIRECT_io is disabled.
> 
> I'd rather have it spelled out in the changelog, I guess David can also
> do that ?

The above added to changelog and a brief comment added, thanks.



[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux