On Tue, Apr 21, 2020 at 01:44:25PM +0300, Nikolay Borisov wrote:
>
>
> On 17.04.20 г. 0:46 ч., Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@xxxxxx>
> >
> > In btrfs_submit_direct(), if we fail to allocate the btrfs_dio_private,
> > we complete the ordered extent range. However, we don't mark that the
> > range doesn't need to be cleaned up from btrfs_direct_IO() until later.
> > Therefore, if we fail to allocate the btrfs_dio_private, we complete the
> > ordered extent range twice. We could fix this by updating
> > unsubmitted_oe_range earlier, but it's cleaner to reorganize the code so
> > that creating the btrfs_dio_private and submitting the bios are
> > separate, and once the btrfs_dio_private is created, cleanup always
> > happens through the btrfs_dio_private.
> >
> > Fixes: f28a49287817 ("Btrfs: fix leaking of ordered extents after direct IO write error")
> > Signed-off-by: Omar Sandoval <osandov@xxxxxx>
>
> Generally looks ok, so :
>
> Reviewed-by: Nikolay Borisov <nborisov@xxxxxxxx>, however please see
> below for some remarks on the logic around unsubmitted_oe_range_start/end
>
> > ---
> > fs/btrfs/inode.c | 174 ++++++++++++++++++-----------------------------
> > 1 file changed, 66 insertions(+), 108 deletions(-)
> >
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index b628c319a5b6..f6ce9749adb6 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -7903,14 +7903,60 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
> > return ret;
> > }
> >
> > -static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
> > +/*
> > + * If this succeeds, the btrfs_dio_private is responsible for cleaning up locked
> > + * or ordered extents whether or not we submit any bios.
> > + */
> > +static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
> > + struct inode *inode,
> > + loff_t file_offset)
> > {
> > - struct inode *inode = dip->inode;
> > + const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
> > + struct btrfs_dio_private *dip;
> > + struct bio *bio;
> > +
> > + dip = kzalloc(sizeof(*dip), GFP_NOFS);
> > + if (!dip)
> > + return NULL;
> > +
> > + bio = btrfs_bio_clone(dio_bio);
> > + bio->bi_private = dip;
> > + btrfs_io_bio(bio)->logical = file_offset;
> > +
> > + dip->private = dio_bio->bi_private;
> > + dip->inode = inode;
> > + dip->logical_offset = file_offset;
> > + dip->bytes = dio_bio->bi_iter.bi_size;
> > + dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9;
> > + dip->orig_bio = bio;
> > + dip->dio_bio = dio_bio;
> > + atomic_set(&dip->pending_bios, 1);
> > +
> > + if (write) {
> > + struct btrfs_dio_data *dio_data = current->journal_info;
> > +
> > + dio_data->unsubmitted_oe_range_end = dip->logical_offset +
> > + dip->bytes;
> > + dio_data->unsubmitted_oe_range_start =
> > + dio_data->unsubmitted_oe_range_end;
>
> The logic around those 2 members is really subtle. We really have the
> following:
>
> 1. btrfs_direct_IO sets those two to the same value.
>
> 2. When we call __blockdev_direct_IO unless
> btrfs_get_blocks_direct->btrfs_get_blocks_direct_write is called to
> modify unsubmitted_oe_range_start so that start < end. Cleanup won't
> happen.
>
> 3. We come into btrfs_submit_direct - if it dip allocation fails we'd
> return with oe_range_end now modified so cleanup will happen.
>
> 4. If we manage to allocate the dip we reset the unsubmitted range
> members to be equal so that cleanup happens from btrfs_endio_direct_write.
>
> This 4-step logic is not really obvious, especially given it's scattered
> across 3 functions. Perhaps a comment saying that having the 2 members
> being equal means cleanup in btrfs_DIRECT_io is disabled.
>
> I'd rather have it spelled out in the changelog, I guess David can also
> do that ?
The above added to changelog and a brief comment added, thanks.