On Thu, Jun 27, 2019 at 10:47:31PM +0100, Filipe Manana wrote:
> On Thu, Jun 27, 2019 at 9:28 PM Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> >
> > On Thu, Jun 27, 2019 at 06:00:30PM +0100, fdmanana@xxxxxxxxxx wrote:
> > > From: Filipe Manana <fdmanana@xxxxxxxx>
> > >
> > > Test that if we clone a file with some large extents into a file that has
> > > many small extents, when the fs is nearly full, the clone operation does
> > > not fail and produces the correct result.
> > >
> > > This is motivated by a bug found in btrfs wich is fixed by the following
> > > patches for the linux kernel:
> > >
> > > [PATCH 1/2] Btrfs: factor out extent dropping code from hole punch handler
> > > [PATCH 2/2] Btrfs: fix ENOSPC errors, leading to transaction aborts, when
> > > cloning extents
> > >
> > > The test currently passes on xfs.
> > >
> > > Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
> > > ---
> > > tests/generic/558 | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > tests/generic/558.out | 5 ++++
> > > tests/generic/group | 1 +
> > > 3 files changed, 81 insertions(+)
> > > create mode 100755 tests/generic/558
> > > create mode 100644 tests/generic/558.out
> > >
> > > diff --git a/tests/generic/558 b/tests/generic/558
> > > new file mode 100755
> > > index 00000000..ee16cdf7
> > > --- /dev/null
> > > +++ b/tests/generic/558
> > > @@ -0,0 +1,75 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved.
> > > +#
> > > +# FSQA Test No. 558
> > > +#
> > > +# Test that if we clone a file with some large extents into a file that has
> > > +# many small extents, when the fs is nearly full, the clone operation does
> > > +# not fail and produces the correct result.
> > > +#
> > > +seq=`basename $0`
> > > +seqres=$RESULT_DIR/$seq
> > > +echo "QA output created by $seq"
> > > +tmp=/tmp/$$
> > > +status=1 # failure is the default!
> > > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > > +
> > > +_cleanup()
> > > +{
> > > + cd /
> > > + rm -f $tmp.*
> > > +}
> > > +
> > > +# get standard environment, filters and checks
> > > +. ./common/rc
> > > +. ./common/filter
> > > +. ./common/reflink
> > > +
> > > +# real QA test starts here
> > > +_supported_fs generic
> > > +_supported_os Linux
> > > +_require_scratch_reflink
> > > +
> > > +rm -f $seqres.full
> > > +
> > > +_scratch_mkfs_sized $((512 * 1024 * 1024)) >>$seqres.full 2>&1
> > > +_scratch_mount
> > > +
> > > +file_size=$(( 128 * 1024 * 1024 )) # 128Mb
> > > +extent_size=4096
> >
> > What if the fs block size is 64k?
>
> Then we get extents of 64Kb instead of 4Kb. Works on btrfs. Is it a
> problem for xfs (or any other fs)?
It shouldn't be; I was merely wondering if 2048 extents was enough to
trigger the enospc on btrfs.
> >
> > > +num_extents=$(( $file_size / $extent_size ))
> > > +
> > > +# Create a file with many small extents.
> > > +for ((i = 0; i < $num_extents; i++)); do
> > > + offset=$(( $i * $extent_size ))
> > > + $XFS_IO_PROG -f -s -c "pwrite -S 0xe5 $offset $extent_size" \
> > > + $SCRATCH_MNT/foo >>/dev/null
> > > +done
> >
> > I wouldn't have thought that this would actually succeed on xfs because
> > you could lay extents down one after the other, but then started seeing
> > this in the filefrag output:
> >
> > File size of /opt/foo is 528384 (129 blocks of 4096 bytes)
> > ext: logical_offset: physical_offset: length: expected: flags:
> > 3: 17.. 17: 52.. 52: 1: 37:
> > 4: 18.. 18: 67.. 67: 1: 53:
> > 5: 19.. 19: 81.. 81: 1: 68:
> > 6: 20.. 20: 94.. 94: 1: 82:
> > 7: 21.. 21: 106.. 106: 1: 95:
> > 8: 22.. 22: 117.. 117: 1: 107:
> > 9: 23.. 23: 127.. 127: 1: 118:
> > 10: 24.. 24: 136.. 136: 1: 128:
> > 11: 25.. 25: 144.. 144: 1: 137:
> > 12: 26.. 26: 151.. 151: 1: 145:
> > 13: 27.. 27: 157.. 157: 1: 152:
> > 14: 28.. 28: 162.. 162: 1: 158:
> > 15: 29.. 29: 166.. 166: 1: 163:
> > 16: 30.. 30: 169.. 169: 1: 167:
> > 17: 31.. 32: 171.. 172: 2: 170:
> > 18: 33.. 33: 188.. 188: 1: 173:
> >
> > 52, 67, 81, 94, 106, 117, 127, 136, 44, 151, 157, 162, 166, 169, 171...
> >
> > +15 +14 +13 +12 +11 +10 +9 +8 +7 +6 +5 +4 +3 +2...
> >
> > Hm, I wonder what quirk of the xfs allocator this might be?
>
> Don't ask me, I have no idea how it works :)
Ah, it's because xfs frees post-eof allocations every time we close a
file... and I never got the series that fixes this finalized and sent
upstream.
> >
> > > +
> > > +# Create file bar with the same size that file foo has but with large extents.
> > > +$XFS_IO_PROG -f -c "pwrite -S 0xc7 -b $file_size 0 $file_size" \
> > > + $SCRATCH_MNT/bar >>/dev/null
> > > +
> > > +# Fill the fs (for btrfs we are interested in filling all unallocated space
> > > +# and most of the existing metadata block group(s), so that after this there
> > > +# will be no unallocated space and metadata space will be mostly full but with
> > > +# more than enough free space for the clone operation below to succeed).
> > > +i=1
> > > +while true; do
> > > + $XFS_IO_PROG -f -c "pwrite 0 2K" $SCRATCH_MNT/filler_$i &> /dev/null
> > > + [ $? -ne 0 ] && break
> > > + i=$(( i + 1 ))
> > > +done
> >
> > _fill_fs?
>
> No, because that will do fallocate. Writing 2Kb files here is on
> purpose because that will fill the metadata
> block group(s) (on btrfs we inline extents in the btree (metadata) as
> long as they are not bigger than 2Kb, by default).
> Using fallocate, we end up using data block groups.
Ok.
> >
> > > +
> > > +# Now clone file bar into file foo. This is supposed to succeed and not fail
> > > +# with ENOSPC for example.
> > > +$XFS_IO_PROG -c "reflink $SCRATCH_MNT/bar" $SCRATCH_MNT/foo >>/dev/null
> >
> > _reflink $SCRATCH_MNT/bar $SCRATCH_MNT/foo ?
>
> Does exactly the same. I don't mind changing it however.
<Nod>
--D
> Thanks.
>
> >
> > --D
> >
> > > +
> > > +_scratch_remount
> > > +
> > > +echo "File foo data after cloning and remount:"
> > > +od -A d -t x1 $SCRATCH_MNT/foo
> > > +
> > > +status=0
> > > +exit
> > > diff --git a/tests/generic/558.out b/tests/generic/558.out
> > > new file mode 100644
> > > index 00000000..d1e8e70f
> > > --- /dev/null
> > > +++ b/tests/generic/558.out
> > > @@ -0,0 +1,5 @@
> > > +QA output created by 558
> > > +File foo data after cloning and remount:
> > > +0000000 c7 c7 c7 c7 c7 c7 c7 c7 c7 c7 c7 c7 c7 c7 c7 c7
> > > +*
> > > +134217728
> > > diff --git a/tests/generic/group b/tests/generic/group
> > > index 543c0627..c06c1cd1 100644
> > > --- a/tests/generic/group
> > > +++ b/tests/generic/group
> > > @@ -560,3 +560,4 @@
> > > 555 auto quick cap
> > > 556 auto quick casefold
> > > 557 auto quick log
> > > +558 auto clone
> > > --
> > > 2.11.0
> > >