On Mon, Jun 03, 2019 at 02:53:00PM +0800, Qu Wenruo wrote:
>
>
> > On 2019/2/13 at 12:03 AM, David Sterba wrote:
> > On Thu, Jan 24, 2019 at 09:31:43AM -0500, Josef Bacik wrote:
> >> Previously callers to btrfs_end_transaction_throttle() would commit the
> >> transaction if there wasn't enough delayed refs space. This happens in
> >> relocation, and if the fs is relatively empty we'll run out of delayed
> >> refs space basically immediately, so we'll just be stuck in this loop of
> >> committing the transaction over and over again.
> >>
> >> This code existed because we didn't have a good feedback mechanism for
> >> running delayed refs, but with the delayed refs rsv we do now. Delete
> >> this throttling code and let the btrfs_start_transaction() in relocation
> >> deal with putting pressure on the delayed refs infrastructure. With
> >> this patch we no longer take 5 minutes to balance a metadata only fs.
> >>
> >> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> >
> > For the record, this has been merged to 5.0-rc5
> >
>
> Bisecting a strange balance ENOSPC leads me to this patch.
> 
> It can be reproduced by btrfs/156, or by the following small script:
> ------
> #!/bin/bash
> dev="/dev/test/test"
> mnt="/mnt/btrfs"
>
> _fail()
> {
>     echo "!!! FAILED: $@ !!!"
>     exit 1
> }
> 
> do_work()
> {
>     umount $dev &> /dev/null
>     umount $mnt &> /dev/null
> 
>     mkfs.btrfs -b 1G -m single -d single $dev -f > /dev/null || _fail "mkfs"
> 
>     mount $dev $mnt || _fail "mount"
> 
>     # Fill the fs with 512 1MiB files, then balance everything twice.
>     for i in $(seq -w 0 511); do
>         # xfs_io -f -c "falloc 0 1m" $mnt/file_$i > /dev/null
>         xfs_io -f -c "pwrite 0 1m" $mnt/inline_$i > /dev/null
>     done
>     sync
> 
>     btrfs balance start --full-balance $mnt || return 1
>     sync
> 
>     btrfs balance start --full-balance $mnt || return 1
>     umount $mnt
> }
> 
> failed=0
> for i in $(seq -w 0 24); do
>     echo "=== run $i ==="
>     do_work
>     if [ $? -eq 1 ]; then
>         failed=$(($failed + 1))
>     fi
> done
> if [ $failed -ne 0 ]; then
>     echo "!!! failed $failed/25 !!!"
> else
>     echo "=== all passed ==="
> fi
> ------
>
> For v4.20, it fails at a rate of around 0/25 ~ 2/25 (very rarely).
> But with that patch (upstream commit
> 302167c50b32e7fccc98994a91d40ddbbab04e52), the failure rate rises to 25/25.
>
> Any idea about that ENOSPC problem?
> It looks really weird for the 2nd full balance to fail even though we
> have enough unallocated space.
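> (To confirm, the unallocated space can be dumped between the two
> balances with something like:)
> ------
> # Show overall and per-device allocation, including unallocated space,
> # right before the 2nd balance runs.
> btrfs filesystem usage $mnt
> ------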
>
I've been running this all morning on kdave's misc-next and haven't had a
single failure. I ran it a few times on spinning rust and a few times on my
nvme drive. I don't doubt that it's failing for you, but I can't reproduce
it. It would be helpful to know where the ENOSPC is coming from so I can
think about where the problem might be. Thanks,
Josef
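
For reference, btrfs has an enospc_debug mount option that dumps the
space_info state to dmesg whenever an allocation fails with ENOSPC, which
should show exactly which space_info ran out. A minimal sketch against the
reproducer above (device and mount point taken from the script):
------
# Re-run the failing balance with enospc_debug enabled, then pull the
# space_info dump the kernel logs at the moment ENOSPC is returned.
mount -o enospc_debug /dev/test/test /mnt/btrfs
btrfs balance start --full-balance /mnt/btrfs
dmesg | tail -n 60
------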