Re: [PATCH] btrfs: don't end the transaction for delayed refs in throttle

On 2019/6/4 1:36 AM, Josef Bacik wrote:
> On Mon, Jun 03, 2019 at 02:53:00PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/2/13 12:03 AM, David Sterba wrote:
>>> On Thu, Jan 24, 2019 at 09:31:43AM -0500, Josef Bacik wrote:
>>>> Previously callers to btrfs_end_transaction_throttle() would commit the
>>>> transaction if there wasn't enough delayed refs space.  This happens in
>>>> relocation, and if the fs is relatively empty we'll run out of delayed
>>>> refs space basically immediately, so we'll just be stuck in this loop of
>>>> committing the transaction over and over again.
>>>>
>>>> This code existed because we didn't have a good feedback mechanism for
>>>> running delayed refs, but with the delayed refs rsv we do now.  Delete
>>>> this throttling code and let the btrfs_start_transaction() in relocation
>>>> deal with putting pressure on the delayed refs infrastructure.  With
>>>> this patch we no longer take 5 minutes to balance a metadata only fs.
>>>>
>>>> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
>>>
>>> For the record, this has been merged to 5.0-rc5
>>>
>>
>> Bisecting points to this patch as the cause of a strange balance ENOSPC.
>>
>> It can be reproduced by btrfs/156, or by the following small script:
>> ------
>> #!/bin/bash
>> dev="/dev/test/test"
>> mnt="/mnt/btrfs"
>>
>> _fail()
>> {
>> 	echo "!!! FAILED: $@ !!!"
>> 	exit 1
>> }
>>
>> do_work()
>> {
>> 	umount $dev &> /dev/null
>> 	umount $mnt &> /dev/null
>>
>> 	mkfs.btrfs -b 1G -m single -d single $dev -f > /dev/null
>>
>> 	mount $dev $mnt
>>
>> 	for i in $(seq -w 0 511); do
>> 	#	xfs_io -f -c "falloc 0 1m" $mnt/file_$i > /dev/null
>> 		xfs_io -f -c "pwrite 0 1m" $mnt/inline_$i > /dev/null
>> 	done
>> 	sync
>>
>> 	btrfs balance start --full $mnt || return 1
>> 	sync
>>
>>
>> 	btrfs balance start --full $mnt || return 1
>> 	umount $mnt
>> }
>>
>> failed=0
>> for i in $(seq -w 0 24); do
>> 	echo "=== run $i ==="
>> 	do_work
>> 	if [ $? -eq 1 ]; then
>> 		failed=$(($failed + 1))
>> 	fi
>> done
>> if [ $failed -ne 0 ]; then
>> 	echo "!!! failed $failed/25 !!!"
>> else
>> 	echo "=== all passes ==="
>> fi
>> ------
>>
>> On v4.20, it fails at a rate of around 0/25 to 2/25 (very rare).
>> But at that patch (upstream commit
>> 302167c50b32e7fccc98994a91d40ddbbab04e52), the failure rate rises to 25/25.
>>
>> Any idea about this ENOSPC problem?
>> It looks really weird for the second full balance to fail even though we
>> have enough unallocated space.
>>
> 
> I've been running this all morning on kdave's misc-next and not had a single
> failure.  I ran it a few times on spinning rust and a few times on my nvme
> drive.  I don't doubt that it's failing for you, but I can't reproduce it.  It
> would be helpful to know where the ENOSPC was coming from so I can think of
> where the problem might be.  Thanks,
> 
> Josef
> 

Since v5.2-rc2 has a lot of ENOSPC debug output merged, here is the
debug info produced just by the enospc_debug mount option:

BTRFS: device fsid defe70f2-d083-41f0-a4fd-28a0cc03dce7 devid 1 transid 5 /dev/test/test
BTRFS info (device dm-3): disk space caching is enabled
BTRFS info (device dm-3): has skinny extents
BTRFS info (device dm-3): flagging fs with big metadata feature
BTRFS info (device dm-3): checking UUID tree
BTRFS info (device dm-3): balance: start -d -m -s
BTRFS info (device dm-3): relocating block group 726663168 flags metadata
BTRFS info (device dm-3): relocating block group 609222656 flags data
BTRFS info (device dm-3): found 57 extents
BTRFS info (device dm-3): found 57 extents
BTRFS info (device dm-3): relocating block group 491782144 flags data
BTRFS info (device dm-3): found 112 extents
BTRFS info (device dm-3): found 112 extents
BTRFS info (device dm-3): relocating block group 374341632 flags data
BTRFS info (device dm-3): found 115 extents
BTRFS info (device dm-3): found 114 extents
BTRFS info (device dm-3): relocating block group 256901120 flags data
BTRFS info (device dm-3): found 112 extents
BTRFS info (device dm-3): found 112 extents
BTRFS info (device dm-3): relocating block group 139460608 flags data
BTRFS info (device dm-3): found 112 extents
BTRFS info (device dm-3): found 112 extents
BTRFS info (device dm-3): unable to make block group 22020096 ro
BTRFS info (device dm-3): sinfo_used=42909696 bg_num_bytes=116293632 min_allocable=1048576
BTRFS info (device dm-3): space_info 4 has 82919424 free, is not full
BTRFS info (device dm-3): space_info total=125829120, used=1015808, pinned=0, reserved=81920, may_use=41746432, readonly=65536
BTRFS info (device dm-3): global_block_rsv: size 16777216 reserved 16744448
BTRFS info (device dm-3): trans_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): chunk_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_refs_rsv: size 61865984 reserved 25001984
BTRFS info (device dm-3): relocating block group 22020096 flags metadata
BTRFS info (device dm-3): found 54 extents
BTRFS info (device dm-3): relocating block group 13631488 flags data
BTRFS info (device dm-3): found 8 extents
BTRFS info (device dm-3): found 8 extents
BTRFS info (device dm-3): relocating block group 5242880 flags metadata
BTRFS info (device dm-3): found 56 extents
BTRFS info (device dm-3): unable to make block group 1048576 ro
BTRFS info (device dm-3): sinfo_used=32768 bg_num_bytes=4161536 min_allocable=1048576
BTRFS info (device dm-3): space_info 2 has 4161536 free, is not full
BTRFS info (device dm-3): space_info total=4194304, used=16384, pinned=0, reserved=16384, may_use=0, readonly=0
BTRFS info (device dm-3): global_block_rsv: size 16777216 reserved 16744448
BTRFS info (device dm-3): trans_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): chunk_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_refs_rsv: size 3145728 reserved 1540096
BTRFS info (device dm-3): relocating block group 1048576 flags system
BTRFS info (device dm-3): found 1 extents
BTRFS info (device dm-3): balance: ended with status: 0
BTRFS info (device dm-3): balance: start -d -m -s
BTRFS info (device dm-3): unable to make block group 1431306240 ro
BTRFS info (device dm-3): sinfo_used=16384 bg_num_bytes=33538048 min_allocable=1048576
BTRFS info (device dm-3): space_info 2 has 33538048 free, is not full
BTRFS info (device dm-3): space_info total=33554432, used=16384, pinned=0, reserved=0, may_use=0, readonly=0
BTRFS info (device dm-3): global_block_rsv: size 16777216 reserved 16777216
BTRFS info (device dm-3): trans_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): chunk_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_refs_rsv: size 0 reserved 0
BTRFS info (device dm-3): relocating block group 1431306240 flags system
BTRFS info (device dm-3): unable to make block group 1313865728 ro
BTRFS info (device dm-3): sinfo_used=19382272 bg_num_bytes=116342784 min_allocable=1048576
BTRFS info (device dm-3): space_info 4 has 98058240 free, is not full
BTRFS info (device dm-3): space_info total=117440512, used=1015808, pinned=0, reserved=81920, may_use=18284544, readonly=0
BTRFS info (device dm-3): global_block_rsv: size 16777216 reserved 16744448
BTRFS info (device dm-3): trans_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): chunk_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_refs_rsv: size 3145728 reserved 1540096
BTRFS info (device dm-3): relocating block group 1313865728 flags metadata
BTRFS info (device dm-3): found 55 extents
BTRFS info (device dm-3): relocating block group 1196425216 flags data
BTRFS info (device dm-3): found 65 extents
BTRFS info (device dm-3): found 65 extents
BTRFS warning (device dm-3): no space to allocate a new chunk for block group 1078984704
BTRFS warning (device dm-3): no space to allocate a new chunk for block group 961544192
BTRFS warning (device dm-3): no space to allocate a new chunk for block group 844103680
BTRFS warning (device dm-3): no space to allocate a new chunk for block group 726663168
BTRFS info (device dm-3): 4 enospc errors during balance
BTRFS info (device dm-3): balance: ended with status: -28

The ENOSPC still comes from inc_block_group_ro().
For the first data block group failure, something in bytes_may_use
still doesn't look correct.
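As a cross-check on the bytes_may_use numbers, the space_info dump lines
can be recomputed by hand. The sketch assumes sinfo_used is the sum
used + pinned + reserved + may_use + readonly, and that the "has N free"
figure is total minus that sum; both assumptions agree with every dump in
the log above. The helper names are made up for illustration and are not
part of btrfs-progs or the kernel.

```shell
#!/bin/bash
# Hypothetical helpers (not btrfs tools) that recompute the space_info
# accounting printed by enospc_debug. Assumption: sinfo_used is
# used + pinned + reserved + may_use + readonly, and "free" is
# total - sinfo_used. Arguments follow the order of the dump line.

# sinfo_used USED PINNED RESERVED MAY_USE READONLY
sinfo_used()
{
	echo $(( $1 + $2 + $3 + $4 + $5 ))
}

# sinfo_free TOTAL USED PINNED RESERVED MAY_USE READONLY
sinfo_free()
{
	echo $(( $1 - $(sinfo_used "$2" "$3" "$4" "$5" "$6") ))
}

# may_use_pct USED PINNED RESERVED MAY_USE READONLY
# Integer share (in percent) of sinfo_used that is bytes_may_use.
may_use_pct()
{
	total_used=$(sinfo_used "$@")
	if [ "$total_used" -eq 0 ]; then
		echo 0
	else
		echo $(( $4 * 100 / total_used ))
	fi
}

# space_info 4 (metadata) when block group 22020096 could not be made ro:
sinfo_used 1015808 0 81920 41746432 65536            # 42909696
sinfo_free 125829120 1015808 0 81920 41746432 65536  # 82919424
may_use_pct 1015808 0 81920 41746432 65536           # 97
```

Fed the second balance's metadata dump (used=1015808, reserved=81920,
may_use=18284544), the same helpers give 19382272 and 98058240, matching
the logged sinfo_used and free values; in both dumps bytes_may_use makes
up the bulk of sinfo_used.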

The system block group is another story, though. It has no
reserved/pinned/may_use bytes; it's the min_allocable margin that fails the check.
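The min_allocable point can be made concrete with the three numbers
inc_block_group_ro() prints. This is a minimal sketch, assuming the check
has the form sinfo_used + bg_num_bytes + min_allocable <= space_info total,
a form consistent with the fields in the debug output; can_make_ro and
report are hypothetical helpers for illustration only.

```shell
#!/bin/bash
# Hypothetical model of the read-only check, assuming it is
#   sinfo_used + bg_num_bytes + min_allocable <= total
# can_make_ro SINFO_USED BG_NUM_BYTES MIN_ALLOCABLE TOTAL
# Exit status 0 if the block group would fit, 1 otherwise.
can_make_ro()
{
	[ $(( $1 + $2 + $3 )) -le "$4" ]
}

# report: same arguments, prints the verdict and any shortfall.
report()
{
	if can_make_ro "$@"; then
		echo "ro ok"
	else
		echo "ENOSPC (short by $(( $1 + $2 + $3 - $4 )) bytes)"
	fi
}

# System block group 1431306240 (second balance): the group is empty
# apart from 16KiB, yet the 1MiB min_allocable margin does not fit.
report 16384 33538048 1048576 33554432      # ENOSPC (short by 1048576 bytes)
# The same numbers without the min_allocable margin would pass.
report 16384 33538048 0 33554432            # ro ok
# Metadata block group 22020096 (first balance): fails even without the
# margin, mostly because of the 41746432 bytes counted as may_use.
report 42909696 116293632 1048576 125829120 # ENOSPC (short by 34422784 bytes)
```

Removing the may_use bytes from the metadata case (report 1163264
116293632 1048576 125829120) prints "ro ok", so for that block group the
reservation counted in bytes_may_use is what tips the check over, while
for the system block group the 1MiB margin alone is enough to fail.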

I can't remember whether this is a bug I reported before, but it looks
a little similar.

Thanks,
Qu
