On 2020/2/5 11:25 AM, Qu Wenruo wrote:
>
>
> On 2020/2/5 11:18 AM, Matt Corallo wrote:
>> Hmm? My understanding is that that issue was only visible via stat
>> calls, not in actual behavior. In this case, if you have a lot of
>> in-flight writes from the write cache, balance will fail after
>> allocating blocks (so I guess balance relies on stat()?)
>>
>> Also, this is all on a kernel with your previous patch "btrfs: super:
>> Make btrfs_statfs() work with metadata over-commiting" applied.
>
> Oh, sorry, misread something.
>
> Then it's going to be fixed by a patchset:
> https://patchwork.kernel.org/project/linux-btrfs/list/?series=229013

Oh, wrong patchset. It solves another problem, not the one you hit.

This is the correct patch set:
https://patchwork.kernel.org/project/linux-btrfs/list/?series=229979

Sorry for the inconvenience.

Thanks,
Qu

>
> It's the relocation space calculation going too paranoid.
>
> Thanks,
> Qu
>>
>> Thanks,
>> Matt
>>
>> On 2/5/20 1:03 AM, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/2/5 2:17 AM, Matt Corallo wrote:
>>>> This appears to be some kind of race when there are a lot of pending
>>>> metadata writes in flight.
>>>>
>>>> I went and unmounted/remounted again (after taking about 30 minutes of
>>>> 5MB/s writes flushing an rsync with a ton of tiny files) and after the
>>>> remount the issue went away again. So I can only presume it is an issue
>>>> only when there are a million or so tiny files pending write.
>>>
>>> Known bug, the upstream fix is d55966c4279b ("btrfs: do not zero
>>> f_bavail if we have available space"), and it is backported to stable
>>> kernels.
>>>
>>> I guess downstream kernels will soon get updated to fix it.
>>>
>>> Thanks,
>>> Qu
>>>>
>>>> Matt
>>>>
>>>> On 2/4/20 3:41 AM, Matt Corallo wrote:
>>>>> Things settled a tiny bit after unmount (see last email for the
>>>>> errors that generated) and remount, and a balance -mconvert,soft
>>>>> worked:
>>>>>
>>>>> [268093.588482] BTRFS info (device dm-2): balance: start
>>>>> -mconvert=raid1,soft -sconvert=raid1,soft
>>>>> ...
>>>>> [288405.946776] BTRFS info (device dm-2): balance: ended with status: 0
>>>>>
>>>>> However, the enospc issue still appears and seems tied to a few of the
>>>>> previously-allocated metadata blocks:
>>>>>
>>>>> # btrfs balance start -musage=0 /bigraid
>>>>> ...
>>>>>
>>>>> [289714.420418] BTRFS info (device dm-2): balance: start -musage=0 -susage=0
>>>>> [289714.508411] BTRFS info (device dm-2): 64 enospc errors during balance
>>>>> [289714.508413] BTRFS info (device dm-2): balance: ended with status: -28
>>>>>
>>>>> # cd /sys/fs/btrfs/e2843f83-aadf-418d-b36b-5642f906808f/allocation/ &&
>>>>> grep -Tr .
>>>>> metadata/raid1/used_bytes: 255838797824
>>>>> metadata/raid1/total_bytes: 441307889664
>>>>> metadata/disk_used: 511677595648
>>>>> metadata/bytes_pinned: 0
>>>>> metadata/bytes_used: 255838797824
>>>>> metadata/total_bytes_pinned: 999424
>>>>> metadata/disk_total: 882615779328
>>>>> metadata/total_bytes: 441307889664
>>>>> metadata/bytes_reserved: 4227072
>>>>> metadata/bytes_readonly: 65536
>>>>> metadata/bytes_may_use: 433502945280
>>>>> metadata/flags: 4
>>>>> system/raid1/used_bytes: 1474560
>>>>> system/raid1/total_bytes: 33554432
>>>>> system/disk_used: 2949120
>>>>> system/bytes_pinned: 0
>>>>> system/bytes_used: 1474560
>>>>> system/total_bytes_pinned: 0
>>>>> system/disk_total: 67108864
>>>>> system/total_bytes: 33554432
>>>>> system/bytes_reserved: 0
>>>>> system/bytes_readonly: 0
>>>>> system/bytes_may_use: 0
>>>>> system/flags: 2
>>>>> global_rsv_reserved: 536870912
>>>>> data/disk_used: 13645423230976
>>>>> data/bytes_pinned: 0
>>>>> data/bytes_used: 13645423230976
>>>>> data/single/used_bytes: 13645423230976
>>>>> data/single/total_bytes: 13661217226752
>>>>> data/total_bytes_pinned: 0
>>>>> data/disk_total: 13661217226752
>>>>> data/total_bytes: 13661217226752
>>>>> data/bytes_reserved: 117518336
>>>>> data/bytes_readonly: 196608
>>>>> data/bytes_may_use: 15064711168
>>>>> data/flags: 1
>>>>> global_rsv_size: 536870912
>>>>>
>>>>>
>>>>> Somewhat more frightening, this also happens on the system blocks:
>>>>>
>>>>> [288405.946776] BTRFS info (device dm-2): balance: ended with status: 0
>>>>> [289589.506357] BTRFS info (device dm-2): balance: start -musage=5 -susage=5
>>>>> [289589.905675] BTRFS info (device dm-2): relocating block group
>>>>> 9676759498752 flags system|raid1
>>>>> [289590.807033] BTRFS info (device dm-2): found 89 extents
>>>>> [289591.300212] BTRFS info (device dm-2): 16 enospc errors during balance
>>>>> [289591.300216] BTRFS info (device dm-2): balance: ended with status: -28
>>>>>
>>>>> Matt
>>>>>
>>>>> On 2/3/20 9:40 PM, Chris Murphy wrote:
>>>>>> A developer might find it useful to see this reproduced with mount
>>>>>> option enospc_debug. And soon after enospc the output from:
>>>>>>
>>>>>> cd /sys/fs/btrfs/UUID/allocation/ && grep -Tr .
>>>>>>
>>>>>> yep, space then dot at the end
>>>>>>
>>>
>
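[Editor's note: a minimal sketch, not from the thread, of one way to read the metadata numbers in the sysfs dump quoted above. Summing the outstanding reservations (bytes_used + bytes_may_use + bytes_reserved + bytes_readonly) and comparing against the allocated metadata space (total_bytes) shows the reservations exceed the allocation by a wide margin, which is consistent with balance's relocation reservations failing with -28 (ENOSPC). The field names are real sysfs entries; the interpretation is an assumption.]

```python
# Summarize the metadata space-info values quoted in the sysfs dump above.
total_bytes = 441_307_889_664    # metadata/total_bytes (allocated metadata space)
bytes_used = 255_838_797_824     # metadata/bytes_used
bytes_may_use = 433_502_945_280  # metadata/bytes_may_use (outstanding reservations)
bytes_reserved = 4_227_072       # metadata/bytes_reserved
bytes_readonly = 65_536          # metadata/bytes_readonly

# Total space the allocator considers spoken for (assumed interpretation).
committed = bytes_used + bytes_may_use + bytes_reserved + bytes_readonly
overcommit = committed - total_bytes

print(f"committed:  {committed:>15} bytes")
print(f"allocated:  {total_bytes:>15} bytes")
print(f"overcommit: {overcommit:>15} bytes (~{overcommit / 2**30:.0f} GiB)")
```

Reservations outrun the allocated chunks by roughly 231 GiB here, so any operation that needs a fresh metadata reservation (such as relocating a block group) can fail even though `bytes_used` is far below `total_bytes`.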
