On 2019/1/14 5:35 PM, David Sterba wrote:
> On Mon, Jan 14, 2019 at 01:39:46PM +0800, Qu Wenruo wrote:
>> Hi,
>>
>> When rebasing my qgroup + balance optimization patches, I found one very
>> obvious performance regression for balance.
>>
>> For normal 4G subvolume, 16 snapshots, balance workload, v4.20 kernel
>> only takes 3s to relocate a metadata block group, while for v5.0-rc1, I
>> don't really know how long it will take, as it hasn't finished yet.
>
> This looks like a lockup, unbounded waiting or missed wakeup.
Nope.
It's committing transactions like crazy.
With a much smaller dataset it does in fact finish: v4.20 finishes in
just seconds, while v5.0-rc1 takes nearly 400 seconds.
And during those 400 seconds, btrfs commits a transaction over 2000 times.
>
>> And the most important part is, this happens when quota is *DISABLED*!!!
>>
>> I'm bisecting for this regression, but if there are some users trying
>> latest rc kernel, please be aware of this regression.
>
> The rc1 can go pretty wild and issues could be caused by other
> subsystems, so I'd try to test the merged (32ee34eddad13cd4) and
> non-merged (52042d8e82ff50d) branches, this should tell you if it's a
> genuine btrfs bug or not.
I have already bisected the bug; it's 64403612b73a ("btrfs: rework
btrfs_check_space_for_delayed_refs").
Furthermore, I submitted an RFC patch for fstests, which everyone
can test without relying on the unpredictable contents of '/usr'.
https://patchwork.kernel.org/patch/10761715/
This turns out to involve at least several changes in relocation.
If we don't take snapshots and have just one subvolume with only several
megabytes of metadata to relocate, it just returns ENOSPC.
With enough snapshots, it commits like crazy.
The bisect is based on relocation duration; I haven't dug deep enough
to judge the ENOSPC behavior yet.
Thanks,
Qu
