David Sterba wrote on 2020-02-12 02:21:
On Tue, Feb 11, 2020 at 12:33:48PM +0800, Qu Wenruo wrote:
>> 39862272 have 30949376
>> [ 5949.328136] repair_io_failure: 22 callbacks suppressed
>> [ 5949.328139] BTRFS info (device vdb): read error corrected: ino 0
>> off 39862272 (dev /dev/vdd sector 19488)
>> [ 5949.333447] BTRFS info (device vdb): read error corrected: ino 0
>> off 39866368 (dev /dev/vdd sector 19496)
>> [ 5949.336875] BTRFS info (device vdb): read error corrected: ino 0
>> off 39870464 (dev /dev/vdd sector 19504)
>> [ 5949.340325] BTRFS info (device vdb): read error corrected: ino 0
>> off 39874560 (dev /dev/vdd sector 19512)
>> [ 5949.409934] BTRFS warning (device vdb): csum failed root -9 ino 257
>> off 2228224 csum
This looks like an existing bug, IIRC Zygo reported it before.
Btrfs balance just randomly failed at data reloc tree.
Thus I don't believe it's related to Ethan's patches.
Ok, then the patches make it more likely to happen, which could mean
that faster backref processing hits some race window. As there could be
more, we should first fix the bug you say Zygo reported.
I added a log message to check whether find_parent_nodes is ever called
while running test btrfs/125. It turns out that btrfs/125 never goes
through that function, and everything my patches change is under
find_parent_nodes. Therefore, I don't think my patches make the
btrfs/125 failure more likely to happen; at the very least, they don't
change the behavior of any function that btrfs/125 runs through.
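
A check of this kind can be sketched roughly as below. This is only an
illustration, not the debug change actually used: it assumes a one-line
message was added at the top of find_parent_nodes (the marker text
"find_parent_nodes called" is hypothetical) and that the kernel log was
saved to a text file after running btrfs/125.

```python
# Hypothetical marker: assumes a debug print with this exact text was added
# at the top of find_parent_nodes(). The real message is not given here.
MARKER = "find_parent_nodes called"


def count_marker_hits(log_text, marker=MARKER):
    """Return the number of kernel log lines that contain the marker."""
    hits = 0
    for line in log_text.splitlines():
        # A plain substring match is enough; the "[ 5949.328139]" style
        # timestamp prefix does not interfere with it.
        if marker in line:
            hits += 1
    return hits


def function_was_reached(log_text, marker=MARKER):
    """True if the marked function was hit at least once during the run."""
    return count_marker_hits(log_text, marker) > 0
```

Feeding the saved log of a btrfs/125 run to function_was_reached() and
getting False would match the observation that the test does not pass
through find_parent_nodes.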
Is it easy to reproduce in your test environment?
Thanks,
ethanwu