Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

On 07/15/2016 12:39 AM, Andrei Borzenkov wrote:
15.07.2016 00:20, Chris Mason wrote:


On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
Hi All,

I developed a new btrfs command "btrfs insp phy"[1] to further
investigate this bug [2]. Using "btrfs insp phy" I developed a script
to trigger the bug. The bug is not triggered every time, but it is
most of the time.

Basically, the script creates a raid5 filesystem (using three
loop devices on three files called disk[123].img); on this filesystem

Are those devices themselves on btrfs? Just to avoid any sort of
possible side effects?

a file is created. Then, using "btrfs insp phy", the physical
placement of the data on the devices is computed.

First the script checks that the data are correct (for data1,
data2 and parity), then it corrupts the data:

test1: the parity is corrupted, then scrub is run. Then the (data1,
data2, parity) data on the disk are checked. This test passes every
time.

test2: data2 is corrupted, then scrub is run. Then the (data1, data2,
parity) data on the disk are checked. This test fails most of the time:
the data on the disk are not correct; the parity is wrong. Scrub
sometimes reports "WARNING: errors detected during scrubbing,
corrected" and sometimes reports "ERROR: there are uncorrectable
errors". But this seems unrelated to whether the data is
corrupted or not.

test3: like test2, but data1 is corrupted. The results are the same as
above.


test4: data2 is corrupted, then the file is read. The system doesn't
return an error (the data seems to be fine); but the data2 on the disk is
still corrupted.


Note: data1, data2, parity are the disk elements of the raid5 stripe.
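
For reference, with three devices each raid5 stripe holds two data
elements and one parity element, and the parity is the bytewise XOR of
the two data elements. The kind of consistency check the tests rely on
can be sketched in Python as follows. This is only an illustration:
the helper names and the toy 8-byte elements are assumptions, not
taken from the actual script.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("stripe elements must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_is_consistent(data1: bytes, data2: bytes, parity: bytes) -> bool:
    """True when the parity equals the XOR of the two data elements."""
    return xor_bytes(data1, data2) == parity


# Toy stripe elements (illustrative values only).
data1 = b"AAAAAAAA"
data2 = b"BBBBBBBB"
parity = xor_bytes(data1, data2)

# A single corrupted element makes the stripe inconsistent.
bad_data2 = b"BBBBXBBB"
```

After a successful scrub, all three elements read back from the disks
should satisfy stripe_is_consistent() again, and any one element should
be recoverable as the XOR of the other two.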

Conclusion:

Most of the time, it seems that btrfs-raid5 is not able to rebuild
parity and data. Worse, the message returned by scrub is inconsistent
with the status on the disk. The tests didn't fail every time; this
complicates the diagnosis. However, my script fails most of the time.

Interesting, thanks for taking the time to write this up.  Is the
failure specific to scrub?  Or is parity rebuild in general also failing
in this case?


How do you rebuild parity without scrub as long as all devices appear to
be present?

If one block is corrupted, the crcs will fail and the kernel will rebuild parity when you read the file. You can also use balance instead of scrub.

-chris
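
As a rough illustration of the read-time path described above (a
checksum mismatch on one block leads to reconstruction from the other
stripe elements), here is a minimal Python sketch. It is not the kernel
code: zlib.crc32 stands in for the crc32c checksum that btrfs uses, and
all function and variable names are hypothetical.

```python
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def read_data2(data1: bytes, data2: bytes, parity: bytes, expected_csum: int):
    """Return (block, rebuilt): the verified contents of data2 and
    whether they had to be reconstructed from data1 and parity."""
    if zlib.crc32(data2) == expected_csum:
        return data2, False
    # Checksum mismatch: rebuild the block from the rest of the stripe.
    candidate = xor_bytes(data1, parity)
    if zlib.crc32(candidate) != expected_csum:
        raise IOError("uncorrectable: rebuilt block also fails its checksum")
    return candidate, True


# Toy stripe (illustrative values only).
good1, good2 = b"AAAAAAAA", b"BBBBBBBB"
parity = xor_bytes(good1, good2)
csum2 = zlib.crc32(good2)

block, rebuilt = read_data2(good1, b"BBBBXBBB", parity, csum2)
```

The sketch only returns good data to the reader; whether the repaired
block is also written back to the disk is a separate step, and it is
what test4 above is probing.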