Re: [PATCH] Btrfs: relocate csums properly with prealloc extents


 



OK. btrfs scrub and dmesg are hitting me with lots of unfixable errors,
all in the same file. Example:

[13313.441091] btrfs: unable to fixup (regular) error at logical
560107954176 on dev /dev/sdn
[13321.532223] scrub_handle_errored_block: 1510 callbacks suppressed
[13321.532309] btrfs_dev_stat_print_on_error: 1510 callbacks suppressed
[13321.532314] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt
40016, gen 0
[13321.532420] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt
40017, gen 0
[13321.532545] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt
40018, gen 0
[13321.532605] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt
40019, gen 0
[13321.533039] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt
40020, gen 0
[13321.537519] scrub_handle_errored_block: 1508 callbacks suppressed
[13321.537525] btrfs: unable to fixup (regular) error at logical
560630136832 on dev /dev/sdq
[13321.537821] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt
40021, gen 0
[13321.538081] btrfs: unable to fixup (regular) error at logical
560630140928 on dev /dev/sdq
[13321.538438] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt
40022, gen 0
[13321.538715] btrfs: unable to fixup (regular) error at logical
560630145024 on dev /dev/sdq
[13321.539016] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt
40023, gen 0
[13321.539234] btrfs: unable to fixup (regular) error at logical
560630149120 on dev /dev/sdq
[13321.539522] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt
40024, gen 0
[13321.539739] btrfs: unable to fixup (regular) error at logical
560630153216 on dev /dev/sdq
[13321.540027] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt
40025, gen 0
[13321.540242] btrfs: unable to fixup (regular) error at logical
560630157312 on dev /dev/sdq
[13321.540620] btrfs: unable to fixup (regular) error at logical
560630161408 on dev /dev/sdq
[13321.541140] btrfs: unable to fixup (regular) error at logical
560630165504 on dev /dev/sdq
[13321.541571] btrfs: unable to fixup (regular) error at logical
560630169600 on dev /dev/sdq
[13321.541931] btrfs: unable to fixup (regular) error at logical
560630173696 on dev /dev/sdq

Luckily all the corruption seems to be in a single very large file,
but in different parts of it on different disks. The file was written
by rtorrent, which has the option "system.file_allocate.set = yes"
configured.
I also have samba configured with "strict allocate = yes" because it
is recommended for best performance on extent-based filesystems. Does
that mean samba files are vulnerable to this corruption too?
If so, this could become very ugly very fast on certain systems.
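
For reference, the write pattern in question (preallocate the whole
file, then write only somewhere in the middle) can be sketched in a few
lines of Python. This is only an illustration, assuming the rtorrent and
samba options above end up doing an fallocate-style preallocation; the
helper name, the 4 KiB block size and the fill byte are made up for the
example and are not taken from either program.

```python
import os

BLOCK = 4096  # assumed block size; the scrub errors above step by 4096 bytes


def write_into_prealloc(path, total_blocks, first_block, data_blocks):
    """Preallocate total_blocks, then write data_blocks starting at first_block.

    Hypothetical helper: it only mimics "fallocate first, fill in later",
    it is not what rtorrent or samba actually run.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Reserve the full file size up front.
        os.posix_fallocate(fd, 0, total_blocks * BLOCK)
        # Write only into the middle of the reserved range.
        payload = b"\xab" * (data_blocks * BLOCK)
        os.pwrite(fd, payload, first_block * BLOCK)
        os.fsync(fd)
    finally:
        os.close(fd)
    return payload
```

Calling write_into_prealloc("test.bin", 16, 4, 2) leaves a 64 KiB file
where only blocks 4 and 5 hold written data; the rest of the range is
still preallocated and reads back as zeros.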

Mvh

Hans-Kristian Bakke


On 23 October 2013 23:24, Hans-Kristian Bakke <hkbakke@xxxxxxxxx> wrote:
> I was hit by this when trying to rebalance a 16TB RAID10 to 32TB
> RAID10 going from 4 to 8 WD SE 4TB drives today. I cannot finish a
> rebalance because of failed csum.
>
> [10228.850910] BTRFS info (device sdq): csum failed ino 487 off 65536
> csum 2566472073 private 151366068
> [10228.850967] BTRFS info (device sdq): csum failed ino 487 off 69632
> csum 2566472073 private 3056924305
> [10228.850973] BTRFS info (device sdq): csum failed ino 487 off 593920
> csum 2566472073 private 906093395
> [10228.851004] BTRFS info (device sdq): csum failed ino 487 off 73728
> csum 2566472073 private 2680502892
> [10228.851014] BTRFS info (device sdq): csum failed ino 487 off 598016
> csum 2566472073 private 1940162924
> [10228.851029] BTRFS info (device sdq): csum failed ino 487 off 77824
> csum 2566472073 private 2939385278
> [10228.851051] BTRFS info (device sdq): csum failed ino 487 off 602112
> csum 2566472073 private 645310077
> [10228.851055] BTRFS info (device sdq): csum failed ino 487 off 81920
> csum 2566472073 private 3600741549
> [10228.851078] BTRFS info (device sdq): csum failed ino 487 off 86016
> csum 2566472073 private 200201951
> [10228.851091] BTRFS info (device sdq): csum failed ino 487 off 606208
> csum 2566472073 private 1002916440
>
> The system is running a scrub now and I will return with some more
> details later. I do not think systemd is logging to this volume, but
> the scrub will probably show which files are affected.
>
> As this is a very serious issue for those hit by the corruption (it
> basically makes it impossible to run a rebalance, with all the
> consequences that has), hopefully this will go upstream soon.
> I am on kernel 3.11.6, by the way.
> Mvh
>
> Hans-Kristian Bakke
> Mob: 91 76 17 38
>
>
> On 4 October 2013 23:19, Johannes Hirte <johannes.hirte@xxxxxxxxxxxxx> wrote:
>> On Fri, 27 Sep 2013 09:37:00 -0400
>> Josef Bacik <jbacik@xxxxxxxxxxxx> wrote:
>>
>>> A user reported a problem where they were getting csum errors when
>>> running a balance and running systemd's journal.  This is because
>>> systemd is awesome and fallocate()'s its log space and writes into
>>> it.  Unfortunately we assume that when we read in all the csums for
>>> an extent that they are sequential starting at the bytenr we care
>>> about.  This obviously isn't the case for prealloc extents, where we
>>> could have written to the middle of the prealloc extent only, which
>>> means the csum would be for the bytenr in the middle of our range and
>>> not the front of our range.  Fix this by offsetting the new bytenr we
>>> are logging to based on the original bytenr the csum was for.  With
>>> this patch I no longer see the csum errors I was seeing.  Thanks,
>>
>> Any assessment of when this goes upstream? Until it hits Linus' tree
>> it won't appear in stable. And this seems rather important.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
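
The idea in the quoted patch description can be illustrated with a toy
model. This is not the kernel code: the function names, the dict-based
csum store and the example numbers are all assumptions made up for the
sketch. It only shows why assuming the csums are sequential from the
start of the extent goes wrong when just the middle of a prealloc extent
was written, and how offsetting by the original bytenr avoids that.

```python
SECTOR = 4096  # assumed block size for the toy model


def relocate_sequential(new_start, csums):
    """Flawed assumption: csums are contiguous from the start of the extent."""
    return {new_start + i * SECTOR: c for i, (_, c) in enumerate(csums)}


def relocate_with_offset(old_start, new_start, csums):
    """Keep every csum at the same offset inside the extent as before."""
    return {new_start + (bytenr - old_start): c for bytenr, c in csums}


# Hypothetical prealloc extent where only blocks 4 and 5 were written,
# so only those two blocks have csums.
old_start = 100 * SECTOR
new_start = 500 * SECTOR
csums = [(old_start + 4 * SECTOR, 0xAAAA), (old_start + 5 * SECTOR, 0xBBBB)]

wrong = relocate_sequential(new_start, csums)
right = relocate_with_offset(old_start, new_start, csums)
```

In the sequential version the two csums end up filed under blocks 0 and 1
of the new extent, where no data was written, while the data in blocks 4
and 5 has no csum at its new location. The offset version keeps them at
blocks 4 and 5.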
