Re: init-csum-tree Assertion `ret` failed

On Sun, Feb 15, 2015 at 7:07 PM, Tim DeNike <tim@xxxxxxxxx> wrote:
> LSI hardware raid6.

Is the metadata profile DUP or single?
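
You can check with something like this (mount point and numbers here
are only examples):

# btrfs fi df /mnt/array
Data, single: total=1.00TiB, used=800.00GiB
System, DUP: total=32.00MiB, used=112.00KiB
Metadata, DUP: total=4.00GiB, used=2.10GiB

The profile on the Metadata line is what matters: with DUP there are
two copies of metadata, so scrub can repair a bad copy; with single
there's only the one.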

> I could mount the volume and read/write to it, but
> larger files (10+GB) would often immediately have uncorrectable CRC
> errors if I did a scrub right after they were created.

Is this a new(ish) btrfs filesystem? Any crashes? I don't recall,
offhand, cases of individual file csums going bad this quickly on an
otherwise functioning filesystem.

> I don't think
> it's a problem with hardware as I've gone through every disk in the
> array, SAS cables, controller card (could be RAM/CPU/motherboard I
> guess).

Run memtest86+ for as long as possible. Days if you can. At least
overnight tonight into tomorrow, and if it has to wait until next
weekend, start Friday and let it go until Monday. It's the only way to
be sure.


> I figured something must be wrong with the crc tree and maybe
> recreating all crcs would resolve the issue.  After I initially ran
> init-csum I could still mount but all crcs were invalid (it wiped but
> never recreated them).
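
(For reference, that rebuild is done offline against the unmounted
filesystem, something like "btrfs check --init-csum-tree /dev/sdX",
where the device name is only an example. As I understand it, it
empties the csum tree and then re-reads the data to rebuild it, which
would explain everything having invalid csums if the run never
completed.)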

Generic corruption from memory, cables, drives, or the RAID layer has
no idea it's hitting the csum tree specifically; it corrupts all sorts
of things. You'd have corrupt metadata too, and scrub and normal usage
should occasionally report that one copy was corrupt and that it had
to use the good copy to fix the bad one. Such an entry looks like
this:
[48466.824770] BTRFS: checksum error at logical 20971520 on dev
/dev/sdb, sector 57344: metadata leaf (level 0) in tree 3
[48466.829900] BTRFS: checksum error at logical 20971520 on dev
/dev/sdb, sector 57344: metadata leaf (level 0) in tree 3
[48466.834944] BTRFS: bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[48466.853589] BTRFS: fixed up error at logical 20971520 on dev /dev/sdb

If you don't have anything like this, then something else is going on
that's rather specific.
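
The per-device counters in that "bdev ... errs:" line persist across
mounts and can be read directly, e.g. (mount point is an example):

# btrfs device stats /mnt/array
[/dev/sdb].write_io_errs   0
[/dev/sdb].read_io_errs    0
[/dev/sdb].flush_io_errs   0
[/dev/sdb].corruption_errs 1
[/dev/sdb].generation_errs 0

If corruption_errs keeps climbing while the io_errs stay at zero,
that's consistent with something silently mangling data rather than
the drives reporting failures.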

Hopefully someone else has a more efficient idea for tracking this
down, but I'd suggest just creating a new btrfs filesystem on this
raid6 and seeing if the problem reproduces. If it does
quickly/easily... well, then I'd say you need a 2nd device. The more
separate it is the better: SSD, HDD, iSCSI, DRBD. Create a new raid1
btrfs with the raid6 as one device and something else as the other.
Now reproduce the corruption, and the scrub/csum errors will tell you
which device has the problem. If both devices are fairly equally
corrupt, it has a different cause than if just the raid6 is corrupt.
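
A rough sketch of that test setup (device names and mount point are
only examples, and mkfs wipes both devices):

# mkfs.btrfs -f -m raid1 -d raid1 /dev/sdX /dev/sdY
# mount /dev/sdX /mnt/test
  ... copy some large files onto /mnt/test ...
# btrfs scrub start -B /mnt/test

Here /dev/sdX would be the LSI raid6 LUN and /dev/sdY the second
device. With raid1 every block has a copy on each device, so any csum
errors scrub finds will be attributed to a specific device in dmesg.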


-- 
Chris Murphy