On Sat, Apr 2, 2016 at 11:00 AM, Kai Krakow <hurikhan77@xxxxxxxxx> wrote:
> On Fri, 1 Apr 2016 01:27:21 +0200,
> Henk Slager <eye1tm@xxxxxxxxx> wrote:
>
>> It is not clear to me what 'Gentoo patch-set r1' is and does. So just
>> boot a vanilla v4.5 kernel from kernel.org and see if you get csum
>> errors in dmesg.
>
> It is the Gentoo patchset; I don't think anything there relates to
> btrfs:
> https://dev.gentoo.org/~mpagano/genpatches/trunk/4.5/
>
>> Also, where does 'duplicate object' come from? dmesg? Then please
>> post its surroundings, straight from dmesg.
>
> It was in dmesg. I already posted it in the other thread and Qu took
> note of it. Apparently, I didn't manage to capture anything other than:
>
> btrfs_run_delayed_refs:2927: errno=-17 Object already exists
>
> It hit me unexpectedly. This was the first time btrfs went RO for me. It
> was with kernel 4.4.5, I think.
>
> I suspect this is the outcome of unnoticed corruptions that sneaked in
> earlier over some period of time. The system had no problems until this
> incident, and only then did I discover the huge pile of corruptions when I
> ran btrfsck.
>
> I'm also pretty convinced now that VirtualBox itself is not the problem
> but only a victim of these corruptions; that's why it primarily shows up
> in the VDI file.
>
> However, I now found csum errors in unrelated files (see my other post in
> this thread), even in files not touched in a long time.

OK, this is useful further status and background. That there are more csum errors elsewhere is quite worrying, I would say. You said the hardware is tested, but are you sure there are no rare, undetected failures, e.g. due to overclocking or just aging? It might be that spurious hardware errors have only now started to happen and are unrelated to the kernel upgrade from 4.4.x to 4.5. I once had a RAM module going bad; Windows 7 ran fine (at least no crashes), but when I booted Linux/btrfs, all kinds of strange btrfs errors started to appear, including csum errors.
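To compare boots (for example the vanilla v4.5 kernel versus the Gentoo one, or before and after swapping a suspect RAM module), it can help to pull just the btrfs checksum messages out of dmesg. A minimal Python sketch, assuming that such lines contain both "BTRFS" and "csum"; the exact message wording differs between kernel versions, so the match is deliberately loose, and the helper names are hypothetical:

```python
import re
import subprocess

# Loose pattern (assumption): any kernel log line mentioning btrfs and "csum".
# The precise wording of btrfs checksum messages varies between kernels.
CSUM_LINE = re.compile(r"btrfs.*csum", re.IGNORECASE)


def btrfs_csum_lines(dmesg_text: str) -> list[str]:
    """Return the lines of dmesg output that look like btrfs checksum errors."""
    return [line for line in dmesg_text.splitlines() if CSUM_LINE.search(line)]


def read_dmesg() -> str:
    """Read the kernel ring buffer via the dmesg tool (may require root)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `len(btrfs_csum_lines(read_dmesg()))` gives a rough per-boot count to compare; running it as root may be needed when `kernel.dmesg_restrict` is set.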
The other thing you could think about is the SSD cache partition. I don't remember whether blocks going from RAM to the SSD get an extra CRC attached (independent of btrfs). But if data gets corrupted while on the SSD, you could get very nasty errors; how nasty depends a bit on the various bcache settings. It is not unthinkable that corrupted dirty data then gets written back to the hard disks. But at least btrfs (scrub) can detect that, which is the situation you are in now.

Maybe, to further isolate btrfs, you could temporarily rule out bcache: make sure the cache is clean, then increase the start sectors of the second partitions on the hard disks by 16 sectors (8 KiB), so that each partition starts directly at the btrfs data behind the bcache header, and reboot. Of course, after any write to the partitions this way, you'll have to recreate all the bcache devices.

But maybe the fs has simply been silently corrupted by bugs in older kernels, and now kernel 4.5 cannot handle it anymore, so any use of the fs increases the corruption.
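The partition shift described above is simple sector arithmetic. A sketch, assuming 512-byte logical sectors and bcache's default data offset of 16 sectors on the backing device (the helper names and the example partition boundaries are hypothetical): the new partition starts 16 sectors later and keeps its old end sector, so its size shrinks by the same 16 sectors.

```python
# Assumptions: 512-byte logical sectors and the default bcache data
# offset of 16 sectors (the backing device's data starts after its header).
SECTOR_SIZE = 512
BCACHE_DATA_OFFSET = 16  # in sectors


def shifted_partition(start: int, end: int,
                      offset: int = BCACHE_DATA_OFFSET) -> tuple[int, int]:
    """Return (new_start, end): skip the bcache header, keep the end sector."""
    if end - start + 1 <= offset:
        raise ValueError("partition too small to hold the bcache offset")
    return start + offset, end


def sectors_to_bytes(sectors: int, sector_size: int = SECTOR_SIZE) -> int:
    """Convert a sector count into bytes."""
    return sectors * sector_size


# Hypothetical partition spanning sectors 2048..1953525134 (inclusive).
new_start, new_end = shifted_partition(2048, 1953525134)
print(new_start, new_end)                                   # 2064 1953525134
print(sectors_to_bytes(BCACHE_DATA_OFFSET) // 1024, "KiB")  # 8 KiB
```

Keeping the end sector fixed means only the front of the partition moves, so the btrfs data itself is not rewritten by the repartitioning step.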
