On Sat, 2 Apr 2016 18:14:17 -0600, Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:

> On Sat, Apr 2, 2016 at 2:16 PM, Kai Krakow <hurikhan77@xxxxxxxxx>
> wrote:
>
> > I'll go checking the RAM for problems - tho that would be the first
> > time in twenty years that a RAM module hadn't errors from the
> > beginning. Well, you'll never know. But I expect no error since
> > usually this would mean all sorts of different and random problems
> > which I don't have. Problems are very specific, which is atypical
> > for RAM errors.
>
> Well so far it's just the VDI that's experiencing csum mismatch
> errors, right? So that's not bad RAM, which would affect other files
> too. And same for a failing SSD.

No, other files are affected, too. And it looks like those files are
easily affected again even after being removed and recreated from
whatever backup source.

> I think you've got a bug somewhere and it's just hard to say where it
> is based on the available information. I've already lost track if
> others have all of the exact same setup you do: bcache + nossd +
> autodefrag + lzo + VirtualBox writing to VDI on this Btrfs volume.
> There are others who have some of those options, but I don't know if
> there's anyone who has all of those going on.

I haven't run VirtualBox since the incident, so I'd rule out VirtualBox.
Currently, there seems to be no csum error for the VDI file; instead,
another file now gets corruptions, even after being recreated. I think
it is the result of another corruption and thus a side effect.

Also, I think having the options nossd+autodefrag+lzo shouldn't be an
exotic or unsupported combination. Having this on top of bcache should
just work.

Let's not rule out that bcache had a problem, although I'd usually
expect bcache to freak out with internal btree corruption then.

> Maybe Qu has some suggestions, but if it were me I'd do this. Build
> mainline 4.5.0, it's a known quantity by Btrfs devs.

4.5.0-gentoo currently carries only a few patches, so I could easily
build vanilla.

> Build the kernel
> with BTRFS_FS_CHECK_INTEGRITY enabled in kernel config. And when you
> mount the file system, don't use mount option check_int, just use your
> regular mount options and try to reproduce the VDI corruption. If you
> can reproduce it, then start over, this time with check_int mount
> option included along with the others you're using and try to
> reproduce. It's possible there will be fairly verbose kernel messages,
> so use boot parameter log_buf_len=1M and then that way you can use
> dmesg rather than depending on journalctl -k which sometimes drops
> messages if there are too many.

Does that make sense while I still have the corruptions in the FS? I'd
like to wait for Qu's advice on whether I should recreate the FS, take
some image, or send info to improve btrfsck...

I'm pretty sure I do not have reproducible corruptions which are not
caused by another corruption - so check_int would probably be of less
use currently.

> If you reproduce the corruption while check_int is enabled, kernel
> messages should have clues and then you can put that in a file and
> attach to the list or open a bug. FWIW, I'm pretty sure your MUA is
> wrapping poorly, when I look at this URL for your post with smartctl
> output, it wraps in a way that's essentially impossible to sort out at
> a glance. Whether it's your MUA or my web browser pretty much doesn't
> matter, it's not legible so what I do is just attach as file to a bug
> report or if small enough onto the list itself.
> http://www.spinics.net/lists/linux-btrfs/msg53790.html

Claws Mail is just too smart for me... It showed up correctly in the
editor before I hit the send button. I wish I could go back to knode
(that did its job right). But it's currently an unsupported orphan
project of KDE. :-(

> Finally, I would retest yet again with check_int_data as a mount
> option and try to reproduce. This is reported to be dirt slow, but it
> might capture something that check_int doesn't.
> But I admit this is
> throwing spaghetti on the wall, and is something of a goose chase just
> because I don't know what else to recommend other than iterating all
> of your mount options from none, adding just one at a time, and trying
> to reproduce. That somehow sounds more tedious. But chances are you'd
> find out what mount option is causing it; OR maybe you'd find out the
> corruption always happens, even with defaults, even without bcache, in
> which case that'd seem to implicate either a gentoo patch, or a
> virtual box bug of some sort.

I think the latter two are easily the least probable sort of bugs. But
I'll give it a try.

For the time being, I could switch bcache to write-around mode - so it
could at least not corrupt btrfs during writes.

-- 
Regards,
Kai

Replies to list-only preferred.
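
PS: The "start from none, add one mount option at a time" iteration
could be sketched roughly like this. It is only a sketch under
assumptions: the device /dev/bcache0 and the mount point /mnt/test are
hypothetical placeholders, "compress=lzo" is my guess at what "lzo"
stands for in the option list above, and the script only prints the
mount commands, it does not run anything.

```python
from typing import Iterator, List

# Options named in this thread. "compress=lzo" is an assumption about
# what "lzo" refers to.
OPTIONS = ["nossd", "autodefrag", "compress=lzo"]


def incremental_option_sets(options: List[str]) -> Iterator[str]:
    """Yield mount option strings: defaults first, then one more option per step."""
    yield "defaults"
    for count in range(1, len(options) + 1):
        yield ",".join(options[:count])


def mount_commands(device: str, mountpoint: str, options: List[str]) -> List[str]:
    """Build (but do not execute) one mount command per test round."""
    return [
        f"mount -o {opts} {device} {mountpoint}"
        for opts in incremental_option_sets(options)
    ]


if __name__ == "__main__":
    # Hypothetical device and mount point, adjust to the real setup.
    for cmd in mount_commands("/dev/bcache0", "/mnt/test", OPTIONS):
        print(cmd)
```

Each printed command would be one round: mount, try to reproduce the
corruption, unmount, move on to the next one. The first round where the
corruption shows up points at the option that was added last.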
