On Sun, Sep 1, 2019 at 11:39 AM Rann Bar-On <rann@xxxxxxxxxxxxx> wrote:
>
> On Sun, 2019-09-01 at 07:48 +0800, Qu Wenruo wrote:
> >
> > On 2019/9/1 7:39 AM, Rann Bar-On wrote:
> > > On Sat, 2019-08-31 at 17:04 -0600, Chris Murphy wrote:
> > > > On Sat, Aug 31, 2019 at 2:26 PM Rann Bar-On <rann@xxxxxxxxxxxxx> wrote:
> > > > > I just downgraded to kernel 4.19, and the supposed corruption
> > > > > vanished.
> > > > > This may be related to
> > > > >
> > > > > https://www.spinics.net/lists/linux-btrfs/msg91046.html
> > > > >
> > > > > If I can provide information that would help fix this issue, I'd be
> > > > > glad to, but I cannot upgrade back to kernel 5.2, as I can't risk
> > > > > this system.
> > > >
> > > > 5.2 has more strict checks for corruption, exposing the rare case
> > > > where metadata in a leaf is corrupt but the checksum was properly
> > > > computed.
> >
> > Exactly.
> >
> > Although for your case, it's some older kernel doing something bad.
> >
> > It's also reported once for the same problem, some older kernel doesn't
> > set the mode member properly.
> >
> > > > > Btrfs v3.17
> > > >
> > > > This is old. I suggest finding a newer version of btrfs-progs, ideally
> > > > latest stable version is 5.2.1. Definitely don't use --repair with
> > > > this version. It's safe to use check --readonly (which is the default)
> > > > with this version but probably not that helpful to devs.
> > > >
> > >
> > > Not really sure why that said 3.17:
> > >
> > > $ btrfs --version
> > > btrfs-progs v5.2.1
> > >
> > > Anyway, running btrfs --repair on the file system didn't do anything to
> > > fix the above errors.
> >
> > That's what we need to enhance next.
> >
> > > I removed at least one of the corrupt files (the only one that was mode
> > > 0) using kernel 4.19.
> > >
> > > Happy to help further if I can. What would you suggest as far as fixing
> > > this or reporting it usefully? If you believe 5.2 isn't causing the
> > > corruption, but rather, just exposing it, I'm more than happy to
> > > experiment with it.
> >
> > Deleting the offending inodes would be enough to fix the alert.
> >
>
> I deleted the file using the older kernel. I rebooted into the new
> kernel, and things seem good for now.
>
> Note: The newer one wouldn't let me access the file to delete it, nor
> did any btrfs repair tool do anything at all. This is a big problem
> IMO!

The current behavior is an improvement over propagating corruption and
never detecting it because the leaf is assumed to be correct only
because the checksum matches. The next step is figuring out ways to
work around such rare detected corruptions, hopefully automatically
and while online. I don't consider it user responsibility to have to
do this, but I'm vaguely curious if it's possible to delete the
offending file in a snapshot, then delete the original subvolume.

i.e.
1. snapshot the subvolume containing the file (default rw snapshot)
2. delete the bad file(s) in the snapshot
3. delete the original subvolume (snapshot's parent)

I'm curious if either 2 or 3 are permitted.

-- 
Chris Murphy
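
For reference, the three steps above could be sketched roughly as the shell
commands below. This is an untested sketch, not a confirmed fix: as the
message itself says, it is an open question whether a 5.2 kernel permits
steps 2 and 3 when the leaf fails its checks. The paths /mnt/pool/data,
/mnt/pool/data-fixed and path/to/badfile are hypothetical placeholders, and
the RUN switch and helper function names are inventions of this sketch. By
default it only prints the commands; nothing is executed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of the snapshot-based workaround. All paths are hypothetical.
# Default is a dry run that prints each command; set RUN=1 to execute
# them for real (requires root and a mounted btrfs filesystem).

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

snapshot_workaround() {
    subvol=$1    # subvolume containing the bad file
    snap=$2      # writable snapshot to create
    badfile=$3   # bad file's path relative to the subvolume root

    # 1. snapshot the subvolume (writable unless -r is passed)
    run btrfs subvolume snapshot "$subvol" "$snap" || return 1
    # 2. delete the bad file inside the snapshot
    run rm -- "$snap/$badfile" || return 1
    # 3. delete the original subvolume (the snapshot's source)
    run btrfs subvolume delete "$subvol"
}

# Dry-run usage example with placeholder paths:
snapshot_workaround /mnt/pool/data /mnt/pool/data-fixed path/to/badfile
```

Stopping at the first failing step is deliberate: if the kernel refuses to
remove the file in the snapshot, the original subvolume is left untouched.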
