Re: btrfsck: backpointer mismatch (and multiple other errors)

Kai Krakow posted on Mon, 04 Apr 2016 00:19:25 +0200 as excerpted:

> The corruptions seem to be different by the following observation:
> 
> While the VDI file was corrupted over and over again with a csum error,
> I could simply remove it and restore from backup. The last thing I did
> was ddrescue it from the damaged version to my backup device, then rsync
> the file back to the originating device (which created a new file
> side-by-side, i.e. in a new area of disk space, then replaced the old
> one by rename). I haven't run VirtualBox since then, but the file hasn't
> become corrupted again either.
> 
> But now, according to btrfsck, a csum error has instead come up in
> another big file from Steam. This time, when I rm the file, the kernel
> backtraces and forces btrfs into RO mode, so the file cannot be removed.
> I'm going to leave it that way for now, since the file won't be used
> anyway, and I can simply ignore it for backup and restore; it's not an
> important one. Better to have an "uncorrectable" csum error there than
> one jumping unpredictably across my files.

While my dying ssd experience was with btrfs raid1 direct on a pair of 
ssds, I can extrapolate from what I learned about the ssd behavior to 
your case: bcache caching to the ssd, then writing back to the spinning 
rust backing store, presumably in btrfs single-device mode with single 
data and either single or dup metadata.  (There are enough other cases 
interwoven in this thread that it's no longer clear to me which posted 
btrfs fi show, etc, apply to this case, so I'm guessing; I believe 
presenting it as more than a single device at the btrfs level would 
require multiple bcache devices, tho of course you could do that by 
partitioning the ssd.)

That extrapolation would lead me to predict very much the behavior 
you're seeing, if the caching ssd is dying.
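
If it helps to pin down which case actually applies, the device count 
and the data/metadata profiles are easy enough to check; something like 
this, with /mnt assumed as the mountpoint (adjust to yours):

  # /mnt is an assumed mountpoint, substitute your own
  btrfs filesystem show /mnt   # lists the devices btrfs itself knows about
  btrfs filesystem df /mnt     # data/metadata profiles: single, DUP, raid1...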

As bcache runs below btrfs, btrfs won't know anything about it and will 
therefore behave, effectively, as if it's not there -- an error on the 
ssd will look like an error on the btrfs, period.  (As I'm assuming a 
single btrfs device, which device of the btrfs doesn't come into 
question, tho which copy of dup metadata might... but that's an entirely 
different can of worms, since I'm not sure whether bcache would end up 
deduplicating the dup metadata or not, the ssd firmware might do the 
same, and...)

And with bcache doing its writeback from the ssd to the backing store 
underneath the level at which btrfs could detect and track csum 
corruption, whatever is corrupt on the ssd then transfers to the backing 
store as well; btrfs won't know that transfer is happening at all and 
thus won't be in the loop to detect the csum error at that stage.
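
To make the layering concrete: at the btrfs level you only ever see the 
bcache device, never the ssd or the spinning rust behind it.  On a 
typical setup (device names assumed here, yours will differ) that's 
visible with:

  # bcache0 is an assumed device name
  lsblk                                  # shows the ssd and backing disk stacked under bcache0
  cat /sys/block/bcache0/bcache/state    # cache state (clean/dirty/etc) per the bcache docs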


Meanwhile, what I saw on the pair of ssds, one going bad, in btrfs raid1 
mode, was that a btrfs scrub *WOULD* successfully detect the csum errors 
on the bad ssd, and rewrite the affected blocks from the remaining good 
copy.

Keep in mind that this is without snapshots, so that rewrite, while COW, 
would then release the old copy back into the free space pool.  In so 
doing, it would trigger the ssd firmware to copy the rest of the erase-
block and erase it, and that in turn would trigger the firmware to detect 
the bad sector and replace it with one from its spare-sectors list.  As a 
result, the raw value of attribute #5, Reallocated_Sector_Ct, would tick 
up in smartctl -A, as would that of attribute #182, 
Erase_Fail_Count_Total (tho the two attributes didn't increase in numeric 
lock-step, both were increasing over time, primarily when I ran scrubs).
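
Watching those counters is just a matter of something like this (device 
name assumed, point it at the ssd):

  # /dev/sdb is assumed to be the ssd, substitute your device
  smartctl -A /dev/sdb | grep -E 'Reallocated_Sector_Ct|Erase_Fail_Count'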


But it was mostly (almost entirely) when I ran the scrubs, and 
consequently rewrote the corrupted sectors from the copy on the good 
device, that those erase-fails and sector reallocations were triggered.
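
For completeness, the scrubs themselves were nothing exotic, just the 
usual (mountpoint assumed as /mnt):

  # /mnt assumed as the btrfs mountpoint
  btrfs scrub start -Bd /mnt   # -B stay in foreground, -d per-device stats
  btrfs scrub status /mnt      # corrected and uncorrectable error counts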

Anyway, the failing ssd's issues gradually got worse, until I was having 
to scrub, and thus trigger both filesystem recopy and bad-ssd sector 
rewrites, any time I wrote anything major to the filesystem as well as at 
every cold boot.  (Leaving the system off for several hours apparently 
accelerated the sector rot within stable data, while the powered-on state 
kept the flash cells charged high enough that they didn't rot so fast, so 
it was mostly or entirely new/changed data I had to worry about.)  
Eventually I decided I was tired of the now more or less constant hassle 
and wasn't learning much new from the decaying device's behavior any 
more, and I replaced it.


Translating that to your case, if your caching ssd is dying and some 
sectors are now corrupted, then unless there's a second btrfs copy of 
that block to overwrite the bad version with, nothing is likely to 
trigger those sector reallocations.

Tho actually rewriting them (or, at the device firmware level, COWing 
them and erasing the old erase-blocks), as bcache will be doing if it 
drops the current cache content and fills those blocks with something 
else, should trigger the same thing.  But unless bcache can force-dump 
and recache or something, I don't believe there's a systematic way to 
trigger that over all cached data the way btrfs scrub does.
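
The closest thing I'm aware of would be detaching the cache from the 
backing device and later re-attaching it, which invalidates the cached 
data rather than systematically rewriting it, so I'm not sure it would 
exercise the bad flash the way a scrub rewrite does.  Purely as a sketch 
to check against the bcache docs (device name and UUID are placeholders):

  # bcache0 and <cset-uuid> are placeholders; verify against your own sysfs layout
  echo 1 > /sys/block/bcache0/bcache/detach             # detaches after flushing any dirty data
  ls /sys/fs/bcache/                                    # shows the cache set UUID
  echo <cset-uuid> > /sys/block/bcache0/bcache/attach   # re-attach; the cache starts cold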

Anyway, if I'm correct, and as your ordering the new ssd indicates you 
may suspect as well, the problem may indeed be that ssd, and a new ssd 
(assuming /it/ isn't defective) should fix it.  Tho the existing damage 
on the existing btrfs may or may not be fully recoverable once you have 
the new ssd and thus no longer have to worry about further damage from 
the old one.

Meanwhile, putting bcache into writearound mode, so it makes no further 
changes to the ssd and only uses it for reads, is probably wise, and 
should help limit further damage.  Tho if, in that mode, bcache still 
flushes existing dirty cached data to the backing store, some further 
damage could occur from that.  But I don't know enough about bcache to 
know what its behavior and level of available configuration in that 
regard actually are.  As long as it's not trying to write anything from 
the ssd to the backing store, I think further damage should be very 
limited.
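
For what it's worth, both the mode switch and the amount of still-dirty 
data should be visible in sysfs, something like (device name assumed):

  # bcache0 is an assumed device name
  cat /sys/block/bcache0/bcache/cache_mode      # current mode shown in [brackets]
  echo writearound > /sys/block/bcache0/bcache/cache_mode
  cat /sys/block/bcache0/bcache/dirty_data      # data still waiting to be written back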

But were you running btrfs raid1 without bcache, or with multiple devices 
at the btrfs level, each bcached but to separate ssds so any rot wouldn't 
be likely to transfer between them and increase the chances of both 
copies being bad at once, I expect you'd be seeing behavior on your ssd 
very close to what I saw on my failing one.  Assuming your other device 
was fine, you could still be scrubbing and recovering fine, as I was, tho 
with the necessary frequency of scrubs increasing over time.  (Not helped 
by the recently reported bug where too many csum errors on compressed 
content crash btrfs and the system, even when the errors are on raid1 and 
should be recoverable from the other copy, thus requiring more frequent 
scrubs than would otherwise be needed.  I ran into this too, but didn't 
realize it only triggered on compressed content and was thus a specific 
bug; I simply attributed it to btrfs not yet being fully stable and 
believed that's what it always did with too many crc errors, even when 
they should be recoverable from the good raid1 copy.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




