On Sun, Dec 01, 2019 at 09:52:13PM +0000, Fedja Beader wrote:
> I had a broken hard-disk from which ddrescue recovered all but about
> 1600MB of data. As a result, the copy of it had roughly 50000
> uncorrectable errors as reported after scrub.
>
> I have saved the dmesg log recorded during this scrub, parsed logical
> numbers out of it and finally used "btrfs inspect-internal
> logical-resolve" to obtain a list of files.
>
> However, after manually removing or restoring those files, the
> subsequent run of "btrfs scrub" still produced >45000 uncorrectable
> errors. Indeed, the reported files that were again obtained with the
> above method, are damaged (input/output error on cat > /dev/null).
>
> It was suggested that rate-limiting could be the cause of this. I then
> recompiled the kernel with the (the, as in 4.9.24 there is only one
> occurrence of it in btrfs_printk) "if (__ratelimit..." conditional
> commented out, rebooted and disabled dmesg ratelimiting with sysctl
> kernel.printk_ratelimit=0. Then again ran scrub.
>
> The result of this scrub was 41000 uncorrectable errors. However,
> after manually repairing all the problems and re-running scrub, 39000
> uncorrectable errors still remain.
>
> Is there more rate-limiting going on? If so, how do I disable it?

That's indeed caused by ratelimiting. There are __ratelimit calls
specific to the scrub error messages (called in
scrub_handle_errored_block, scrub_print_warning), separate from the one
in btrfs_printk. You can remove that ratelimiting too and get the flood
of messages for processing.

The dmesg messages are more or less meant to point out a handful of
problems, like a few damaged blocks; 40k messages is really a lot for
that. Ratelimiting can also happen internally, when printk decides to
throw messages away (though I know it's trying not to).
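
For processing the flood of messages, here is a minimal sketch in
Python of the "parse logical numbers out of dmesg" step. It assumes
that each scrub message contains the text "at logical <number>"; the
sample lines, device names and addresses below are made up for
illustration, so check them against your own log before relying on the
pattern. The names extract_logicals, LOGICAL_RE and SAMPLE are mine and
not part of any btrfs tool.

```python
import re

# Assumption: scrub messages carry the address as "at logical <number>".
LOGICAL_RE = re.compile(r"\bat logical (\d+)\b")


def extract_logicals(lines):
    """Return the sorted, de-duplicated logical addresses found in lines."""
    found = set()
    for line in lines:
        match = LOGICAL_RE.search(line)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


# Hypothetical sample input, not taken from a real log.
SAMPLE = [
    "BTRFS warning (device sdb1): checksum error at logical 1234567168 "
    "on dev /dev/sdb1, sector 2411264, root 5, inode 257, offset 0, "
    "length 4096, links 1 (path: some/file)",
    "BTRFS error (device sdb1): unable to fixup (regular) error "
    "at logical 1234567168 on dev /dev/sdb1",
    "BTRFS error (device sdb1): unable to fixup (regular) error "
    "at logical 9876543488 on dev /dev/sdb1",
    "usb 1-1: new high-speed USB device number 2 using xhci_hcd",
]

# Two distinct addresses survive; the duplicate and the unrelated
# USB line are dropped.
print(extract_logicals(SAMPLE))
```

Each address can then be passed to "btrfs inspect-internal
logical-resolve" along with the mount point. If the number of distinct
addresses is far below the uncorrectable error count that scrub
reports, that suggests messages are still being dropped somewhere.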
