On 2020/1/2 8:07 PM, Graham Cobb wrote:
> On 02/01/2020 01:26, Qu Wenruo wrote:
>>
>>
>> On 2020/1/2 7:35 AM, Graham Cobb wrote:
>>> I have a problem on one BTRFS filesystem. It is not a critical
>>> filesystem (it is used for backups) and I have not yet tried even
>>> unmounting and remounting, let alone a "btrfs check".
>>>
>>> The problem seems to be that after several iterations of running 'btrfs
>>> scrub' for 30 minutes, then pausing for a while, then resuming the
>>> scrub, I got a transaction aborted with an EFBIG error and a warning in
>>> the kernel log. The fs went readonly, and transid verify errors are now
>>> reported. The original log extract is available at
>>> http://www.cobb.uk.net/kern.log.bug-010120 but I have pasted the key
>>> part below.
>>
>> EFBIG in btrfs is very rare, and can only be caused by too many system
>> chunks.
>>
>> The most common reason is the chunk pre-allocation for scrub, which
>> also matches your situation.
>>
>> There is already a fix for it, and will land in v5.5 kernel.
>> It looks like we should backport it.
>
> Thanks Qu. I will wait for that kernel, and maybe stop my monthly scrubs
> (although my several other btrfs filesystems did not have a problem this
> month fortunately).

And the problem will normally not impact the fs, as newly created empty
system chunks will soon be cleaned up.

>
> I am getting transid errors:

This is not good news. In fact, it is normally a deadly problem.

>
>>> Jan 1 06:51:56 black kernel: [1931271.801468] BTRFS error (device
>>> sdc3): parent transid verify failed on 16216583520256 wanted 301800
>>> found 301756
>
> I presume 301800 is the transaction which failed and caused the fs to go
> readonly. I don't suppose it is likely I could revert the whole fs to
> the state of the last successful transaction is there?

This means some tree blocks didn't reach disk.
It can be deadly, or just a side effect of the transaction abort.

>
> It is not a big problem: the fs only contains backup snapshots (not my
> only backups!) although it would be nice to recover the historical
> snapshots if I could (I used them to research a bug I reported to debian
> just the other day!).

I'm afraid this depends on where the corruption is.

If it's just caused by that EFBIG error, and btrfs check reports no
error, then it's just a temporary problem caused by the transaction abort.

If it's in the extent tree, it only affects mount or certain write
operations, but if you can mount the fs, it should be OK to read out the
whole fs.

If it's in the csum tree, it will affect certain data reads, but
otherwise it is mostly OK.

If it's in the subvolume trees, some directories/files can't be accessed.

So, please run a btrfs check on the unmounted fs to verify which case
it is.

Thanks,
Qu

>
> Regards
> Graham
>
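As an aside to the thread, the transid message quoted above can be picked apart mechanically, which helps when scanning a long kernel log for how far behind the stale blocks are. This is only a minimal sketch and not part of the original exchange: it assumes the log line has exactly the shape quoted in this message (other kernel versions may word it differently), and the name `parse_transid_error` is hypothetical.

```python
import re

# Pattern for the "parent transid verify failed" line as quoted in this thread.
# Assumption: the wording matches the quoted log; other kernels may differ.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def parse_transid_error(line):
    """Return (bytenr, wanted, found) as ints, or None if the line does not match."""
    m = TRANSID_RE.search(line)
    if m is None:
        return None
    return int(m["bytenr"]), int(m["wanted"]), int(m["found"])


# The log line from the report, with the mail quoting unwrapped.
line = (
    "Jan 1 06:51:56 black kernel: [1931271.801468] BTRFS error (device sdc3): "
    "parent transid verify failed on 16216583520256 wanted 301800 found 301756"
)

bytenr, wanted, found = parse_transid_error(line)
# "wanted" is the generation the parent expects; "found" is what the block on
# disk carries. A positive gap suggests the on-disk block is an older copy.
print(bytenr, wanted, found, wanted - found)
```

Here the gap is 44 transactions, which fits the explanation that some tree blocks never reached disk. It does not by itself say which tree is affected; that still needs the btrfs check run on the unmounted fs.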
