On 02/01/2020 12:34, Qu Wenruo wrote:
>
>
> On 2020/1/2 8:07 PM, Graham Cobb wrote:
>> On 02/01/2020 01:26, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/1/2 7:35 AM, Graham Cobb wrote:
>>>> I have a problem on one BTRFS filesystem. It is not a critical
>>>> filesystem (it is used for backups) and I have not yet tried even
>>>> unmounting and remounting, let alone a "btrfs check".
>>>>
>>>> The problem seems to be that after several iterations of running 'btrfs
>>>> scrub' for 30 minutes, then pausing for a while, then resuming the
>>>> scrub, I got a transaction aborted with an EFBIG error and a warning in
>>>> the kernel log. The fs went readonly, and transid verify errors are now
>>>> reported. The original log extract is available at
>>>> http://www.cobb.uk.net/kern.log.bug-010120 but I have pasted the key
>>>> part below.
>>>
>>> EFBIG in btrfs is very rare, and can only be caused by too many system
>>> chunks.
>>>
>>> The most common reason is the chunk pre-allocation for scrub, which
>>> also matches your situation.
>>>
>>> There is already a fix for it, and it will land in the v5.5 kernel.
>>> It looks like we should backport it.
>>
>> Thanks Qu. I will wait for that kernel, and maybe stop my monthly scrubs
>> (although my several other btrfs filesystems did not have a problem this
>> month fortunately).
>
> And the problem will normally not impact the fs, as newly created empty
> system chunks will soon be cleaned up.
>
>>
>> I am getting transid errors:
>
> This is not good news. And in fact it's normally a deadly problem.

In fact, this was not a real problem: the errors appeared because the
filesystem was still mounted from the original error and had gone ro, so
I guess the in-memory state was different from the on-disk state. Doh!

A simple umount and mount worked fine, although I then did a btrfs check
which also worked fine:

black:~# btrfs check --readonly -p /dev/sdc3
Opening filesystem to check...
Checking filesystem on /dev/sdc3
UUID: 4d1ba5af-8b89-4cb5-96c6-55d1f028a202
[1/7] checking root items (0:06:27 elapsed, 25179174 items checked)
[2/7] checking extents (6:34:26 elapsed, 2419791 items checked)
cache and super generation don't match, space cache will be invalidated
[3/7] checking free space tree (0:00:00 elapsed)
[4/7] checking fs roots (25:44:17 elapsed, 1497725 items checked)
[5/7] checking csums (without verifying data) (0:54:36 elapsed, 4812627 items checked)
[6/7] checking root refs (0:00:00 elapsed, 1067 items checked)
[7/7] checking quota groups skipped (not enabled on this FS)
found 11946687545430 bytes used, no error found
total csum bytes: 11626743024
total tree bytes: 39628275712
total fs tree bytes: 24636817408
total extent tree bytes: 2363850752
btree space waste bytes: 5422658757
file data blocks allocated: 29159815589888
 referenced 16100593688576

Thanks again for the help, and for the design which prevented fs
corruption in this case. I would encourage you to consider backporting
the fix for the original EFBIG problem, as you suggested above.

Graham

>
>>
>>>> Jan 1 06:51:56 black kernel: [1931271.801468] BTRFS error (device
>>>> sdc3): parent transid verify failed on 16216583520256 wanted 301800
>>>> found 301756
>>
>> I presume 301800 is the transaction which failed and caused the fs to go
>> readonly. I don't suppose there is any way I could revert the whole fs
>> to the state of the last successful transaction?
>
> This means some tree blocks didn't reach disk.
> It can be deadly, or just a side effect caused by the transaction abort.
>
>>
>> It is not a big problem: the fs only contains backup snapshots (not my
>> only backups!) although it would be nice to recover the historical
>> snapshots if I could (I used them to research a bug I reported to debian
>> just the other day!).
>
> I'm afraid this depends on where the corruption is.
>
> If it's just caused by that EFBIG error, and btrfs check reports no
> error, then it's just a temporary problem caused by the transaction abort.
>
>
> If it's in the extent tree, it only affects mount or certain write
> operations, but if you can mount the fs, it should be OK to read out the
> whole fs.
>
> If it's in the csum tree, it will affect certain data reads, but is
> otherwise mostly OK.
>
> If it's in subvolume trees, some directories/files can't be accessed.
>
> So, please run a btrfs check on the unmounted fs to verify which is the case.
>
> Thanks,
> Qu
>
>>
>> Regards
>> Graham
>>
>
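
For anyone scanning their own kernel log for the same message, here is a
minimal sketch in Python that pulls the block address and the two
generation numbers out of a "parent transid verify failed" line. The
function name and the regular expression are illustrative assumptions
based only on the wording of the log line quoted in this thread; other
kernel versions may word the message differently.

```python
import re

# Pattern built from the message quoted in this thread (assumption:
# the wording is the same in the log being scanned).
TRANSID_RE = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def parse_transid_error(message):
    """Return (bytenr, wanted, found) as ints, or None if there is no match."""
    # Collapse all whitespace so a message wrapped across lines still matches.
    flat = " ".join(message.split())
    m = TRANSID_RE.search(flat)
    if m is None:
        return None
    return int(m["bytenr"]), int(m["wanted"]), int(m["found"])


# The log line from this thread, wrapped as it was in the mail.
sample = (
    "Jan 1 06:51:56 black kernel: [1931271.801468] BTRFS error (device\n"
    "sdc3): parent transid verify failed on 16216583520256 wanted 301800\n"
    "found 301756"
)

bytenr, wanted, found = parse_transid_error(sample)
# Print the block address and how many generations behind the block is.
print(bytenr, wanted - found)
```

The difference between "wanted" and "found" only says how stale the block
on disk looks; as noted above, whether that matters has to be settled by
a btrfs check on the unmounted fs.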
