Re: Massive filesystem corruption, potentially related to eCryptfs-on-btrfs

Hello Wenruo (and all),

> Any log on `btrfs check` without --repair?

Everything I posted was captured after I reformatted the partition, so
it may not be very useful. Still, as you can see, `dmesg` reports 14
corruption errors on /dev/sda1 (which has been working correctly), while
`btrfs scrub` reports no problems. I'll run `btrfs check` once I boot
from a live USB.
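
For anyone who wants to pull those counters out of a saved dmesg log,
here is a minimal sketch. It assumes the log lines are unwrapped and
follow the `bdev ... errs:` format shown in my original message; the
names `ERRS_RE` and `parse_errs` are made up for illustration. As far as
I understand, these counters are the persistent per-device statistics
that `btrfs device stats` also reports, so a nonzero `corrupt` value may
reflect past events rather than current damage.

```python
import re

# Matches the per-device error summary that btrfs logs at mount time.
# The format is taken from the dmesg excerpt in this thread.
ERRS_RE = re.compile(
    r"bdev (?P<path>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_errs(dmesg_text):
    """Return {device path: {counter name: value}} for each errs line."""
    # Collapse runs of whitespace so padded timestamps do not matter.
    flat = " ".join(dmesg_text.split())
    return {
        m.group("path"): {name: int(m.group(name)) for name in COUNTERS}
        for m in ERRS_RE.finditer(flat)
    }


sample = (
    "[    3.464079] BTRFS info (device sda1): bdev /dev/sda1 errs: "
    "wr 0, rd 0, flush 0, corrupt 14, gen 0"
)

if __name__ == "__main__":
    for path, counters in parse_errs(sample).items():
        nonzero = {k: v for k, v in counters.items() if v}
        print(path, nonzero)
```

Feeding it the output of `dmesg` saved to a file should list every
device that btrfs reported with its counters, which makes it easy to
compare against what scrub says.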

> But normally, csum read shouldn't lead to RO, thus I believe there
> are more problems of that previous failure.

I agree there were other problems beyond the csum mismatches. I got
lots of I/O errors, which disappeared once I reformatted the partition.
In particular, writing to the filesystem would sometimes crash it at
random. It could be a hardware issue, but at this point it seems more
likely to be software-related.

Best,
Xuanrui

On Tue, 2020-06-02 at 09:18 +0800, Qu Wenruo wrote:
> 
> On 2020/6/2 5:08 AM, Xuanrui Qi wrote:
> > Hello all,
> > 
> > I have just recovered from a massive filesystem corruption problem
> > which turned out to be a total nightmare, and I have strong reason
> > to suspect that it is related to eCryptfs-encrypted folders on
> > btrfs.
> > 
> > I run Arch Linux and have my /home directory as a btrfs partition.
> > My user's home directory (/home/xuanrui) is encrypted using
> > eCryptfs.
> > 
> > I ran into a massive filesystem corruption issue a while ago. When
> > reading certain files, or occasionally when writing to files, I
> > encountered FS errors (mainly checksum errors, but also other I/O
> > errors). My file system then became read-only because of those
> > errors.
> 
> It's a pity we won't get the dmesg of that incident, which would be
> super useful for debugging.
> 
> > A `btrfs scrub` identified a dozen checksum errors which were "not
> > correctable", and `btrfs check --repair` (and
> > `btrfs check --repair --init-csum-tree`)
> 
> Not recommended, but the output may still help.
> 
> > also failed to fix anything. The former crashed with a segfault,
> > and the latter refused to write anything because of an "I/O
> > error".
> > 
> > Unfortunately, I don't have any logs because I had to nuke (wipe &
> > re-make) my filesystem as the solution. However, after the
> > reformatting I gave up using eCryptfs, and the file corruption bugs
> > have not reappeared since.
> 
> That's a little strange. I guess there is some buffered IO mixed with
> direct IO, which is known to cause csum mismatch, while other fs just
> can't detect such data corruption and pretend nothing happened.
> 
> But normally, csum read shouldn't lead to RO, thus I believe there
> are more problems of that previous failure.
> 
> > Initially I suspected that it was a hardware issue, but I did a
> > SMART test and no errors were detected; I strongly suspect that it
> > is related to eCryptfs.
> > 
> > System info:
> > 
> > uname -a:
> > 
> > Linux xuanruiwork 5.6.15-3-clear #1 SMP Sun, 31 May 2020 19:57:42 +0000 x86_64 GNU/Linux
> > 
> > btrfs --version:
> > btrfs-progs v5.6.1
> > 
> > (the rest is from after the reformat, but the setup is identical to
> > before the reformat sans eCryptfs)
> > 
> > btrfs fi show:
> > Label: none  uuid: 823961e1-6b9e-4ab8-b5a7-c17eb8c40d64
> > 	Total devices 1 FS bytes used 57.58GiB
> > 	devid    1 size 332.94GiB used 60.02GiB path /dev/sda3
> > 
> > btrfs fi df /home:
> > Data, single: total=59.01GiB, used=57.26GiB
> > System, single: total=4.00MiB, used=16.00KiB
> > Metadata, single: total=1.01GiB, used=328.25MiB
> > GlobalReserve, single: total=75.17MiB, used=0.00B
> > 
> > Some output from dmesg (note that /dev/sda1 is not the corrupted
> > filesystem; these corruptions seem to have been self-corrected by
> > btrfs):
> > 
> > [    3.434351] BTRFS: device fsid 823961e1-6b9e-4ab8-b5a7-c17eb8c40d64 devid 1 transid 79 /dev/sda3 scanned by systemd-udevd (519)
> > [    3.440896] BTRFS: device fsid a3892669-1ad8-4ff3-9747-0f8c405c0e6a devid 1 transid 4769881 /dev/sda1 scanned by systemd-udevd (487)
> > [    3.461539] BTRFS info (device sda1): disk space caching is enabled
> > [    3.461540] BTRFS info (device sda1): has skinny extents
> > [    3.464079] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14, gen 0
> 
> Corruption count 14 doesn't seem good.
> 
> > [    3.510991] BTRFS info (device sda1): enabling ssd optimizations
> > [    5.938153] BTRFS info (device sda1): disk space caching is enabled
> > [    7.072974] BTRFS info (device sda3): enabling ssd optimizations
> > [    7.072977] BTRFS info (device sda3): disk space caching is enabled
> > [    7.072978] BTRFS info (device sda3): has skinny extents
> > [ 3710.968433] BTRFS warning (device sda3): qgroup rescan init failed, qgroup is not enabled
> 
> And btrfs is trying to init a qgroup rescan while qgroup is not
> enabled? That doesn't sound good either.
> 
> > [ 7412.459332] BTRFS info (device sda1): scrub: started on devid 1
> > [ 7545.641724] BTRFS info (device sda1): scrub: finished on devid 1 with status: 0
> > [ 8244.846830] BTRFS info (device sda3): scrub: started on devid 1
> > [ 8369.651774] BTRFS info (device sda3): scrub: finished on devid 1 with status: 0
> 
> Any log on `btrfs check` without --repair?
> 
> Thanks,
> Qu
> > If anyone could look into the issue, it would be greatly appreciated.
> > 
> > Best,
> > Xuanrui
> > 
