On 2018/8/17 11:47 AM, Loren M. Lang wrote:
> Hello,
>
> I am unable to mount my btrfs in read-write mode after enabling quotas
> and running a full balance on it. My server is running Ubuntu 17.10
> with Linux kernel 4.13.0-17-generic and btrfs-progs 4.12. I am trying to
> recover with a live CD of Ubuntu 18.04.1 running Linux 4.15.0-29-generic
> with btrfs-progs 4.15.1. My system has two ~3 TB drives in the btrfs
> array with RAID1 for data and metadata and two subvolumes, / and /home.
>
> I was attempting to track down where my free space was going when I
> discovered apt-btrfs-snapshot, which was creating a snapshot for every
> package install I had done, some quite old. I had at least 20 snapshots
> created when I found it and told apt-btrfs-snapshot to delete all of
> them.

Since snapshot deletion is delayed, btrfs may still be deleting all
these snapshots.
And too many snapshots will hurt quota performance.

> Still not getting back the free space I was expecting, I found a
> script called btrfs-size.sh which can produce a report, but requires
> quotas to be enabled, so I ran “sudo btrfs quota enable /”. After losing
> some patience trying to figure it out, I decided to try and just run a
> full balance across everything, something like “sudo btrfs balance start
> -d -m -v /”,

Then you're pushing the performance impact to the maximum.
Balance with a lot of snapshots or reflinked files will make quota as
slow as hell.

> but I can’t remember the exact command. I then went to bed
> only to find my server was completely hung the next day. I couldn’t get
> the screen to wake or any other response, so I was forced to power cycle
> it. However, on repeated attempts to get it to boot, it hangs at the
> point that it tries to mount read-write and starts slowly consuming more
> and more RAM until the system starts to hang. Switching to my 18.04.1
> recovery disk, I find I can mount it read-only and look around, but I
> can’t disable quotas in read-only mode. If I run “mount -o remount,rw
> /mnt” to enable read-write, mount hangs in the D state forever and I can
> slowly see the RAM usage increasing. I added a massive 32 GB of swap
> partitions, but eventually the system hangs due to out of memory.
>
> Lastly, I’ve tried unmounting it and running btrfs check on the drive. I
> see errors such as the following:
>
> $ sudo btrfs check -p /dev/sda4
> ...
> ref mismatch on [3994222952448 16384] extent item 0, found 1
> tree backref 3994222952448 parent 9688891392 root 9688891392 not found
> in extent tree
> backpointer mismatch on [3994222952448 16384]
> owner ref check failed [3994222952448 16384]
> ref mismatch on [3994223329280 16384] extent item 0, found 1
> tree backref 3994223329280 parent 9688891392 root 9688891392 not found
> in extent tree
> backpointer mismatch on [3994223329280 16384]
> owner ref check failed [3994223329280 16384]
> ref mismatch on [3994271203328 16384] extent item 0, found 1
> tree backref 3994271203328 parent 9688891392 root 9688891392 not found
> in extent tree
> backpointer mismatch on [3994271203328 16384]
> owner ref check failed [3994271203328 16384]
> ref mismatch on [3994276593664 16384] extent item 0, found 1
> tree backref 3994276593664 parent 9688891392 root 9688891392 not found
> in extent tree
> backpointer mismatch on [3994276593664 16384]
> owner ref check failed [3994276593664 16384]
> ref mismatch on [3994278756352 16384] extent item 0, found 1
> tree backref 3994278756352 parent 9688891392 root 9688891392 not found
> in extent tree
> backpointer mismatch on [3994278756352 16384]
> owner ref check failed [3994278756352 16384]
>
> ERROR: errors found in extent allocation tree or chunk allocation
> block group 3520760643584 has wrong amount of free space
> failed to load free space cache for block group 3520760643584
> checking free space cache [O]
> checking fs roots [.][o].][o]
> checking csums
> checking root refs
> checking quota groups
> ^C
> ubuntu@ubuntu:~$
>
> It hung at checking quota groups for 12 hours before I killed it.

Considering how many snapshots you have, btrfs-progs won't be any
quicker than the kernel.

> The
> errors above are only a small snippet, but seem to keep repeating the
> same basic thing. I have not tried a repair yet.
>
> What’s the next step?

Disable quota first, of course.
(Only enable it when the number of snapshots is kept pretty low, don't
try offline dedupe, and don't run balance until really needed.)

You can disable quota using this branch of btrfs-progs:
https://github.com/adam900710/btrfs-progs/tree/quota_disable

Or apply this patch on btrfs-progs 4.17.1:
https://patchwork.kernel.org/patch/10563589/

Then compile btrfs-progs and use the following command to disable quota
while the filesystem is unmounted:
# ./btrfs rescue disable-quota /dev/sda4

It should finish pretty quickly.

Then retry btrfs check, mount rw (to let the balance continue), and run
btrfs check again after the balance has finished.

The reported errors could be a false alert related to the running balance.

Thanks,
Qu
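
For reference, the order of operations suggested above can be written out as a dry run. This is only a sketch: the function name recovery_plan is made up for illustration, /mnt is the mount point from the original report, and "btrfs balance status" is my own assumption for how to see whether the balance has finished. The "rescue disable-quota" subcommand is only available in a btrfs-progs build from the branch or patch linked above. The script prints the commands and runs none of them.

```shell
#!/bin/sh
# Dry-run sketch: print the recovery sequence, execute nothing.
# recovery_plan is a hypothetical helper name, not part of btrfs-progs.
recovery_plan() {
    dev="$1"   # block device holding the filesystem, e.g. /dev/sda4
    mnt="$2"   # mount point, e.g. /mnt

    # 1. Disable quota offline (patched btrfs-progs build only).
    echo "./btrfs rescue disable-quota $dev"
    # 2. Check the unmounted filesystem again.
    echo "btrfs check $dev"
    # 3. Mount read-write so the interrupted balance can continue.
    echo "mount $dev $mnt"
    # 4. Assumption: poll this until no balance is reported as running.
    echo "btrfs balance status $mnt"
    # 5. Unmount and check once more after the balance has finished.
    echo "umount $mnt"
    echo "btrfs check $dev"
}

recovery_plan /dev/sda4 /mnt
```

Review the printed commands and run them by hand, one at a time, rather than piping the output into a shell.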
