Re: btrfs balance to add new drive taking ~60 hours, no progress?

On 2020/3/2 上午4:32, Rich Rauenzahn wrote:
> (Is this just taking really long because I didn't provide filters when
> balancing across the new drive?)
> 
> Also, I DID just change my /etc/fstab to not resume the balance just
> in case I reboot:
> 
> /.BACKUPS               btrfs   compress=lzo,subvol=.BACKUPS,skip_balance   1 2
> 
> Kernel version:
> 
> Kernel:  5.5.5-1.el7.elrepo.x86_64
> 
> The pool is mirrored, 2 copies.
> 
> The last drive in the list is the one I added.  I think it's been at
> 8MiB the whole time.
> 
> $ sudo btrfs fi show /.BACKUPS/
> Label: 'BACKUPS'  uuid: cfd65dcd-2a63-4fb1-89a7-0bb9ebe66ddf
>         Total devices 4 FS bytes used 3.64TiB
>         devid    2 size 1.82TiB used 1.82TiB path /dev/sda1
>         devid    3 size 1.82TiB used 1.82TiB path /dev/sdc1
>         devid    4 size 3.64TiB used 3.64TiB path /dev/sdb1
>         devid    5 size 3.64TiB used 8.31MiB path /dev/sdj1
> 
> $ sudo btrfs fi df /.BACKUPS/
> Data, RAID1: total=3.63TiB, used=3.63TiB
> System, RAID1: total=32.00MiB, used=736.00KiB
> Metadata, RAID1: total=5.00GiB, used=3.88GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> $ btrfs fi usage /.BACKUPS/
> WARNING: cannot read detailed chunk info, RAID5/6 numbers will be
> incorrect, run as root
> Overall:
>     Device size:                  10.92TiB
>     Device allocated:              7.28TiB
>     Device unallocated:            3.64TiB
>     Device missing:               10.92TiB
>     Used:                          7.27TiB
>     Free (estimated):              1.82TiB      (min: 1.82TiB)
>     Data ratio:                       2.00
>     Metadata ratio:                   2.00
>     Global reserve:              512.00MiB      (used: 0.00B)
> 
> $ sudo btrfs fi usage /.BACKUPS/
> Overall:
>     Device size:                  10.92TiB
>     Device allocated:              7.28TiB
>     Device unallocated:            3.64TiB
>     Device missing:                  0.00B
>     Used:                          7.27TiB
>     Free (estimated):              1.82TiB      (min: 1.82TiB)
>     Data ratio:                       2.00
>     Metadata ratio:                   2.00
>     Global reserve:              512.00MiB      (used: 0.00B)
> 
> Data,RAID1: Size:3.63TiB, Used:3.63TiB
>    /dev/sda1       1.82TiB
>    /dev/sdb1       3.63TiB
>    /dev/sdc1       1.82TiB
>    /dev/sdj1       8.31MiB
> 
> Metadata,RAID1: Size:5.00GiB, Used:3.88GiB
>    /dev/sda1       3.00GiB
>    /dev/sdb1       5.00GiB
>    /dev/sdc1       2.00GiB
> 
> System,RAID1: Size:32.00MiB, Used:736.00KiB
>    /dev/sda1      32.00MiB
>    /dev/sdb1      32.00MiB
> 
> Unallocated:
>    /dev/sda1       1.00MiB
>    /dev/sdb1       1.00MiB
>    /dev/sdc1       1.00MiB
>    /dev/sdj1       3.64TiB
> 
> 
> Processes (I also tried a cancel, which is just hung as well)
> 
> 4 S root      3665     1  0  80   0 - 60315 -      06:45 ?        00:00:00 sudo btrfs balance cancel /.BACKUPS/
> 4 D root      3666  3665  0  80   0 -  3983 -      06:45 ?        00:00:00 btrfs balance cancel /.BACKUPS/
> 4 S root     14035     1  0  80   0 - 60315 -      Feb28 ?        00:00:00 sudo btrfs filesystem balance /.BACKUPS/
> 4 D root     14036 14035  2  80   0 -  3984 -      Feb28 ?        00:59:12 btrfs filesystem balance /.BACKUPS/
> 
> All four drives ARE blinking, and the process takes <10% CPU, but > 0%.
> 
> 2.6%:
> 
> 14036 root      20   0   15936    656    520 D   2.6  0.0  59:13.90 btrfs filesystem balance /.BACKUPS/
> 
> df, while probably misleading with btrfs:
> 
> Filesystem      1K-blocks       Used  Available Use% Mounted on
> /dev/sda1      5860531080 3906340128        384 100% /.BACKUPS
> 
> dmesg has a lot of these, and you can see they are issued pretty quickly:
> 
> [773986.367090] BTRFS info (device sda1): found 472 extents
> [773986.583133] BTRFS info (device sda1): found 472 extents
> [773986.799169] BTRFS info (device sda1): found 472 extents

That's a runaway balance.
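
The repeating "found 472 extents" messages at sub-second intervals are the
symptom: relocation keeps reporting the same extent count without ever
moving on, i.e. the balance is redoing the same work instead of making
progress. A rough way to keep an eye on it (just a sketch, assuming a
dmesg that supports --follow):

  # If the same "found N extents" count keeps scrolling past without the
  # count ever changing, the balance is looping rather than progressing:
  dmesg -wT | grep 'BTRFS info.*found'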

> 
> sar output of relevant drives (10 secs):
> 
> 10:26:23 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
> 10:26:26 AM       sdb     78.45      0.00   2312.37     29.48      0.48      6.64      0.58      4.52
> 10:26:26 AM       sda     78.80      0.00   2312.37     29.35      0.94     12.53      0.53      4.20
> 10:26:26 AM       sdc     36.40      0.00    220.49      6.06      0.25      7.24      0.85      3.11
> 10:26:26 AM       sdj     36.40      0.00    220.49      6.06      0.23      6.74      0.83      3.04
> 
> $ sudo btrfs balance status -v /.BACKUPS/
> Balance on '/.BACKUPS/' is running, cancel requested

And ironically, canceling is currently the primary way to hit a runaway
balance.

So to properly cancel the runaway balance, you need to apply the latest
quicker-canceling patchset:
https://patchwork.kernel.org/project/linux-btrfs/list/?series=242357
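
Until a patched kernel is in place, one rough workaround is to get the
balance into the paused state before canceling it, since a paused balance
is not in the middle of relocating a block group. This is only a sketch
that leans on the skip_balance option already added to the fstab entry
quoted above:

  # After a reboot, skip_balance in the mount options leaves the
  # interrupted balance paused instead of resuming it automatically:
  mount /.BACKUPS

  # Canceling the paused balance should then return promptly:
  btrfs balance cancel /.BACKUPS
  btrfs balance status /.BACKUPS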

Thanks,
Qu


> 0 out of about 3733 chunks balanced (29 considered), 100% left
> Dumping filters: flags 0x7, state 0x5, force is off
>   DATA (flags 0x0): balancing
>   METADATA (flags 0x0): balancing
>   SYSTEM (flags 0x0): balancing
> 
> Oh, and the filesystem does think it is out of space even though the new
> drive has been added:
> 
> $ dd if=/dev/random of=random
> dd: writing to ‘random’: No space left on device
> 0+7 records in
> 0+0 records out
> 0 bytes (0 B) copied, 0.341074 s, 0.0 kB/s
> 


