Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data


 



On Sat, Dec 28, 2019 at 06:04:21PM +0100, Leszek Dubiel wrote:
> 
> PROBLEM SOLVED --- btrfs was busy cleaning up after a snapshot deletion a few
> days ago, so it didn't have time to "dev delete", that's why it was slow

That checks out.  Snapshot delete and remove-device/resize/balance cannot
run at the same time.  There is a mutex, so only one of them runs while
the other is blocked.

> =======================
> 
> 
> I restarted the server, so the "btrfs dev delete" job no longer existed.
> But the disks were still working (!!!). I wondered why. What was BTRFS doing
> all the time?
> 
> I realized that a few days before starting "btrfs dev delete" I had removed
> many snapshots -- there had been about 400 snapshots and I kept only 20. I
> did that because I had read that many snapshots can slow down btrfs
> operations severely.
> 
> 
> 
> I ran an experiment on another test server:
> 
> 1. started the command "watch -n1 'btrfs fi df /'"
> 2. started "iostat -x -m"
> 
> Disks were not working at all.
> 
> 
> 3. Then I removed many snapshots on that test server
> 
> and I was watching:
> 
> Data, single: total=6.56TiB, used=5.13TiB
> System, RAID1: total=32.00MiB, used=992.00KiB
> Metadata, RAID1: total=92.00GiB, used=70.56GiB
> GlobalReserve, single: total=512.00MiB, used=1.39MiB
> 
> The disks started to work hard. So btrfs was probably cleaning up after the
> snapshot deletion.
> 
> At the beginning the "Metadata" line showed "used=70.00GiB".
> 
>            Metadata, RAID1: total=92.00GiB, used=70.00GiB
> 
> It was changing all the time... getting lower and lower. During that process,
> in the line
> 
>            GlobalReserve, single: total=512.00MiB, used=1.39MiB
> 
> "used=" was going up until it reached about 100MiB, then it was flushed to
> zero, and started again to fill, flush, fill, flush... some
> loop/buffer/cache (?).
> It convinced me, that after snapshot deletion btrfs is really working hard
> on cleanup.
> After some time "Metadata...used=" stopped changing, disks stopped working,
> server got lazy again.
> 
> 
> 
> I went back to the main server and started to watch "Metadata...used=". It
> was going down and down...
> I waited. When "Metadata...used=" stopped changing, "iostat -m" also stopped
> showing any disk activity.
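
That "wait until Metadata usage stops changing" step can also be scripted; a
rough, untested sketch (assumes the filesystem is mounted at / and polls once
a minute):

    prev=""
    while true; do
        # grab the Metadata line from 'btrfs fi df' and stop once it is stable
        cur=$(btrfs filesystem df / | grep '^Metadata')
        [ -n "$prev" ] && [ "$cur" = "$prev" ] && break
        prev="$cur"
        sleep 60
    done
    echo "Metadata usage stable - snapshot cleanup has probably finished"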
> 
> I started "btrfs dev delete" again and now speed is 50Mb/sek. Hurrray!
> Problem solved.
> 
> 
> Sorry for not being clever enough to connect all these facts at the
> beginning.
> Anyway -- maybe in the future someone will have the same problem, and then
> btrfs experts could ask whether btrfs was doing some other hard work at the
> same time (e.g. cleaning up after a massive snapshot deletion).
> 
> Maybe it would be useful to have a tool to ask btrfs "what are you doing? are
> you busy?".
> It would respond "I am cleaning up after snapshot deletion... I am
> balancing... I am scrubbing... I am resizing... I am deleting ...".

Usually 'top' or 'iotop' suffices for that.  btrfs-cleaner = deleting
snapshots, other activities will be tied to their respective userspace
agents.  The balance/delete/resize/drop-snapshot mutex is the only special
case that I know of where one btrfs maintenance thread waits for another.
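
For example, a quick way to check whether the cleaner is what is keeping the
disks busy (assumes iotop is installed and run as root; the btrfs-cleaner
kernel thread usually shows up directly in the output):

    # -o: only threads currently doing I/O, -b: batch mode, -n 2: two samples
    iotop -o -b -n 2 | grep btrfs-cleaner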

It might be handy to give users a clue on snapshot delete, like adding
"use btrfs sub list -d to monitor deletion progress, or btrfs sub sync
to wait for deletion to finish".
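
For example (assuming the filesystem is mounted at /):

    # list deleted subvolumes that are still queued for cleanup
    btrfs subvolume list -d /

    # block until all queued subvolume deletions have been fully cleaned up
    btrfs subvolume sync /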

> 
> 
> 


