Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data

On 03.01.2020 at 00:22, Chris Murphy wrote:
> On Thu, Jan 2, 2020 at 3:39 PM Leszek Dubiel <leszek@xxxxxxxxx> wrote:
>
> > This system could have a few million (!) of small files.
> > On reiserfs it takes about 40 minutes, to do "find /".
> > Rsync runs for 6 hours to backup data.
>
>
> There is a mount option:  max_inline=<bytes> which the man page says
> (default: min(2048, page size) )
>
> I've never used it, so in theory the max_inline byte size is 2KiB.
> However, I have seen substantially larger inline extents than 2KiB
> when using a nodesize larger than 16KiB at mkfs time.
>
> I've wondered whether it makes any difference for the "many small
> files" case to do more aggressive inlining of extents.
>
> I've seen with 16KiB leaf size, often small files that could be
> inlined are instead put into a data block group, taking up a minimum
> 4KiB block size (on x86_64 anyway). I'm not sure why, but I suspect
> there just isn't enough room in that leaf to always use inline
> extents, and yet there is enough room to just reference it as a data
> block group extent. When using a larger node size, a larger percentage
> of small files ended up using inline extents. I'd expect this to be
> quite a bit more efficient, because it eliminates a time expensive (on
> HDD anyway) seek.

I will try that option when I make the new btrfs filesystems.
Then I'll report back on how it affects efficiency.
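Something along these lines, I suppose (just a sketch; /dev/sdX and
/mnt/new are placeholders, not my real devices, and the nodesize and
max_inline values are only example picks):

    mkfs.btrfs --nodesize 32768 /dev/sdX
    mount -o max_inline=4096,compress=zstd:1 /dev/sdX /mnt/new

(My kernel is 4.19, so the zstd level suffix may not be accepted there;
plain compress=zstd should still work.) Afterwards "filefrag -v" on a
small file should show an "inline" flag if it really got an inline
extent.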





> Another optimization is using compress=zstd:1, which is the lowest
> compression setting. That'll increase the chance a file can use inline
> extents, in particular with a larger nodesize.
>
> And still another optimization, at the expense of much more
> complexity, is LVM cache with an SSD. You'd have to pick a suitable
> policy for the workload, but I expect that if the iostat utilizations
> you see are often near max utilization in normal operation, you'll see
> improved performance. SSD's can handle way higher iops than HDD. But a
> lot of this optimization stuff is use case specific. I'm not even sure
> what your mean small file size is.



There are 11 million files:

root@gamma:/mnt/sdb1# find orion2 > listor2
root@gamma:/mnt/sdb1# ls -lt listor2
-rw-r--r-- 1 root root 988973729 sty  3 03:09 listor2
root@gamma:/mnt/sdb1# wc -l listor2
11329331 listor2


And df on reiserfs shows:

root@orion:~# df  -h -BM
Filesystem     1M-blocks    Used     Avail Use% Mounted on
/dev/md0        71522M  10353M   61169M  15% /
/dev/md1       905967M 731199M  174768M  81% /root

10353 + 731199 = 741552 M used,

so the average file size is about 741552 * 1000000 / 11000000 ≈ 67413 bytes per file.
This estimate is rough, because df counts allocated blocks, not actual file sizes...

I will count more precisely with du --apparent-size.
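For example (a sketch; /root is the reiserfs data mount from the df
output above):

    du -s --apparent-size --block-size=1 /root

gives the total apparent size in bytes, and dividing by the file count
from wc -l gives the mean. A one-liner that does both at once:

    find /root -type f -printf '%s\n' \
        | awk '{ s += $1; n += 1 } END { printf "%d files, mean %.0f bytes\n", n, s / n }'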





>> # iotop -d30
>>
>> Total DISK READ:        34.12 M/s | Total DISK WRITE: 40.36 M/s
>> Current DISK READ:      34.12 M/s | Current DISK WRITE:      79.22 M/s
>>    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN IO> COMMAND
>>   4596 be/4 root       34.12 M/s   37.79 M/s  0.00 % 91.77 % btrfs
>
> Not so bad for many small file reads and writes with HDD. I've seen
> this myself with a single spindle when doing small file reads and
> writes.


So small files are what is slowing things down in my case.
OK! Thank you for the expertise.



PS. This morning:

root@wawel:~# btrfs bala stat /
Balance on '/' is running
1227 out of about 1231 chunks balanced (5390 considered),   0% left

So during the night it balanced 600 GB + 600 GB = 1.2 TB of
data from the single profile to raid1 in about 12 hours. That is:

(600 + 600) GB * 1000 MB/GB / (12 hours * 3600 sec/hour)
      = 1200000 MB / 43200 sec
            ≈ 28 MB/sec
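(A trivial way to keep an eye on the rate, if anyone cares, is just to
poll the status in a loop, e.g.:

    while sleep 600; do date; btrfs balance status /; done

and compare the chunk counters between iterations.)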




root@wawel:~# btrfs dev usag /
/dev/sda2, ID: 2
   Device size:             5.45TiB
   Device slack:              0.00B
   Data,RAID1:              2.62TiB
   Metadata,RAID1:         22.00GiB
   Unallocated:             2.81TiB

/dev/sdb2, ID: 3
   Device size:             5.45TiB
   Device slack:              0.00B
   Data,RAID1:              2.62TiB
   Metadata,RAID1:         21.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             2.81TiB

/dev/sdc3, ID: 4
   Device size:            10.90TiB
   Device slack:            3.50KiB
   Data,RAID1:              5.24TiB
   Metadata,RAID1:         33.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             5.62TiB





root@wawel:~# iostat 10  -x
Linux 4.19.0-6-amd64 (wawel)     03.01.2020     _x86_64_    (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,00    0,00    0,00    0,00    0,00  100,00

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s  wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0,00    0,00      0,00      0,00     0,00    0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00
sdb              0,00    0,00      0,00      0,00     0,00    0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00
sdc              0,00    0,00      0,00      0,00     0,00    0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,04    0,00    0,08    0,00    0,00   99,89

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s  wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0,00    0,00      0,00      0,00     0,00    0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00
sdb              0,00    0,00      0,00      0,00     0,00    0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00
sdc              0,00    0,00      0,00      0,00     0,00    0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00








