On Thu, Jan 2, 2020 at 3:39 PM Leszek Dubiel <leszek@xxxxxxxxx> wrote:
>
> > Almost no reads, all writes, but slow. And rather high write request
> > per second, almost double for sdc. And sdc is near its max
> > utilization so it might be near its iops limit?
> >
> > ~210 rareq-sz = 210KiB is the average size of the read request for
> > sda and sdb
> >
> > Default mkfs and default mount options? Or other and if so what other?
> >
> > Many small files on this file system? Or possibly large files with a
> > lot of fragmentation?
>
> Default mkfs and default mount options.
>
> This system could have a few million (!) small files.
> On reiserfs it takes about 40 minutes to do "find /".
> Rsync runs for 6 hours to back up the data.

There is a mount option, max_inline=<bytes>, which the man page says
defaults to min(2048, page size). I've never used it, so in theory the
max_inline byte size is 2KiB. However, I have seen substantially larger
inline extents than 2KiB when using a nodesize larger than 16KiB at
mkfs time.

I've wondered whether more aggressive inlining of extents makes any
difference for the "many small files" case. I've seen that with the
16KiB leaf size, small files that could be inlined are often put into a
data block group instead, taking up a minimum 4KiB block (on x86_64
anyway). I'm not sure why, but I suspect there just isn't enough room
in that leaf to always use inline extents, and yet there is enough room
to just reference it as a data block group extent. When using a larger
node size, a larger percentage of small files ended up using inline
extents. I'd expect this to be quite a bit more efficient, because it
eliminates a time-expensive (on HDD anyway) seek.

Another optimization is compress=zstd:1, which is the lowest
compression setting. That'll increase the chance a file can use inline
extents, in particular with a larger nodesize.

And still another optimization, at the expense of much more complexity,
is LVM cache with an SSD. You'd have to pick a suitable policy for the
workload, but I expect that if the iostat utilizations you see are
often near max utilization in normal operation, you'll see improved
performance. SSDs can handle way higher iops than HDDs.

But a lot of this optimization stuff is use case specific. I'm not even
sure what your mean small file size is. (Some rough, untested command
sketches for the ideas above are at the end of this mail.)

> # iotop -d30
>
> Total DISK READ:    34.12 M/s | Total DISK WRITE:    40.36 M/s
> Current DISK READ:  34.12 M/s | Current DISK WRITE:  79.22 M/s
>   TID  PRIO  USER   DISK READ   DISK WRITE  SWAPIN      IO>  COMMAND
>  4596  be/4  root   34.12 M/s    37.79 M/s  0.00 %  91.77 %  btrfs

Not so bad for many small file reads and writes with HDD. I've seen
this myself with a single spindle when doing small file reads and
writes.
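
For the inlining and compression ideas above, roughly what I have in
mind (untested here; the 32KiB nodesize and 4KiB max_inline are just
example values, /dev/sdX and /mnt are placeholders, and nodesize can
only be set at mkfs time, so this means recreating the file system and
restoring from backup):

    # nodesize is a mkfs-time option; this wipes the existing file system
    mkfs.btrfs --nodesize 32768 /dev/sdX

    # mount with a larger inline extent limit and light compression
    mount -o max_inline=4096,compress=zstd:1 /dev/sdX /mnt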
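
For the LVM cache idea, roughly what it looks like assuming the file
system sits on an LVM logical volume (vg/data, /dev/nvme0n1, and the
100G cache size are made-up names and values; if btrfs is currently on
bare partitions, this implies migrating onto LVM first):

    # add the SSD to the existing volume group
    pvcreate /dev/nvme0n1
    vgextend vg /dev/nvme0n1

    # create a cache pool on the SSD and attach it to the data LV;
    # writethrough is the safer cache mode, writeback caches writes too
    lvcreate --type cache-pool -L 100G -n fastcache vg /dev/nvme0n1
    lvconvert --type cache --cachepool vg/fastcache --cachemode writethrough vg/data

    # to detach the cache later (flushes dirty blocks back to the HDDs)
    lvconvert --splitcache vg/data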
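
And since I don't know your mean small file size, here's one way to get
a rough number with GNU find (replace /path/to/mount with the real
mount point; expect it to take a while with millions of files):

    find /path/to/mount -xdev -type f -printf '%s\n' \
        | awk '{ sum += $1; n++ } END { if (n) printf "%d files, mean %.0f bytes\n", n, sum/n }'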

--
Chris Murphy