On 2019-02-15 14:50, Zygo Blaxell wrote:
> On Fri, Feb 15, 2019 at 11:54:57AM -0500, Austin S. Hemmelgarn wrote:
>> On 2019-02-15 10:40, Brian B wrote:
>>> It looks like the btrfs code currently uses the total space available
>>> on a disk to determine where it should place the two copies of a file
>>> in RAID1 mode. Wouldn't it make more sense to use the _percentage_ of
>>> free space instead of the number of free bytes?
>>>
>>> For example, I have two disks in my array that are 8 TB, plus an
>>> assortment of 3, 4, and 1 TB disks. With the current allocation code,
>>> btrfs will use my two 8 TB drives exclusively until I've written 4 TB
>>> of files, then it will start using the 4 TB disks, then eventually
>>> the 3 TB, and finally the 1 TB disks. If the code used a percentage
>>> figure instead, it would spread the allocations much more evenly
>>> across the drives, ideally spreading load and reducing drive wear.
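
For illustration, the two policies differ only in the key used to rank
devices when picking allocation targets. A minimal sketch with
hypothetical names, not the actual btrfs allocator code:

#include <stdint.h>

struct dev_info {
        uint64_t free_bytes;    /* unallocated space on the device */
        uint64_t total_bytes;   /* device size */
};

/* Current behaviour: rank devices by absolute free space; RAID1
 * chunk allocation then targets the two highest-ranked devices. */
static uint64_t key_by_bytes(const struct dev_info *d)
{
        return d->free_bytes;
}

/* Proposed behaviour: rank devices by the fraction of space still
 * free, expressed in integer per-mille to stay out of floating
 * point.  free_bytes * 1000 fits comfortably in 64 bits even for
 * multi-TB devices. */
static uint64_t key_by_permille(const struct dev_info *d)
{
        return d->free_bytes * 1000 / d->total_bytes;
}

With key_by_bytes, the two 8 TB drives outrank everything else until
4 TB of data has been mirrored onto them; with key_by_permille, every
drive starts at 1000 and they all decrease together.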
> Spreading load should make all the drives wear at the same rate (or at
> a rate proportional to size). That would be a gain for the big disks
> but a loss for the smaller ones.
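
Zygo's wear claim is easy to sanity-check with a throwaway userspace
simulation of percentage-based RAID1 chunk allocation over the example
array (sizes in GiB, one chunk copy per GiB; hypothetical code, not
anything from the btrfs tree):

#include <stdio.h>
#include <stdint.h>

#define NDEV 5

/* The example array: two 8 TB drives plus 4, 3, and 1 TB. */
static const uint64_t total[NDEV] = { 8192, 8192, 4096, 3072, 1024 };
static uint64_t used[NDEV];

static double free_frac(int i)
{
        return (double)(total[i] - used[i]) / (double)total[i];
}

int main(void)
{
        for (;;) {
                int best = -1, second = -1;

                /* Pick the two devices with the highest free fraction. */
                for (int i = 0; i < NDEV; i++) {
                        if (used[i] == total[i])
                                continue;       /* device is full */
                        if (best < 0 || free_frac(i) > free_frac(best)) {
                                second = best;
                                best = i;
                        } else if (second < 0 ||
                                   free_frac(i) > free_frac(second)) {
                                second = i;
                        }
                }
                if (second < 0)
                        break;  /* RAID1 needs two devices with space */
                used[best]++;   /* first copy of the chunk */
                used[second]++; /* mirror copy */
        }

        for (int i = 0; i < NDEV; i++)
                printf("dev %d (%4llu GiB): wrote %4llu GiB\n", i,
                       (unsigned long long)total[i],
                       (unsigned long long)used[i]);
        return 0;
}

Each device ends up absorbing writes in proportion to its capacity, so
per-cell wear evens out: less traffic on the big disks than today, more
on the small ones, which is exactly the trade-off described above.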
>>> Is there a reason this is done this way, or is it just something
>>> that hasn't had time for development?
>> It's simple to implement, easy to verify, runs fast, produces optimal
>> or near-optimal space usage in pretty much all cases, and is highly
>> deterministic.
>>
>> Using percentages reduces the simplicity, ease of verification, and
>> speed (division is still slow on most CPUs, and you need division for
>> percentages), and is likely not to be as deterministic (both because
>> the [...]
> A few integer divides _per GB of writes_ are not going to matter. The
> raid5 profile does a 64-bit modulus operation on every stripe to
> locate parity blocks.
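
For context, the operation in question is the parity-rotation step:
which device holds the parity block for a given stripe depends on the
stripe number modulo the number of devices. Schematically (a simplified
sketch, not the actual btrfs raid56 code):

#include <stdint.h>

/* RAID5-style rotating parity: one 64-bit modulus per full stripe
 * to find the device that holds the parity block. */
static uint32_t parity_device(uint64_t stripe_nr, uint32_t num_devices)
{
        return (uint32_t)(stripe_nr % num_devices);
}

A full stripe is typically a few hundred KiB, so this modulus runs
thousands of times per GB written, against a single chunk-allocation
decision per (roughly) 1 GiB chunk.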
It really depends on the system in question, and division is just the
_easy_ part to point at as being slower. Doing this right will likely
need floating-point (FP) work, which would make chunk allocations
rather painfully slow.
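
That said, floating point may not actually be required: two free-space
fractions can be compared in pure integer math by cross-multiplying.
A minimal sketch (hypothetical helper, not from the btrfs tree;
unsigned __int128 is a GCC/Clang extension):

#include <stdint.h>
#include <stdbool.h>

/* True if device A has a strictly higher free fraction than device B:
 * a/b > c/d  <=>  a*d > c*b  for positive denominators.  The 128-bit
 * intermediates cannot overflow for 64-bit sizes, and no division or
 * floating point is involved. */
static bool freer_fraction(uint64_t free_a, uint64_t total_a,
                           uint64_t free_b, uint64_t total_b)
{
        return (unsigned __int128)free_a * total_b >
               (unsigned __int128)free_b * total_a;
}

A comparator like this would keep device ranking integer-only and
division-free; whether it justifies the extra complexity over the
current scheme is the real question.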