On 4/12/19 4:31 AM, waxhead wrote:
Anand Jain wrote:
On 12/3/19 7:27 AM, waxhead wrote:
Anand Jain wrote:
I imagine that RAIDc4, for example, could potentially give a
grotesque speed increase for parallel read operations once BTRFS
learns to distribute reads to the devices with the shortest wait
queues / the fastest devices.
That was exactly the objective of the readmirror patch in the ML.
It proposed a framework to change the readmirror policy as needed.
Thanks, Anand
Indeed. If I remember correctly, your patch allowed for deterministic
reading from certain devices.
It provides a framework to configure the readmirror policies. The
policies can be based on io-depth or pid, or set manually for
heterogeneous devices with different latencies.
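As a minimal sketch (not the actual kernel code; the policy names
follow the ones above, but the struct fields and the function are
illustrative assumptions), mirror selection under such policies
could look like:

enum read_policy { POLICY_PID, POLICY_IODEPTH, POLICY_MANUAL };

struct mirror {
        int dev_id;
        unsigned int inflight;  /* read IOs queued on this device */
};

static int pick_mirror(enum read_policy policy, struct mirror *m,
                       int num_mirrors, int pid, int manual_dev)
{
        int i, best = 0;

        switch (policy) {
        case POLICY_PID:
                /* deterministic per-process spread of reads */
                return pid % num_mirrors;
        case POLICY_IODEPTH:
                /* prefer the least-loaded (typically fastest) device */
                for (i = 1; i < num_mirrors; i++)
                        if (m[i].inflight < m[best].inflight)
                                best = i;
                return best;
        case POLICY_MANUAL:
                /* admin pinned reads to one device */
                for (i = 0; i < num_mirrors; i++)
                        if (m[i].dev_id == manual_dev)
                                return i;
                return 0;
        }
        return 0;
}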
As just a regular btrfs user, the problem I see with this is that you
lose a "potential free scrub" that *might* otherwise happen on
often-read data. On the other hand, that is what manual scrubbing is
for anyway.
Ha ha.
When it comes to data reliability and availability we need
guarantees, and only a deterministic approach can provide them.
Uhm, what I meant was that if someone sets a readmirror policy to read
from the fastest devices in, for example, RAID1, and a copy exists on
both a hard drive and an SSD, then reads are served from the fastest
drive (the SSD) and you will never get an "accidental" read on the
hard drive, which makes scrubbing absolutely necessary (which it
actually is anyway).
In other words, for sloppy sysadmins: if data is read often, then the
hottest data is likely to have both copies read. If you set a policy
that prefers to always read from SSDs, it could happen that the poor
hard drive is never "checked".
What you are asking for is to route a particular block to the device
it was not read from before, so as to avoid scrubbing, or to make
scrubbing more intelligent so that it scrubs only old, never-read
blocks. This will be challenging: we would need to keep a history of
each block and the device it was read from. That is most likely in
scope for bpf-based external tools, but definitely not for the
kernel. Within the kernel we can create a readmirror-like framework
so that an external tool can achieve it.
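For illustration, a toy sketch of that routing idea; everything here
is hypothetical, and the last_read history it needs per block copy is
exactly the bookkeeping that makes an in-kernel version unattractive:

#include <time.h>

struct mirror_hist {
        int dev_id;
        time_t last_read;       /* 0 = never read */
};

/* Serve the read from the copy that was read longest ago, so that
 * incidental verification spreads across all copies over time. */
static int pick_stalest_mirror(struct mirror_hist *m, int n)
{
        int i, best = 0;

        for (i = 1; i < n; i++)
                if (m[i].last_read < m[best].last_read)
                        best = i;
        m[best].last_read = time(NULL);
        return best;
}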
From what I remember of my previous post (I am too lazy to look it up),
I was hoping that subvolumes could be assigned or "prioritized" to
certain devices, e.g. device groups. That means you could put all
SSDs of a certain speed in one group, all hard drives in another group,
and NVMe storage devices in yet another group. Or you could put all
devices on a certain JBOD controller board in their own group. That way
BTRFS could prioritize storing certain subvolumes on a certain
group, and/or even allow migrating (balancing) to another group. If you
run out of space you can always distribute across other groups and do
magic things there ;)
Not that I have anything against the readmirror policy, but I think
this approach would be much closer to ideal.
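To illustrate, a hypothetical sketch of what group-prioritized
allocation could look like; nothing like this exists in btrfs, and
all names here are invented:

struct dev_group {
        const char *name;               /* e.g. "nvme", "ssd", "hdd" */
        unsigned long long free_bytes;
};

struct subvol_affinity {
        unsigned long long subvol_id;
        int preferred_group;            /* index into the group table */
};

/* Allocate from the subvolume's preferred group, falling back to
 * any group that still has space (the "distribute across other
 * groups" case above). */
static int pick_group(const struct dev_group *g, int ngroups,
                      const struct subvol_affinity *a,
                      unsigned long long need)
{
        int i;

        if (g[a->preferred_group].free_bytes >= need)
                return a->preferred_group;
        for (i = 0; i < ngroups; i++)
                if (g[i].free_bytes >= need)
                        return i;
        return -1;                      /* no space anywhere */
}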
Yep. I remember [1] where you brought up the subvolume as a way to
direct the read IO.
[1]
https://www.mail-archive.com/linux-btrfs@xxxxxxxxxxxxxxx/msg86467.html
The idea is indeed good, but it is not possible to implement because
we share and link blocks across subvolumes and snapshots: for
example, a snapshot shares its extents with the source subvolume, so
the same extent can belong to subvolumes that would be pinned to
different groups. Or it may come with too many limitations and get
messy.
Thanks, Anand