On 2019-01-22 13:43, Anand Jain wrote:
On 01/21/2019 07:56 PM, Steven Davies wrote:
On 2018-05-16 11:03, Anand Jain wrote:
+ break;
+ }
I noticed that it's possible to pass this option multiple times at
mount time, which marks multiple devices as read mirrors. While that
doesn't do anything harmful, the code below will only ever use the first
such device. It may be worth at least mentioning this in the documentation.
There was some feedback about whether read_mirror_policy should be a
mount option, a sysfs knob or a property. IMO a property is better as it
would be persistent; with sysfs or a mount option, the user or a config
file has to remember the setting. Will fix.
I agree. Would/could this be set at mkfs time? It would then be similar
to how mdadm's --write-mostly is set up.
+
ret = -EINVAL;
goto out;
case Opt_err:
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 64dba5c4cf33..3a9c3804ff17 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5220,6 +5220,16 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
num_stripes = map->num_stripes;
switch(fs_info->read_mirror_policy) {
+	case BTRFS_READ_MIRROR_BY_DEV:
+		preferred_mirror = first;
+		if (test_bit(BTRFS_DEV_STATE_READ_MIRROR,
+			     &map->stripes[preferred_mirror].dev->dev_state))
+			break;
+		if (test_bit(BTRFS_DEV_STATE_READ_MIRROR,
+			     &map->stripes[++preferred_mirror].dev->dev_state))   <--[*]
+			break;
+		preferred_mirror = first;
Why set preferred_mirror again? The only effect of resetting it is that
the fallback uses the lowest devid (1) rather than the highest (2). Is
there any difference? Either way the fallback should never be reached,
because the code above catches it (except when the
BTRFS_DEV_STATE_READ_MIRROR flag has been changed, as mentioned above).
The code at [*] above does ++preferred_mirror, so the following
preferred_mirror = first;
is not redundant.
Yes, but without that line preferred_mirror is left at first + 1, so the
last stripe is preferred; with it the first stripe is preferred. It only
changes the fallback case from reading from devid 2 to devid 1, which
barely matters. Perhaps it would be even better to fall back to
pid % num_stripes in this case anyway?
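To make it concrete, the whole case could look something like this
(untested sketch; the names come from the hunk above, and the
current->pid % num_stripes expression is my assumption about how the
existing pid-based spread is written elsewhere in find_live_mirror()):

	case BTRFS_READ_MIRROR_BY_DEV:
		preferred_mirror = first;
		if (test_bit(BTRFS_DEV_STATE_READ_MIRROR,
			     &map->stripes[preferred_mirror].dev->dev_state))
			break;
		if (test_bit(BTRFS_DEV_STATE_READ_MIRROR,
			     &map->stripes[++preferred_mirror].dev->dev_state))
			break;
		/* neither stripe is flagged READ_MIRROR: spread by pid */
		preferred_mirror = first + current->pid % num_stripes;
		break;

That way the fallback behaves like the pid-based default instead of
always hitting the first stripe.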
From Goffredo's comment[1] on Timofey's patch (which has a similar
effect), if it becomes possible to have more mirrors in a RAID1/RAID10
fs then this code will need to be updated to test dev_state on more than
two devices. Would it be sensible to implement this as a for loop
straight away?
Right. A loop is better, will add.
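For what it's worth, this is roughly the loop I had in mind (untested
sketch; I'm assuming an int i is in scope in find_live_mirror(),
everything else is taken from the hunk above):

	case BTRFS_READ_MIRROR_BY_DEV:
		preferred_mirror = first;
		for (i = first; i < first + num_stripes; i++) {
			if (test_bit(BTRFS_DEV_STATE_READ_MIRROR,
				     &map->stripes[i].dev->dev_state)) {
				preferred_mirror = i;
				break;
			}
		}
		break;

It keeps the same fallback to the first stripe (or the pid spread
suggested above) but works for any number of stripes.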
Thinking more about this: if it becomes possible to have more than two
devices in a RAID1 (or in a RAID10 mirror group), do we also need to
consider that there may be more than one READ_MIRROR drive? E.g.
slow+fast+fast drives in a RAID1 with this approach would result in
every chunk being read from the first drive with READ_MIRROR set
whenever both copies sit on the fast drives. This is where Timofey's
queue-length patch in [1] would become useful; in any case, that is an
existing problem and probably deserves a patch of its own.
[1]: https://patchwork.kernel.org/patch/10678531/#22315779
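Purely to illustrate what I mean by more than one READ_MIRROR drive
(hypothetical, not something I'm asking for in this patch;
BTRFS_MAX_MIRRORS as the bound and the pid spread are my assumptions):

	case BTRFS_READ_MIRROR_BY_DEV: {
		int nr_flagged = 0;
		int flagged[BTRFS_MAX_MIRRORS];

		/* collect every stripe whose device is flagged READ_MIRROR */
		for (i = first; i < first + num_stripes &&
		     nr_flagged < BTRFS_MAX_MIRRORS; i++)
			if (test_bit(BTRFS_DEV_STATE_READ_MIRROR,
				     &map->stripes[i].dev->dev_state))
				flagged[nr_flagged++] = i;

		/* spread reads across all flagged devices by pid,
		 * fall back to the first stripe if none is flagged */
		preferred_mirror = nr_flagged ?
			flagged[current->pid % nr_flagged] : first;
		break;
	}

Even then a queue-length policy would pick better, so I agree it belongs
in a separate patch, if anywhere.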
Thanks,
Anand.
Steve