On 10/22/2012 02:55 PM, Josef Bacik wrote:
> On Mon, Oct 22, 2012 at 10:14:58AM -0600, Alex Elder wrote:
>> On 10/19/2012 04:01 PM, Josef Bacik wrote:
>>> Alex reported a problem where we were writing between chunks on a rbd
>>> device. The thing is we do bio_add_page using logical offsets, but the
>>> physical offset may be different. So when we map the bio now check to see
>>> if the bio is still ok with the physical offset, and if it is not split the
>>> bio up and redo the bio_add_page with the physical sector. This fixes the
>>> problem for Alex and doesn't affect performance in the normal case. Thanks,
>>>
>>> Reported-and-tested-by: Alex Elder <elder@xxxxxxxxxxx>
>>> Signed-off-by: Josef Bacik <jbacik@xxxxxxxxxxxx>
>>> ---
>>> fs/btrfs/volumes.c | 159 ++++++++++++++++++++++++++++++++++++++++++---------
>>> 1 files changed, 131 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index a8adf26..eaaf0bf 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -4217,6 +4217,113 @@ static noinline void schedule_bio(struct btrfs_root *root,
>>> &device->work);
>>> }
>>>
>>> +static int bio_size_ok(struct block_device *bdev, struct bio *bio,
>>> + sector_t sector)
>>> +{
>>> + struct bio_vec *prev;
>>> + struct request_queue *q = bdev_get_queue(bdev);
>>> + unsigned short max_sectors = queue_max_sectors(q);
>>> + struct bvec_merge_data bvm = {
>>> + .bi_bdev = bdev,
>>> + .bi_sector = sector,
>>> + .bi_rw = bio->bi_rw,
>>> + };
>>> +
>>> + if (bio->bi_vcnt == 0) {
>>> + WARN_ON(1);
>>> + return 1;
>>> + }
>>> +
>>> + prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
>>> + if ((bio->bi_size >> 9) > max_sectors)
>>> + return 0;
>>> +
>>> + if (!q->merge_bvec_fn)
>>> + return 1;
>>> +
>>> + bvm.bi_size = bio->bi_size - prev->bv_len;
>>> + if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len)
>>> + return 0;
>>
>> I wanted to mention one concern that occurred to me over the
>> weekend.
>>
>> By checking only the last bio_vec in the bio here it's
>> conceivable you'll miss another boundary that sits earlier
>> in the bio.
>>
>> We technically allow the boundaries of objects that back rbd
>> disk images to be as small as 4 KB (but by default it's 4 MB).
>> 4 KB is really unlikely, but 256 KB (which is smaller than
>> 128-entry bio) is less of a stretch.
>>
>> I also came up with a different way of splitting that makes
>> it moot anyway, obviating the need for doing this check at
>> all so after a little more testing I'll get that posted.
>>
>
> Right but if we go over the boundary earlier in the bio shouldn't we trip the
> test anyway? For example, if you have 4k images and the last one happens to be
> 4k aligned but is in a 8 megabyte bio it should fail, and if it doesn't the
> merge function is wrong :). Once it fails we add all pages starting from the
> first one forward, so we will end up with correct bios if we have to split it.
> Thanks,
You're right. I finally took a closer look. You're passing
a bio, and the decision is based on whether, considering the
sector start address provided, adding the last segment in that
bio will cross a boundary. So if there were another boundary
prior to that in the bio, it too would be crossed and therefore
this check still works.
Thanks for the explanation.
-Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html