Re: BTRFS Deduplication

On Mon, Sep 11, 2017 at 2:55 PM, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>
>
> On 2017-09-11 17:14, Qu Wenruo wrote:
>>
>>
>>
>> On 2017-09-11 16:57, shally verma wrote:
>>>
>>> On Mon, Sep 11, 2017 at 1:42 PM, Qu Wenruo <quwenruo.btrfs@xxxxxxx>
>>> wrote:
>>>>
>>>>
>>>>
>>>> On 2017-09-11 15:54, shally verma wrote:
>>>>>
>>>>>
>>>>> On Mon, Sep 11, 2017 at 12:16 PM, Qu Wenruo <quwenruo.btrfs@xxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2017-09-11 14:05, shally verma wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I was going through the BTRFS Deduplication page
>>>>>>> (https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read
>>>>>>>
>>>>>>> "As such, xfs_io, is able to perform deduplication on a BTRFS file
>>>>>>> system," ..
>>>>>>>
>>>>>>> Following this, I followed the xfs_io link
>>>>>>> https://linux.die.net/man/8/xfs_io
>>>>>>>
>>>>>>> As I understand it, these are a set of commands that allow us to
>>>>>>> perform different operations on an "xfs" filesystem.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Nope, it's just a tool that triggers various read/write operations or
>>>>>> ioctls. In fact, most of its commands are fs-independent.
>>>>>> Only a limited number of operations are supported only by XFS.
>>>>>>
>>>>>> It's only for historical reasons that it's still named xfs_io.
>>>>>>
>>>>>> I wouldn't be surprised if one day it's split out as an independent tool.
>>>>>>
>>>>>>> and in the command set mentioned there, I couldn't see which command
>>>>>>> invokes the dedupe task.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> The "dedupe" and "reflink" commands.
>>>>>
>>>>>
>>>>> Oh. That means the xfs_io page linked from the BTRFS Wiki is not up
>>>>> to date. I googled another page that documents these two xfs_io
>>>>> commands:
>>>>> https://www.systutorials.com/docs/linux/man/8-xfs_io/
>>>>> Maybe the Wiki needs an update here.
>>>>
>>>>
>>>>
>>>> If XFS has a regularly updated online man page, we can just use that.
>>>> (But unfortunately, not every fs's user tools use asciidoc like btrfs
>>>> does, which can generate both man pages and HTML.)
>>>>
>>>>>
>>>>>>
>>>>>>> and how this works with BTRFS.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> A filesystem that supports the FIDEDUPERANGE or
>>>>>> BTRFS_IOC_FILE_EXTENT_SAME ioctl can use it to determine whether two
>>>>>> ranges contain identical data.
>>>>>>
>>>>>> And if they are identical, we use the FICLONERANGE or
>>>>>> BTRFS_IOC_CLONE_RANGE ioctl to reflink one to the other, freeing the
>>>>>> space used by one of them.
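A minimal C sketch of issuing FIDEDUPERANGE from user space, assuming Linux 4.5 or later headers (where `FIDEDUPERANGE` and `struct file_dedupe_range` are defined in `<linux/fs.h>`). The wrapper name `dedupe_one()` is hypothetical. It submits a single destination range and reports the per-destination `status` the kernel fills in; note that this ioctl is issued on the source descriptor.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <linux/fs.h>   /* FIDEDUPERANGE, struct file_dedupe_range (Linux 4.5+) */

/*
 * Hypothetical helper: ask the kernel to compare [src_off, src_off + len)
 * of src_fd with [dst_off, dst_off + len) of dst_fd.
 * Returns bytes deduplicated (0 if the ranges differ) or -1 with errno set.
 * Whenever the ioctl itself succeeds, *status_out receives the kernel's
 * per-destination status.
 */
long long dedupe_one(int src_fd, off_t src_off, int dst_fd, off_t dst_off,
                     size_t len, int *status_out)
{
    /* One destination: header plus a single info slot (flexible array). */
    struct file_dedupe_range *r =
        calloc(1, sizeof(*r) + sizeof(struct file_dedupe_range_info));
    long long ret;
    int status;

    if (r == NULL)
        return -1;

    r->src_offset = (__u64)src_off;
    r->src_length = (__u64)len;
    r->dest_count = 1;                  /* reserved fields stay zero */
    r->info[0].dest_fd = dst_fd;
    r->info[0].dest_offset = (__u64)dst_off;

    /* FIDEDUPERANGE is issued on the *source* descriptor. */
    if (ioctl(src_fd, FIDEDUPERANGE, r) < 0) {
        int saved = errno;
        free(r);
        errno = saved;
        return -1;
    }

    status = r->info[0].status;
    ret = (long long)r->info[0].bytes_deduped;  /* stays 0 unless SAME */
    free(r);

    if (status_out != NULL)
        *status_out = status;
    if (status < 0) {                   /* per-destination error: -errno */
        errno = -status;
        return -1;
    }
    return ret;                         /* SAME: bytes shared; DIFFERS: 0 */
}
```

On a reflink-capable filesystem, a call such as `dedupe_one(src, 0, dst, 0, 128 * 1024, &st)` would be expected to return the number of bytes shared when `st == FILE_DEDUPE_RANGE_SAME`, and 0 when `st == FILE_DEDUPE_RANGE_DIFFERS`.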
>>>>>>
>>>>>> BTW, nowadays these dedupe and reflink ioctls are generic in the VFS:
>>>>>> the file_operations structure includes both clone_file_range() and
>>>>>> dedupe_file_range() callbacks.
>>>>>
>>>>>
>>>>> Yeah, I understand that part. So going by the description of "dedupe"
>>>>> and "reflink", it seems that with these commands one can do the
>>>>> deduplication part but NOT the duplicate-finding part.
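The duplicate-finding part has to be done in user space before any dedupe ioctl is issued. One straightforward approach, sketched here under stated assumptions, is to fingerprint fixed-size blocks and confirm hash matches byte-for-byte. The 4 KiB block size, the FNV-1a hash, the in-memory buffer (a real scanner would read file contents with `pread()`), and the names `find_duplicate_blocks()` and `DEDUP_BLOCK` are all illustrative choices, not what xfs_io or any particular dedupe tool does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEDUP_BLOCK 4096  /* hypothetical block size for the scan */

/* 64-bit FNV-1a: a cheap fingerprint used only to find *candidates*. */
static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/*
 * Hypothetical scanner: for each block i of buf (nblocks * DEDUP_BLOCK
 * bytes), set dup_of[i] to the index of an earlier identical block, or -1.
 * hashes must have room for nblocks entries. Returns the number of
 * duplicate blocks. O(n^2) pairing, kept simple for clarity.
 */
size_t find_duplicate_blocks(const unsigned char *buf, size_t nblocks,
                             uint64_t *hashes, long *dup_of)
{
    size_t dups = 0;

    for (size_t i = 0; i < nblocks; i++)
        hashes[i] = fnv1a64(buf + i * DEDUP_BLOCK, DEDUP_BLOCK);

    for (size_t i = 0; i < nblocks; i++) {
        dup_of[i] = -1;
        for (size_t j = 0; j < i; j++) {
            /* A hash match is only a hint; confirm byte-for-byte. */
            if (hashes[i] == hashes[j] &&
                memcmp(buf + i * DEDUP_BLOCK, buf + j * DEDUP_BLOCK,
                       DEDUP_BLOCK) == 0) {
                dup_of[i] = (long)j;
                dups++;
                break;
            }
        }
    }
    return dups;
}
```

Each block `i` with `dup_of[i] >= 0` is then a candidate to pass, together with block `dup_of[i]`, to the dedupe ioctl. Since the kernel compares the data again before sharing it, a block that changed after the scan would simply come back as "differs".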
>>>>
>>>>
>>>>
>>>> Yes, one doesn't need to call the "dedupe" ioctl if one already knows
>>>> some data is identical; one can go straight to reflink.
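When the data is already known to be identical, the reflink step can be issued directly. Below is a minimal sketch using the generic FICLONERANGE ioctl (Linux 4.5+ `<linux/fs.h>`); `reflink_range()` is a hypothetical helper name. Unlike FIDEDUPERANGE, it is issued on the destination descriptor and performs no comparison at all, and the offsets and length generally need to be aligned to the filesystem block size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <linux/fs.h>   /* FICLONERANGE, struct file_clone_range (Linux 4.5+) */

/*
 * Hypothetical helper: make [dst_off, dst_off + len) of dst_fd share the
 * extents backing [src_off, src_off + len) of src_fd. No content check is
 * done; whatever dst_fd held in that range is replaced.
 * len == 0 means "up to the end of the source file".
 * Returns 0 on success, -1 with errno set on failure.
 */
int reflink_range(int src_fd, off_t src_off, int dst_fd, off_t dst_off,
                  size_t len)
{
    struct file_clone_range arg = {
        .src_fd = src_fd,
        .src_offset = (__u64)src_off,
        .src_length = (__u64)len,
        .dest_offset = (__u64)dst_off,
    };

    /* Unlike FIDEDUPERANGE, this ioctl is issued on the *destination*. */
    return ioctl(dst_fd, FICLONERANGE, &arg);
}
```

For example, `reflink_range(src, 0, dst, 0, 0)` would share the whole source file into `dst` starting at offset 0, replacing whatever `dst` held there.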
>>>>
>>>>> That's still outside the scope of the xfs_io commands.
>>>>
>>>>
>>>>
>>>> Not sure what you mean by "scope" here, sorry.
>>>>
>>> By "scope", I meant the duplicate-finding part, but that contradicts the
>>> statement you wrote below:
>>>>
>>>> Since xfs_io can be used to find duplication,
>>>
>>>
>>> Since the "dedupe" command takes only a "source file" plus src_offset and
>>> dst_offset within it, it can deduplicate content within a file,
>>> whereas the actual FS dedupe ioctl can first check whether two extents
>>> are identical and, if so, deduplicate them.
>>
>>
>> By "deduplicate", if you mean "removing duplication", then the xfs_io
>> "dedupe" command itself doesn't do that.
>>
>> The old btrfs ioctl name, FILE_EXTENT_SAME, describes this better:
>> the "dedupe" command itself only verifies whether they have the same content.
>>
>> So to make it clear, the "dedupe" command and ioctl only do the
>> *verification* work.
>
>
> Sorry, I just checked the code and tried the ioctl.
>
> If they are the same, "dedupe" will do the "reflink" part as well.
>
> The code also shows that:
> ---
>         /* pass original length for comparison so we stay within i_size */
>         ret = btrfs_cmp_data(olen, &cmp);
>         if (ret == 0)
>                 ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> ---
>
> So the "dedupe" ioctl itself does de-duplication,
> and my previous answer was simply wrong.
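Because the dedupe ioctl compares and clones in one step, a caller only needs to inspect the per-destination results afterwards. A minimal sketch, assuming the Linux 4.5+ generic `struct file_dedupe_range` from `<linux/fs.h>`; `alloc_dedupe_range()`, `summarize_dedupe()` and `struct dedupe_summary` are hypothetical names. One request can carry several destinations, each of which receives its own `status` and `bytes_deduped`.

```c
#include <assert.h>
#include <stdlib.h>
#include <linux/fs.h>   /* struct file_dedupe_range, FILE_DEDUPE_RANGE_* */

/* Hypothetical tally of one FIDEDUPERANGE request with n destinations. */
struct dedupe_summary {
    unsigned long long bytes_deduped; /* summed over matching destinations */
    unsigned same;                    /* status == FILE_DEDUPE_RANGE_SAME */
    unsigned differs;                 /* status == FILE_DEDUPE_RANGE_DIFFERS */
    unsigned errors;                  /* any other status (negated errno) */
};

/* Hypothetical helper: zeroed request with room for n destinations,
 * so the reserved fields are already 0 as the kernel requires. */
struct file_dedupe_range *alloc_dedupe_range(unsigned short n)
{
    struct file_dedupe_range *r =
        calloc(1, sizeof(*r) + (size_t)n * sizeof(struct file_dedupe_range_info));
    if (r != NULL)
        r->dest_count = n;
    return r;
}

/*
 * Since the ioctl compares *and* clones, bytes_deduped is only meaningful
 * for destinations whose status is SAME; a DIFFERS destination was
 * compared but left untouched.
 */
struct dedupe_summary summarize_dedupe(const struct file_dedupe_range *r)
{
    struct dedupe_summary s = {0, 0, 0, 0};

    for (unsigned i = 0; i < r->dest_count; i++) {
        const struct file_dedupe_range_info *info = &r->info[i];

        if (info->status == FILE_DEDUPE_RANGE_SAME) {
            s.same++;
            s.bytes_deduped += info->bytes_deduped;
        } else if (info->status == FILE_DEDUPE_RANGE_DIFFERS) {
            s.differs++;
        } else {
            s.errors++;
        }
    }
    return s;
}
```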
>
Yeah, that corroborates my findings too. Thanks for confirming :).

Thanks
Shally

> Sorry for that,
> Qu
>
>
>>
>> "Reflink" will really remove the duplication (or even non-duplicated data
>> if you really want).
>>
>>
>> But please be careful: "reflink" is much like a copy, so it can be executed
>> on file ranges with different contents.
>> In that case, reflink can free some space, but it also modifies the
>> destination's content.
>>
>> So for full de-duplication, one must go through the full *verify* then
>> *reflink* cycle.
>> Although the "dedupe" (FILE_EXTENT_SAME) ioctl provides one verification
>> method, it's not the only solution.
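The *verify* then *reflink* cycle can also be done without the dedupe ioctl, by comparing the ranges in user space and then calling FICLONERANGE. A minimal sketch follows, assuming Linux 4.5+ headers; `ranges_equal()`, `verify_then_reflink()` and `CMP_CHUNK` are hypothetical names. Unlike FIDEDUPERANGE, this is not atomic: if either file changes between the comparison and the clone, the clone can change the destination's content.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>   /* FICLONERANGE, struct file_clone_range (Linux 4.5+) */

#define CMP_CHUNK 65536 /* hypothetical comparison buffer size */

/* Read up to n bytes at off, retrying short reads; stops early only at EOF. */
static ssize_t pread_full(int fd, void *buf, size_t n, off_t off)
{
    size_t done = 0;

    while (done < n) {
        ssize_t r = pread(fd, (char *)buf + done, n - done, off + (off_t)done);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            break;                      /* EOF */
        done += (size_t)r;
    }
    return (ssize_t)done;
}

/* Hypothetical helper: 1 if the ranges are byte-identical, 0 if not
 * (including a range running past EOF), -1 on error. */
int ranges_equal(int fd_a, off_t off_a, int fd_b, off_t off_b, size_t len)
{
    unsigned char *a = malloc(CMP_CHUNK);
    unsigned char *b = malloc(CMP_CHUNK);
    int result = (a != NULL && b != NULL) ? 1 : -1;

    while (result == 1 && len > 0) {
        size_t n = len < CMP_CHUNK ? len : CMP_CHUNK;
        ssize_t ra = pread_full(fd_a, a, n, off_a);
        ssize_t rb = pread_full(fd_b, b, n, off_b);

        if (ra < 0 || rb < 0)
            result = -1;
        else if ((size_t)ra != n || (size_t)rb != n || memcmp(a, b, n) != 0)
            result = 0;
        off_a += (off_t)n;
        off_b += (off_t)n;
        len -= n;
    }
    free(a);
    free(b);
    return result;
}

/* Hypothetical helper: verify, then reflink. NOT atomic: either file may
 * change between the comparison and the clone.
 * Returns 1 if cloned, 0 if the ranges differ, -1 with errno set on error. */
int verify_then_reflink(int src_fd, off_t src_off, int dst_fd, off_t dst_off,
                        size_t len)
{
    struct file_clone_range arg;
    int eq;

    if (len == 0) {                     /* 0 would mean "to EOF" for the clone */
        errno = EINVAL;
        return -1;
    }
    eq = ranges_equal(src_fd, src_off, dst_fd, dst_off, len);
    if (eq != 1)
        return eq;

    arg.src_fd = src_fd;
    arg.src_offset = (__u64)src_off;
    arg.src_length = (__u64)len;
    arg.dest_offset = (__u64)dst_off;
    return ioctl(dst_fd, FICLONERANGE, &arg) < 0 ? -1 : 1;
}
```

This race is the main reason to prefer the dedupe ioctl where it is available: as the btrfs_cmp_data()/btrfs_clone() sequence quoted in this thread shows, the kernel performs the comparison and the clone together.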
>>
>> But anyway, the "dedupe" and "reflink" commands provided by xfs_io do
>> provide every piece needed for de-duplication, so the wiki is still
>> correct IMHO.
>>
>> Thanks,
>> Qu
>>
>>>
>>> Is that correct?
>>>
>>> Thanks
>>> Shally
>>>
>>>> and can remove duplication, I
>>>> don't find anything strange in that wiki page.
>>>> (Especially considering how popular the tool is, you won't find a
>>>> handier tool than xfs_io.)
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>
>>>>> Is that understanding correct?
>>>>> Thanks
>>>>> Shally
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Qu
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> So, can anyone help here and point out what I am missing?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Shally
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>> linux-btrfs"
>>>>>>> in
>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>
>>>>>
>>>>