Re: New feature Idea

Morey Roof <moreyroof@xxxxxxxxx> writes:

> I have been thinking about a new feature to start work on that I am
> interested in, and I was hoping people could give me some feedback and
> ideas on how to tackle it.  Anyway, I want to create a data
> deduplication system that can work in two different modes.  In one
> mode, when the system is idle or below a set load point, a background
> process would scan the volume for duplicate blocks.  The other mode
> would be used for nearline or backup systems that don't really care
> about performance, and it would do the deduplication during block
> allocation.

Seems like a special case of compression? Perhaps compression would help
more?

> One of the ways I was thinking of to find the duplicate blocks would
> be to use the checksums as a quick compare.  If the checksums match,
> then do a complete compare before adjusting the inodes of the files.
> However, I believe that I will need to create a tree based on the
> checksum values.

If you really want to do deduplication, it might be advantageous to do
it on larger units than single blocks.
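The quick-compare scheme quoted above (checksum match first, then a
full byte compare before touching any file metadata) might look
roughly like this as a user-space sketch. The function name and the
use of SHA-256 are my own illustration; a real in-kernel version would
reuse the filesystem's existing per-block checksums rather than
recomputing them:

```python
import hashlib

def blocks_match(block_a: bytes, block_b: bytes) -> bool:
    """Quick-compare two blocks by checksum, then confirm byte-for-byte.

    Illustrative only: in btrfs the per-block checksums already exist,
    so the cheap first test would not require reading and hashing the
    data again as done here.
    """
    # Cheap rejection: differing checksums guarantee the blocks differ.
    if hashlib.sha256(block_a).digest() != hashlib.sha256(block_b).digest():
        return False
    # Matching checksums are only a hint; a full compare ensures two
    # blocks that merely hash alike are never merged.
    return block_a == block_b
```

The full compare is what makes the scheme safe against checksum
collisions, at the cost of an extra read when checksums do match.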

If you assume that data is usually shared between similar files (which
is a reasonable assumption) and do the deduplication on whole files,
you can also use the file size as an index and avoid checksumming all
files with a unique size.  I wrote a user-level duplicate file checker
some time ago that used this trick successfully.
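A minimal sketch of that size-as-index trick (my own reconstruction,
not the original checker): group files by size first, and only read
and checksum files whose size occurs more than once, since a file with
a unique size cannot have a duplicate.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicate_files(root: str) -> list[list[str]]:
    """Return groups of duplicate files under root.

    Files are grouped by size first; only files whose size is shared
    with at least one other file are ever read and checksummed.
    """
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # skip files that vanished or are unreadable

    groups: dict[tuple[int, str], list[str]] = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # unique size: no duplicate possible, skip checksum
        for path in paths:
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            groups[(size, digest)].append(path)

    return [g for g in groups.values() if len(g) > 1]
```

On typical file sets most sizes are unique, so the expensive
read-and-hash step runs on only a small fraction of the files.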

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
