Re: New feature Idea

On Wed, 2008-08-13 at 22:00 +0200, Andi Kleen wrote:
> Morey Roof <moreyroof@xxxxxxxxx> writes:
> 
> > I have been thinking about a new feature to start work on that I am
> > interested in and I was hoping people could give me some feedback and
> > ideas of how to tackle it.  Anyways, I want to create a data
> > deduplication system that can work in two different modes.  One mode
> > is that when the system is idle or not beyond a set load point a
> > background process would scan the volume for duplicate blocks.  The
> > other mode would be used for systems that are nearline or backup
> > systems that don't really care about the performance and it would do
> > the deduplication during block allocation.
> 
> Seems like a special case of compression? Perhaps compression would help
> more?
> 
> > One of the ways I was thinking of to find the duplicate blocks would
> > be to use the checksums as a quick compare.  If the checksums match
> > then do a complete compare before adjusting the nodes on the files.
> > However, I believe that I will need to create a tree based on the
> > checksum values.
> 
> If you really want to do deduplication: It might be advantageous to do 
> this on larger units.
> 
> If you assume that data is usually shared between similar files (which
> is a reasonable assumption) and do the deduplication on whole files
> you can also use the size as an index and avoid checksumming all files
> with a unique size.  I wrote a user level duplicated file checker some
> time ago that used this trick successfully.
> 
> -Andi

I would like to use the tree in much the same way that snapshots are
handled.  Also, I want to catch duplicate blocks that exist in different
files, not just whole files that are identical.

Say you have several virtual machine disk files on the volume.  Those
disk files may not be identical as whole files, but if they belong to
virtual machines that are running the same operating system, then some
of the blocks/extents are going to be the same, and I want to be able to
dedup them.
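
To make the idea concrete, here is a rough user-space sketch of the
checksum-then-compare approach, not actual btrfs code.  It is written
under several assumptions: a fixed 4096-byte block size, zlib.crc32 as a
stand-in for whatever checksum the filesystem already keeps, an
in-memory dict in place of the on-disk tree keyed by checksum, and a
hypothetical helper name find_duplicate_blocks().

```python
import zlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def find_duplicate_blocks(images, block_size=BLOCK_SIZE):
    """Map each duplicate block to the first identical block seen.

    images: dict of name -> bytes (stand-ins for VM disk files).
    Returns {(name, block_index): (first_name, first_block_index)}.
    """
    # checksum -> list of (name, block_index, block_bytes) seen so far.
    # This dict plays the role of the tree keyed by checksum.
    by_checksum = defaultdict(list)
    duplicates = {}
    for name, data in images.items():
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            index = offset // block_size
            checksum = zlib.crc32(block)
            for other_name, other_index, other_block in by_checksum[checksum]:
                # A checksum match is only a hint; do a complete compare.
                if other_block == block:
                    duplicates[(name, index)] = (other_name, other_index)
                    break
            else:
                # No identical block yet: record this one as a candidate.
                by_checksum[checksum].append((name, index, block))
    return duplicates
```

Given two images that share one block, the result points the later copy
at the earlier one; a real implementation would then adjust the file's
extent references instead of returning a dict.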

-Morey

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
