Thomas Glanzmann wrote:
Ric,
I would not categorize it as offline, but just not as inband (i.e., you can
run a low priority background process to handle dedup).
Offline windows are extremely rare in production sites these days and
it could take a very long time to do dedup at the block level over a
large file system :-)
Let me rephrase: by offline I meant asynchronous, during off hours.
Hi, over the last half year I have thought a bit about doing dedup
for my backup program: not only with fixed blocks (which is
implemented), but with moving blocks (at every offset in a file: 1
byte, 2 bytes, ...). That means I need *lots* of comparisons
(size of file - blocksize). Even though my case is not the same, it
must be very fast, and that is the same problem as the one discussed
here.
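To make the number of comparisons concrete, here is a minimal sketch in
Python. The helper name window_offsets is made up for this example and is
not part of any backup program; note that the exact count is
(size of file - blocksize + 1).

```python
def window_offsets(file_size: int, block_size: int) -> range:
    """Start offsets of every block-sized window, moving 1 byte at a time."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # A file shorter than one block contains no full window.
    return range(max(file_size - block_size + 1, 0))


# Example: a 1 MiB file with 4 KiB blocks has about a million candidate
# blocks, compared with only 256 blocks at fixed 4 KiB boundaries.
offsets = window_offsets(1 << 20, 4096)
print(len(offsets), "moving blocks vs", (1 << 20) // 4096, "fixed blocks")
```

Each of these offsets needs one lookup, which is why the per-lookup cost
matters so much.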
My solution (not yet implemented) is as follows (hopefully I remember well):
I calculate a 24-bit checksum (another size is possible).
This means I can have 2^24 different checksums.
Therefore, I hold a bit vector in memory: one bit for each possibility.
For 24 bits that is 2^24 bits = 2 MB (the 0.5 GB I had in mind
corresponds to a 32-bit checksum; I'm in a hotel and have no
calculator). This vector is initialized with zeros.
For each calculated checksum of a block, I set the corresponding bit in
the bit vector.
It is then very fast to check whether a block with a given checksum
exists in the filesystem (the backup, in my case): just test the
appropriate bit in the bit vector.
If the bit is not set, it's a new block.
If it is set, a separate 'real' check is needed to see whether it's
really the same block (which is slow, but that happens <<1% of the time).
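The scheme above could be sketched roughly as follows in Python. This is
only an illustration under assumptions: the checksum is CRC-32 truncated
to 24 bits (the text does not say which checksum is used), the names
ChecksumBitVector and store_block are invented for the example, and a
Python set stands in for the slow 'real' comparison against stored
blocks.

```python
import zlib

CHECKSUM_BITS = 24  # checksum width from the text; other sizes work too


class ChecksumBitVector:
    """One bit per possible checksum value, initialized to zero."""

    def __init__(self, bits: int = CHECKSUM_BITS):
        self.mask = (1 << bits) - 1
        # 2^24 bits / 8 = 2 MiB for the default width.
        self.vector = bytearray((1 << bits) // 8)

    def _checksum(self, block: bytes) -> int:
        # Assumption: CRC-32 truncated to the low `bits` bits.
        return zlib.crc32(block) & self.mask

    def maybe_seen(self, block: bytes) -> bool:
        """False means definitely new; True means a real check is needed."""
        c = self._checksum(block)
        return bool(self.vector[c >> 3] & (1 << (c & 7)))

    def add(self, block: bytes) -> None:
        c = self._checksum(block)
        self.vector[c >> 3] |= 1 << (c & 7)


def store_block(block: bytes, seen: ChecksumBitVector, stored: set) -> bool:
    """Return True if the block was new and got stored, False if duplicate."""
    # Fast path: bit not set, so the block cannot exist yet.
    # Slow path: bit set, so do the 'real' check (here: a set lookup).
    if seen.maybe_seen(block) and block in stored:
        return False
    seen.add(block)
    stored.add(block)
    return True


seen, stored = ChecksumBitVector(), set()
for blk in (b"x" * 4096, b"y" * 4096, b"x" * 4096):
    print(store_block(blk, seen, stored))
```

As more bits get set, the share of lookups that fall through to the slow
check grows, so the checksum width has to be chosen with the expected
number of blocks in mind.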
I hope my thoughts are understandable. I'm in a hotel and may not be
able to follow the emails on this list for the next hours or days.
Regards, HJC
1/3 is not sufficient for dedup in my opinion - you can get that with
normal compression at the block level.
1/3 is what I get from real-time data of a production environment in a
mixed VM setup, without compression.
Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html