Re: Snapshot reconciliation

On Jun 28, 2011, at 8:06 PM, Hugo Mills wrote:

> On Tue, Jun 28, 2011 at 06:55:41PM +0100, João Eduardo Luís wrote:
>> On Jun 28, 2011, at 4:07 PM, C Anthony Risinger wrote:
>> 
>>> 2011/6/28 João Eduardo Luís <jecluis@xxxxxxxxx>:
>>>> Hello.
>>>> 
>>>> Can anyone think of a simple way to copy a set of pages from a given file (which may or may not be scattered throughout multiple extents) from a snapshot to correct pages within another file on another snapshot?
>>>> 
>>>> This might sound silly, but the whole purpose is to create some sort of reconciliation method between divergent snapshots taken from the same original subvolume.
>>> 
>>> generic deduplication?
>>> 
>> 
>> I'm not sure if deduplication is what I'm looking for.
>> 
>> What I actually want to achieve is to reconstruct a file's data from
>> two diverging files. I.e., two snapshots are taken from the same
>> subvolume and, in each snapshot, a given file A is written
>> to. Assuming different blocks were written on, and no expected
>> semantics are violated, what I aim to achieve is the correct
>> reconciliation of file A in one of the snapshots.
>> 
>> Maybe this could be achieved by using deduplication. I'll look into
>> those patches. Even if they are not completely useful, they may very
>> well contain some neat concept that may be used to solve this little
>> puzzle of mine. :-)
> 
>   You would need to enumerate the extents on each representation of
> the file, picking the ones with the latest transid in each case. You
> would then need to work out what the extents on the reconstructed file
> would look like, and glue them all together into a new file.
> 

In my case, I don't need to search for the latest transid, since I keep an in-memory log of the changes made within each snapshot. As these snapshots are ephemeral and are created and destroyed on demand by a user-level application, the cost of keeping such a per-snapshot log doesn't seem to have much impact on performance.
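
For illustration only, a per-snapshot, per-page change log of the kind described above could look roughly like the following Python sketch. The names PageLog, record_write and dirty_pages are hypothetical, and the 4 KiB page size is an assumption; this is not the actual application code.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size, not taken from the original message


class PageLog:
    """Hypothetical in-memory log of pages written within one snapshot."""

    def __init__(self):
        # Maps a file path to the set of page indices written in that file.
        self._dirty = defaultdict(set)

    def record_write(self, path, offset, length):
        """Mark every page touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // PAGE_SIZE
        last = (offset + length - 1) // PAGE_SIZE
        self._dirty[path].update(range(first, last + 1))

    def dirty_pages(self, path):
        """Return the sorted page indices written in `path` (empty if none)."""
        return sorted(self._dirty.get(path, ()))
```

One such log would be kept per snapshot and discarded together with it.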

However, I log operations on a per-page basis. Gluing together the modified extents from each snapshot doesn't seem viable without deduplicating them first; otherwise I may end up losing updates I did not intend to lose.

On the other hand, I'm afraid that deduplication will lead to severe disk fragmentation when performed on a per-page basis (e.g., if changes are made to several non-contiguous pages within several extents, in the same file on different snapshots, I would end up with several smaller extents scattered across the disk).
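
To make the fragmentation concern concrete, the sketch below groups logged page indices into contiguous runs. Each run is a separate range that a page-level operation would have to touch. The function name coalesce is hypothetical, and the sketch only counts ranges; it does not prevent fragmentation by itself.

```python
def coalesce(pages):
    """Group page indices into sorted (start_page, page_count) runs."""
    runs = []
    for page in sorted(set(pages)):
        if runs and page == runs[-1][0] + runs[-1][1]:
            # The page directly follows the current run, so extend it.
            runs[-1][1] += 1
        else:
            # A gap was found, so start a new run at this page.
            runs.append([page, 1])
    return [(start, count) for start, count in runs]
```

The fewer and longer the runs, the fewer separate ranges have to be handled when reconciling the file.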

This is pretty much why I expected to be able to literally copy the changed pages from one snapshot to another, without deduplicating the extents. However, after spending the last couple of days looking for a simple way to do it, I now believe that achieving this is far more complicated and error-prone (unless I missed something) than deduplicating the extents based on my logged information.
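
As a point of comparison, the naive user-space version of "copy the changed pages" can be sketched in Python as follows. It assumes a 4 KiB page size and that the sets of pages modified in each snapshot are already known from a log. The function name copy_pages and its parameters are hypothetical. The sketch moves the data through ordinary reads and writes, so it shares no extents between the two files, and on a copy-on-write filesystem each rewritten range is expected to end up in newly allocated space.

```python
import os

PAGE_SIZE = 4096  # assumed page size


def copy_pages(src_path, dst_path, src_pages, dst_pages):
    """Copy the pages modified in the source file into the destination file.

    Raises ValueError if a page was modified on both sides, because copying
    it would silently overwrite an update in the destination.
    """
    conflicts = set(src_pages) & set(dst_pages)
    if conflicts:
        raise ValueError("pages modified on both sides: %s" % sorted(conflicts))

    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY)
        try:
            for page in sorted(set(src_pages)):
                offset = page * PAGE_SIZE
                data = os.pread(src, PAGE_SIZE, offset)
                # pwrite may write fewer bytes than asked, so loop until done.
                written = 0
                while written < len(data):
                    written += os.pwrite(dst, data[written:], offset + written)
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

A call such as copy_pages(path_in_snapshot_1, path_in_snapshot_2, pages_1, pages_2) would bring the updates from the first snapshot into the second, provided the two page sets do not overlap.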


Any thoughts would be helpful.

---
João Eduardo Luís
gpg key: 477C26E5 from pool.keyserver.eu 






