On Jun 28, 2011, at 8:06 PM, Hugo Mills wrote:

> On Tue, Jun 28, 2011 at 06:55:41PM +0100, João Eduardo Luís wrote:
>> On Jun 28, 2011, at 4:07 PM, C Anthony Risinger wrote:
>>
>>> 2011/6/28 João Eduardo Luís <jecluis@xxxxxxxxx>:
>>>> Hello.
>>>>
>>>> Can anyone think of a simple way to copy a set of pages from a given file (which may or may not be scattered throughout multiple extents) from a snapshot to correct pages within another file on another snapshot?
>>>>
>>>> This might sound silly, but the whole purpose is to create some sort of reconciliation method between divergent snapshots taken from the same original subvolume.
>>>
>>> generic deduplication?
>>>
>>
>> I'm not sure if deduplication is what I'm looking for.
>>
>> What I actually want to achieve is to reconstruct a file's data from
>> two diverging files. I.e., two snapshots are taken from the same
>> subvolume and, in each snapshot, a given file A is written
>> to. Assuming different blocks were written on, and no expected
>> semantics are violated, what I aim to achieve is the correct
>> reconciliation of file A in one of the snapshots.
>>
>> Maybe this could be achieved by using deduplication. I'll look into
>> those patches. Even if they are not completely useful, they very
>> well contain some neat concept that may be used to solve this little
>> puzzle of mine. :-)
>
> You would need to enumerate the extents on each representation of
> the file, picking the ones with the latest transid in each case. You
> would then need to work out what the extents on the reconstructed file
> would look like, and glue them all together into a new file.
>

In my case, I don't need to search for the latest transid, since I keep an in-memory log of the changes made within each snapshot. As these snapshots are ephemeral and are created and destroyed on demand by a user-level application, the cost of keeping such a per-snapshot log doesn't seem to have much impact on performance.
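
To make the log-driven copy concrete, here is a minimal sketch, assuming the log boils down to a set of modified page indices for a given file. The names (coalesce_pages, copy_logged_pages) and the 4096-byte page size are hypothetical and only for illustration. It does a plain user-space read/write of the logged pages into the same offsets of the other snapshot's copy of the file; it does not share or manipulate extents in any way.

```python
import os

PAGE_SIZE = 4096  # assumed page size, for illustration only


def coalesce_pages(pages):
    """Turn logged page indices into sorted (first_page, page_count) runs."""
    runs = []
    for page in sorted(set(pages)):
        if runs and page == runs[-1][0] + runs[-1][1]:
            # Page directly follows the previous run: extend it.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((page, 1))
    return runs


def copy_logged_pages(src_path, dst_path, pages, page_size=PAGE_SIZE):
    """Copy the logged pages of src_path to the same offsets in dst_path."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY)
    try:
        for first, count in coalesce_pages(pages):
            offset = first * page_size
            remaining = count * page_size
            while remaining > 0:
                data = os.pread(src, remaining, offset)
                if not data:
                    break  # reached end of the source file
                written = 0
                while written < len(data):
                    written += os.pwrite(dst, data[written:], offset + written)
                offset += len(data)
                remaining -= len(data)
    finally:
        os.close(src)
        os.close(dst)
```

Coalescing adjacent pages into runs means each contiguous range is written in one go, which at least avoids issuing one tiny write per page.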

However, I log operations on a per-page basis. Gluing together the modified extents from each snapshot doesn't seem viable without deduplicating them first, or I may end up losing updates I did not intend to lose. On the other hand, I'm afraid deduplication will lead to severe disk fragmentation when performed on a per-page basis (e.g., if changes are made to several non-contiguous pages within several extents, in the same file on different snapshots, I would end up with several smaller extents scattered across the disk).

This is pretty much why I expected to be able to, literally, copy the changed pages from one snapshot to another, without deduplicating the extents. However, after spending the last couple of days looking for a simple way to do it, I now believe achieving this is far more complicated and error-prone (unless I missed something) than deduplicating the extents based on my logged information.

Any thoughts would be helpful.

---
João Eduardo Luís
gpg key: 477C26E5 from pool.keyserver.eu
