On Wed, Sep 11, 2019 at 04:01:01PM -0400, webmaster@xxxxxxxxx wrote:
> Quoting "Austin S. Hemmelgarn" <ahferroin7@xxxxxxxxx>:
> > On 2019-09-11 13:20, webmaster@xxxxxxxxx wrote:
> > > Quoting "Austin S. Hemmelgarn" <ahferroin7@xxxxxxxxx>:
> > > > On 2019-09-10 19:32, webmaster@xxxxxxxxx wrote:
> > > > > Quoting "Austin S. Hemmelgarn" <ahferroin7@xxxxxxxxx>:
> > > > > === I CHALLENGE you and anyone else on this mailing list: ===
> > > > > - Show me an example where splitting an extent requires unsharing, and this split is needed to defrag.
> > > > > Make it clear, write it yourself, I don't want any machine-made outputs.
> > > > Start with the above comment about all writes unsharing the region being written to.
> > > > Now, extrapolating from there:
> > > > Assume you have two files, A and B, each consisting of 64 filesystem blocks in a single shared extent. Now assume somebody writes a few bytes to the middle of file B, right around the boundary between blocks 31 and 32, and that you get similar writes to file A straddling blocks 14-15 and 47-48.
> > > > After all of that, file A will be 5 extents:
> > > > * A reflink to blocks 0-13 of the original extent.
> > > > * A single isolated extent consisting of the new blocks 14-15.
> > > > * A reflink to blocks 16-46 of the original extent.
> > > > * A single isolated extent consisting of the new blocks 47-48.
> > > > * A reflink to blocks 49-63 of the original extent.
> > > > And file B will be 3 extents:
> > > > * A reflink to blocks 0-30 of the original extent.
> > > > * A single isolated extent consisting of the new blocks 31-32.
> > > > * A reflink to blocks 33-63 of the original extent.
> > > > Note that there are a total of four contiguous sequences of blocks that are common between both files:
> > > > * 0-13
> > > > * 16-30
> > > > * 33-46
> > > > * 49-63
> > > > There is no way to completely defragment either file without splitting the original extent (which is still there, just not fully referenced by either file), unless you rewrite the whole file to a new single extent (which would, of course, completely unshare the whole file). In fact, if you want to ensure that those shared regions stay reflinked, there's no way to defragment either file without _increasing_ the number of extents in that file (either file would need 7 extents to properly share only those 4 regions), and even then only one of the files could be fully defragmented.
> > > > Such a situation generally won't happen if you're just dealing with read-only snapshots, but is not unusual when dealing with regular files that are reflinked (which is not an uncommon situation on some systems, as a lot of people have `cp` aliased to reflink things whenever possible).
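
For anyone who wants to watch this happen on a real filesystem, the whole scenario above fits in a handful of syscalls. Rough sketch, untested; the file names and sizes are just illustrative, and it assumes a btrfs with 4K blocks where the initial 256K write lands in a single extent (likely on an idle filesystem, not guaranteed):

/* repro.c -- rough, untested sketch of the A/B scenario above.
 * Assumes a btrfs with 4 KiB blocks mounted at the current directory,
 * and that the initial 256 KiB write ends up in one extent. */
#include <fcntl.h>
#include <linux/fs.h>      /* FICLONE */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define BLK  4096
#define BLKS 64

static void die(const char *msg) { perror(msg); exit(1); }

int main(void)
{
    char *buf = calloc(1, (size_t)BLK * BLKS);
    int a = open("A", O_RDWR | O_CREAT | O_TRUNC, 0644);
    int b = open("B", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (!buf || a < 0 || b < 0)
        die("setup");

    /* One 64-block file, flushed so it (most likely) lands in a single extent. */
    if (pwrite(a, buf, BLK * BLKS, 0) != BLK * BLKS || fsync(a))
        die("write A");

    /* B starts out fully sharing A's extent, same as `cp --reflink`. */
    if (ioctl(b, FICLONE, a))
        die("FICLONE");

    /* Small overwrites: A straddling blocks 14-15 and 47-48,
     * B straddling blocks 31-32.  CoW only touches the blocks written. */
    if (pwrite(a, "xx", 2, 15 * BLK - 1) != 2 ||
        pwrite(a, "xx", 2, 48 * BLK - 1) != 2 ||
        pwrite(b, "xx", 2, 32 * BLK - 1) != 2)
        die("overwrite");
    if (fsync(a) || fsync(b))
        die("fsync");
    return 0;
}

After it runs, `filefrag -v A B` should show something close to the 5-extent / 3-extent layout described above, with the untouched ranges of both files still pointing into the original shared extent.
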
> > > Well, thank you very much for writing this example. Your example is certainly not minimal, as it seems to me that one write to file A and one write to file B would be sufficient to prove your point, so there is one extra write in the example, but that's OK.
> > > Your example proves that I was wrong. I admit: it is impossible to perfectly defrag one subvolume (in the way I imagined it should be done). Why? Because, as in your example, there can be files within a SINGLE subvolume which share their extents with each other. I didn't consider such a case.
> > > On the other hand, I judge this issue to be mostly irrelevant. Why? Because most of the file sharing will be between subvolumes, not within a subvolume.
> > Not necessarily. Even ignoring the case of data deduplication (which needs to be considered if you care at all about enterprise usage, and is part of the whole point of using a CoW filesystem), there are existing applications that actively use reflinks, either directly or indirectly (via things like the `copy_file_range` system call), and the number of such applications is growing.
> The same argument goes here: If data deduplication was performed, then the user has specifically requested it. Therefore, since it was the user's will, the defrag has to honor it, and so the defrag must not unshare deduplicated extents, because the user wants them shared. This might prevent a perfect defrag, but that is exactly what the user has requested, either directly or indirectly, by some policy he has chosen.
> If an application actively creates reflinked copies, then we can assume it does so according to the user's will; therefore it is also a command by the user, and defrag should honor it by not unsharing and by being imperfect.
> Now, you might point out that, in the case of data deduplication, we now have a case where most sharing might be within-subvolume, invalidating my assertion that most sharing will be between subvolumes. But this is an invalid (more precisely, irrelevant) argument. Why? Because the defrag operation has to focus on doing what it can do, while honoring the user's will. All within-subvolume sharing is user-requested, therefore it cannot be part of the argument to unshare.
> You can't both perfectly defrag and honor deduplication. Therefore, the defrag has to do the best possible thing while still honoring the user's will. <<<!!! So, the fact that the deduplication was performed is actually the reason FOR not unsharing, not against it, as you made it look in that paragraph. !!!>>>

IMHO the current kernel 'defrag' API shouldn't be used any more. We need a tool that handles dedupe and defrag at the same time, for precisely this reason: currently the two operations have no knowledge of each other and duplicate or reverse each other's work. You don't need to defrag an extent if you can find a duplicate, and you don't want to use fragmented extents as dedupe sources.
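
Worth keeping in mind for this part of the argument: the kernel never shares extents behind anyone's back. Dedupe only happens when some userspace program explicitly asks for it, range by range, via FIDEDUPERANGE, so every piece of dedupe-created sharing really is something a user (or a tool the user ran) requested. A minimal, untested sketch of such a request (file names and the single-range setup are just placeholders):

/* dedupe.c -- minimal, untested sketch: ask the kernel to share one range.
 * Usage: ./dedupe <src> <dst> <length-in-bytes> */
#include <fcntl.h>
#include <linux/fs.h>      /* FIDEDUPERANGE, struct file_dedupe_range */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s <src> <dst> <length>\n", argv[0]);
        return 1;
    }
    int src = open(argv[1], O_RDONLY);
    int dst = open(argv[2], O_RDWR);
    struct file_dedupe_range *arg =
        calloc(1, sizeof(*arg) + sizeof(struct file_dedupe_range_info));
    if (src < 0 || dst < 0 || !arg) { perror("setup"); return 1; }

    /* One destination range; both ranges start at offset 0 here.
     * (btrfs may cap how much a single call dedupes; real tools loop
     * and re-check bytes_deduped.) */
    arg->src_offset = 0;
    arg->src_length = strtoull(argv[3], NULL, 0);
    arg->dest_count = 1;
    arg->info[0].dest_fd = dst;
    arg->info[0].dest_offset = 0;

    /* The kernel compares the two byte ranges and, only if they are
     * identical, points the destination at the source's extent. */
    if (ioctl(src, FIDEDUPERANGE, arg) < 0) { perror("FIDEDUPERANGE"); return 1; }

    if (arg->info[0].status == FILE_DEDUPE_RANGE_DIFFERS)
        printf("ranges differ, nothing shared\n");
    else if (arg->info[0].status < 0)
        fprintf(stderr, "dedupe failed: %s\n", strerror(-arg->info[0].status));
    else
        printf("%llu bytes now shared\n",
               (unsigned long long)arg->info[0].bytes_deduped);
    return 0;
}

The kernel verifies the ranges are byte-identical before sharing them, which is exactly why an uncoordinated dedupe-then-defrag (or defrag-then-dedupe) loop just burns IO undoing and redoing each other's work.
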
> If the system unshares automatically after deduplication, then the user will need to run deduplication again. Ridiculous!
> > > When a user creates a reflink to a file in the same subvolume, he is willingly denying himself the assurance of a perfect defrag. Because, as your example proves, if there are a few writes to BOTH files, it gets impossible to defrag perfectly. So, if the user creates such reflinks, it's his own wish and his own fault.
> > The same argument can be made about snapshots. It's an invalid argument in both cases, though, because it's not always the user who's creating the reflinks or snapshots.
> Um, I don't agree.
> 1) Actually, it is always the user who is creating reflinks, and snapshots, too. Ultimately, it's always the user who does absolutely everything, because a computer is supposed to be under his full control. But, in the case of reflink copies, this is even more true, because reflinks are not an essential feature for normal OS operation, at least as far as today's OSes go. Every OS has to copy files around. Every OS requires the copy operation. No current OS requires the reflink-copy operation in order to function.

If we don't do reflinks all day, every day, our disks fill up in a matter of hours...

> 2) A user can make any number of snapshots and subvolumes, but he can at any time select one subvolume as the focus of the defrag operation, and that subvolume can be perfectly defragmented without any unsharing (except that the internally-reflinked files won't be perfectly defragmented). Therefore, the snapshotting operation can never jeopardize a perfect defrag. The user can make many snapshots without any fears (I'd say a total of 100 snapshots at any point in time is a good and reasonable limit).
> > > Such situations will occur only in some specific circumstances:
> > > a) when the user is reflinking manually
> > > b) when a file is copied from one subvolume into a different file in a different subvolume.
> > > The situation a) is unusual in normal use of the filesystem. Even when it occurs, it is an explicit command given by the user, so he should be willing to accept all the consequences, even the bad ones like an imperfect defrag.
> > > The situation b) is possible, but as far as I know copies are currently not done that way in btrfs. There should probably be the option to reflink-copy files from another subvolume, that would be good.
> > > But anyway, it doesn't matter. Because most of the sharing will be between subvolumes, not within a subvolume. So, if there is some in-subvolume sharing, the defrag won't be 100% perfect, but that's a minor point. Unimportant.
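
Small note on b): reflink-copying a file from another subvolume already works from userspace today, as long as both subvolumes live on the same btrfs. copy_file_range() gets turned into a clone when the kernel and filesystem allow it, and ioctl(FICLONE) forces a clone-or-fail. Rough sketch, untested:

/* reflink-copy.c -- untested sketch: copy src to dst so that dst shares
 * src's extents when both are on the same btrfs (the subvolumes can
 * differ).  Needs glibc 2.27+ for copy_file_range(). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return 1;
    }
    int src = open(argv[1], O_RDONLY);
    int dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    struct stat st;
    if (src < 0 || dst < 0 || fstat(src, &st)) { perror("setup"); return 1; }

    /* Same-filesystem copy_file_range() is turned into a clone where the
     * kernel supports it, so the "copy" references the source's extents
     * instead of duplicating the data.  ioctl(dst, FICLONE, src) would
     * force a clone instead of allowing a fallback byte copy. */
    off_t remaining = st.st_size;
    while (remaining > 0) {
        ssize_t n = copy_file_range(src, NULL, dst, NULL, remaining, 0);
        if (n <= 0) { perror("copy_file_range"); return 1; }
        remaining -= n;
    }
    return 0;
}

So cross-subvolume sharing outside of snapshots is not hypothetical; any tool doing same-filesystem copies this way will create it.
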
> > You're focusing too much on your own use case here.
> It's so easy to say that. But you really don't know. You might be wrong. I might be the objective one, and you might be giving me some groupthink-induced, badly-thought-out conclusions from years ago, which were never rechecked because that's so hard to do. And then everybody just repeats it and it becomes the truth. As Goebbels said, if you repeat anything enough times, it becomes the truth.
> > Not everybody uses snapshots, and there are many people who are using reflinks very actively within subvolumes, either for deduplication or because it saves time and space when dealing with multiple copies of mostly identical trees of files.
> Yes, I guess there are many such users. Doesn't matter. What you are proposing is that the defrag should break all the reflinks and deduplicated data they painstakingly created. Come on!
> Or, maybe the defrag should unshare to gain performance? Yes, but only WHEN THE USER REQUESTS IT. So the defrag can unshare, but only by request. Since this means that the user is reversing his previous command to not unshare, this has to be explicitly requested by the user, not be part of the default defrag operation.
> > As mentioned in the previous email, we actually did have a (mostly) working reflink-aware defrag a few years back. It got removed because it had serious performance issues. Note that we're not talking a few seconds of extra time to defrag a full tree here, we're talking double-digit _minutes_ of extra time to defrag a moderate-sized (low triple-digit GB) subvolume with dozens of snapshots, _if you were lucky_ (if you weren't, you would be looking at potentially multiple _hours_ of runtime for the defrag). The runtime scaled with the number of reflinks involved and the total amount of data in the subvolume being defragmented, and was pretty bad even in the case of only a couple of snapshots.
> > Ultimately, there are a couple of issues at play here:
> I'll reply to this in another post. This one is getting a bit too long.
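
If anyone wants numbers instead of intuition for their own filesystem: FIEMAP reports, per file, how many extents there are and which of them are shared, which is the same data `filefrag -v` prints and a cheap way to see how much work a reflink-aware defrag would actually face. Rough sketch, untested (note that on btrfs the SHARED flag costs a backref lookup per extent, which is a small taste of where the defrag runtime above went):

/* extent-count.c -- untested sketch: count a file's extents and how many
 * of them the filesystem reports as shared (same data `filefrag -v` shows). */
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

#define MAX_EXTENTS 4096   /* enough for a sketch; loop over fm_start otherwise */

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    struct fiemap *fm = calloc(1, sizeof(*fm) +
                               MAX_EXTENTS * sizeof(struct fiemap_extent));
    if (fd < 0 || !fm) { perror("setup"); return 1; }

    fm->fm_start = 0;
    fm->fm_length = ~0ULL;             /* whole file */
    fm->fm_flags = FIEMAP_FLAG_SYNC;   /* flush so delalloc shows up */
    fm->fm_extent_count = MAX_EXTENTS;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) { perror("FIEMAP"); return 1; }

    unsigned int shared = 0;
    for (unsigned int i = 0; i < fm->fm_mapped_extents; i++)
        if (fm->fm_extents[i].fe_flags & FIEMAP_EXTENT_SHARED)
            shared++;

    printf("%s: %u extents, %u shared\n", argv[1], fm->fm_mapped_extents, shared);
    return 0;
}
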
