On 10.09.19 г. 6:32 ч., webmaster@xxxxxxxxx wrote: > > Quoting Qu Wenruo <quwenruo.btrfs@xxxxxxx>: > >> On 2019/9/10 上午9:24, webmaster@xxxxxxxxx wrote: >>> >>> Quoting Qu Wenruo <quwenruo.btrfs@xxxxxxx>: >>> >>>>>> Btrfs defrag works by creating new extents containing the old data. >>>>>> >>>>>> So if btrfs decides to defrag, no old extents will be used. >>>>>> It will all be new extents. >>>>>> >>>>>> That's why your proposal is freaking strange here. >>>>> >>>>> Ok, but: can the NEW extents still be shared? >>>> >>>> Can only be shared by reflink. >>>> Not automatically, so if btrfs decides to defrag, it will not be shared >>>> at all. >>>> >>>>> If you had an extent E88 >>>>> shared by 4 files in different subvolumes, can it be copied to another >>>>> place and still be shared by the original 4 files? >>>> >>>> Not for current btrfs. >>>> >>>>> I guess that the >>>>> answer is YES. And, that's the only requirement for a good defrag >>>>> algorithm that doesn't shrink free space. >>>> >>>> We may go that direction. >>>> >>>> The biggest burden here is, btrfs needs to do expensive full-backref >>>> walk to determine how many files are referring to this extent. >>>> And then change them all to refer to the new extent. >>> >>> YES! That! Exactly THAT. That is what needs to be done. >>> >>> I mean, you just create an (perhaps associative) array which links an >>> extent (the array index contains the extent ID) to all the files that >>> reference that extent. >> >> You're exactly in the pitfall of btrfs backref walk. >> >> For btrfs, it's definitely not an easy work to do backref walk. >> btrfs uses hidden backref, that means, under most case, one extent >> shared by 1000 snapshots, in extent tree (shows the backref) it can >> completely be possible to only have one ref, for the initial subvolume. >> >> For btrfs, you need to walk up the tree to find how it's shared. >> >> It has to be done like that, that's why we call it backref-*walk*. >> >> E.g >> A (subvol 257) B (Subvol 258, snapshot of 257) >> | \ / | >> | X | >> | / \ | >> C D >> / \ / \ >> E F G H >> >> In extent tree, E is only referred by subvol 257. >> While C has two referencers, 257 and 258. >> >> So in reality, you need to: >> 1) Do a tree search from subvol 257 >> You got a path, E -> C -> A >> 2) Check each node to see if it's shared. >> E is only referred by C, no extra referencer. >> C is refered by two new tree blocks, A and B. >> A is refered by subvol 257. >> B is refered by subvol 258. >> So E is shared by 257 and 258. >> >> Now, you see how things would go mad, for each extent you must go that >> way to determine the real owner of each extent, not to mention we can >> have at most 8 levels, tree blocks at level 0~7 can all be shared. >> >> If it's shared by 1000 subvolumes, hope you had a good day then. > > Ok, let's do just this issue for the time being. One issue at a time. It > will be easier. > > The solution is to temporarily create a copy of the entire backref-tree > in memory. To create this copy, you just do a preorder depth-first > traversal following only forward references. > > So this preorder depth-first traversal would visit the nodes in the > following order: > A,C,E,F,D,G,H,B > > Oh, it is not a tree, it is a DAG in that example of yours. OK, preorder > is possible on DAG, too. But how did you get a DAG, shouldn't it be all > trees? > > When you have the entire backref-tree (backref-DAG?) in memory, doing a > backref-walk is a piece of cake. > > Of course, this in-memory backref tree has to be kept in sync with the > filesystem, that is it has to be updated whenever there is a write to > disk. That's not so hard. Great, now that you have devised a solution and have plenty of experience writing code why not try and contribute to btrfs?
