Christian, Many thanks for your reply.
> 1) Does it scan blocks from the tail of the file system forward sequentially?

Yes

> 2) Does it reclaim blocks regardless of how dirty they are? Or does it execute reclaiming in order of maximum dirtiness first, in order to reduce churn (and flash wear when used on flash media)?

The former.

> 3) What happens when it encounters a block that isn't dirty? Does it skip it and reclaim the next dirty block, leaving a "hole"? Or does it reclaim everything up to a reclaimable block to make the free space contiguous?

It is cleaned regardless. Free space appears to always be contiguous.
Hmm, so the GC causes completely unnecessary flash wear. That's really bad for the most advantageous use-case of nilfs2. :(
> 4) Assuming this isn't already how it works, how difficult would it be to modify the reclaim policy (along with the associated book-keeping requirements) to reclaim blocks in dirtiest-block-first order?
> 5) If a suitable book-keeping bitmap were in place for 4), could this not be used for accurate df reporting?

Not being a NILFS developer, I can't answer either of these in detail. However, as I understand it, the filesystem driver does not depend on the current cleaning policy and can skip cleaning specific blocks should those blocks be sufficiently clean. Segments need not be written sequentially: each segment contains a pointer to the next segment to be written, which is why lssu always lists two segments as active (the current segment and the next segment to be written).
It's just that the current GC cleans all segments sequentially; it's simpler to cycle through the segments in a circular fashion.
I see, so the sub-optimal reclaim and unnecessary churn are purely down to the userspace GC daemon?
Is there scope for having a bitmap, or a counter in each allocation unit, to show how many dirty blocks it contains? Such a bitmap would require 1MB of space for every 32GB of storage (assuming 1 bit per 4KB block). This would make it possible to tell at a glance which allocation unit is dirtiest and should therefore be reclaimed next, while at the same time stopping unnecessary churn.
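To make the sizing concrete, here is the arithmetic behind the 1MB-per-32GB figure as a quick Python sketch (just back-of-envelope math, not anything from the NILFS2 code):

```python
def bitmap_bytes(capacity_bytes, block_size=4096):
    """Size of a dirty-block bitmap: one bit per block, rounded up to bytes."""
    blocks = capacity_bytes // block_size
    return (blocks + 7) // 8

print(bitmap_bytes(32 * 2**30))  # 32 GiB -> 1048576 bytes (1 MiB)
print(bitmap_bytes(2**40))       # 1 TiB  -> 33554432 bytes (32 MiB)
```

This confirms both figures quoted in the thread: 1MB per 32GB and 32MB per TB.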
What would be useful is to be able to select the write segment into which the cleaner will write live data. That way, the system could maintain two log "heads": one for active hot data and one for inactive cold data. All cleaning would then be done to the cold head, and all new writes to the hot head, on the assumption that a new write will either be temporary (and hence discarded sooner rather than later) or not be updated for some time (and hence cleaned into a cold segment by the cleaner), with the hope that we'll end up with a bimodal distribution of clean and dirty data. The cleaner can then concentrate on cleaning hot segments, with the occasional clean of cold segments.
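The routing rule behind the two-head idea is simple enough to sketch. Note this is purely illustrative: the `LogHeads` class and `route_write` name are invented for the sketch; NILFS2 has no such interface.

```python
class LogHeads:
    """Hypothetical two-head write routing: fresh writes are presumed hot,
    data relocated by the cleaner is presumed cold."""

    def __init__(self):
        self.hot = []   # segment receiving fresh application writes
        self.cold = []  # segment receiving live blocks moved by the cleaner

    def route_write(self, block, from_cleaner):
        # Blocks that survived long enough for the cleaner to move them
        # are treated as cold; everything else goes to the hot head.
        if from_cleaner:
            self.cold.append(block)
            return "cold"
        self.hot.append(block)
        return "hot"
```

With this separation, hot segments should empty out quickly (their data is overwritten or deleted soon), giving the cleaner cheap victims, while cold segments stay nearly full and are rarely touched.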
I don't think distinguishing between hot and cold data is all that useful. Ultimately, the optimal solution would be to reclaim the AUs in dirtiest-first order. The other throttling provisions (not reclaiming until free space drops below a threshold) should do enough to stop premature flash wear.
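A dirtiest-first victim selection combined with a churn threshold could look something like the following sketch (the function name and the 50% default threshold are assumptions for illustration, not anything NILFS2 implements):

```python
def pick_victim(dirty_counts, blocks_per_segment, min_dirty_ratio=0.5):
    """Greedy dirtiest-first reclaim: return the segment number with the
    most dirty blocks, or None if even the dirtiest segment is below the
    threshold (so reclaiming it would cause churn for little gain)."""
    if not dirty_counts:
        return None
    seg = max(dirty_counts, key=dirty_counts.get)
    if dirty_counts[seg] < min_dirty_ratio * blocks_per_segment:
        return None  # nothing dirty enough; skip this GC pass entirely
    return seg
```

The None case is what stops the unnecessary flash wear discussed above: a completely clean segment can never be selected, and a barely-dirty one is left alone until it pays for itself.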
Accurate df reporting is more tricky, as checkpoints and snapshots make it decidedly not trivial to account for overwritten data. As such, the current df reporting is probably the best we can manage within the current constraints.
With the bitmap solution described above, would we not be able to simply subtract the dirty blocks from the used space? Since the bitmap always contains the dirtiness information for all the blocks in the FS, this would make for a pretty simple solution, would it not?
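The df calculation being proposed reduces to a one-line subtraction, sketched below (this assumes dirty blocks are fully reclaimable, which the snapshot/checkpoint accounting mentioned earlier may complicate):

```python
def df_free_bytes(total_blocks, allocated_blocks, dirty_blocks, block_size=4096):
    """Free space if dirty blocks count as reclaimable: live data is the
    allocated blocks minus those the bitmap marks dirty (dead)."""
    live = allocated_blocks - dirty_blocks
    return (total_blocks - live) * block_size
```

For example, a 100-block FS with 80 blocks allocated, 30 of them dirty, has 50 live blocks and thus 50 blocks' worth of free space.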
Is there anything in place that would prevent such a bitmap from being kept in the file system headers? It could even be kept in RAM and generated by the garbage collector for its own use at run-time. Thinking about it, 1MB per 32GB is not a lot (32MB per TB), and it could even be run-length encoded.
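Run-length encoding would compress well here because dirty bits tend to cluster (whole files go dead at once). A minimal sketch of the idea:

```python
def rle(bits):
    """Run-length encode a dirty bitmap as (bit_value, run_length) pairs.
    Long runs of clean (0) or dirty (1) blocks collapse to single pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(b, n) for b, n in runs]

print(rle([0, 0, 1, 1, 1, 0]))  # [(0, 2), (1, 3), (0, 1)]
```

A mostly-clean 32GB filesystem would encode to a handful of pairs rather than 1MB of raw bits.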
Right now, even just preventing reallocation of allocation units that are completely clean would be a big advantage in terms of performance and flash wear.
Gordan