>> As for online dedupe (which seems useful for reducing writes), would
>> it be useful if, given a write request, one could compare each of the
>> dirty pages in that request against whatever else the fs has loaded
>> in the page cache, and try to dedupe against that? We could probably
>> speed up the search by storing hashes of whatever we have in the page
>> cache and using those to find candidates for the memcmp() test. This
>> is of course not a comprehensive solution, but (a) we can combine it
>> with offline dedupe later, and (b) we avoid making the disk write out
>> data that we've recently read or written. Obviously you'd want to be
>> able to opt in to this sort of thing with an inode flag or something.
>
> That's another kettle of fish, and will require an entirely different
> approach. ZFS has some experience doing that. While their
> implementation may reduce writes, it comes at the cost of storing
> hashes of every block in RAM.

Your proposal is quite different from the ZFS approach, though, and
might actually be useful to a wider audience, so forget I said anything
about it.
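For concreteness, here is a minimal userspace sketch of the
hash-filter-plus-memcmp() idea from the quoted proposal: hash each page
already in the cache, use the hash table to find candidates for an
incoming dirty page, and confirm with memcmp() before skipping the
write. The names (hash_page, find_duplicate), the FNV-1a hash, and the
fixed-size bucket table are illustrative assumptions, not btrfs code;
a real implementation would live in the kernel and hook the writeback
path.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    #define PAGE_SIZE    4096
    #define HASH_BUCKETS 1024

    /* FNV-1a hash of a page's contents: a cheap candidate filter,
     * not collision-proof, hence the memcmp() confirmation below. */
    static uint64_t hash_page(const unsigned char *data)
    {
        uint64_t h = 0xcbf29ce484222325ULL;
        for (size_t i = 0; i < PAGE_SIZE; i++) {
            h ^= data[i];
            h *= 0x100000001b3ULL;
        }
        return h;
    }

    struct cached_page {
        uint64_t hash;
        const unsigned char *data;      /* page already in the cache */
        struct cached_page *next;
    };

    static struct cached_page *buckets[HASH_BUCKETS];

    /* Index a page we've recently read or written. */
    static void index_page(struct cached_page *cp, const unsigned char *data)
    {
        cp->hash = hash_page(data);
        cp->data = data;
        size_t b = cp->hash % HASH_BUCKETS;
        cp->next = buckets[b];
        buckets[b] = cp;
    }

    /* For an incoming dirty page, look for an identical page already
     * indexed. The hash narrows the search; memcmp() confirms, since
     * equal hashes do not imply equal contents. */
    static const unsigned char *find_duplicate(const unsigned char *dirty)
    {
        uint64_t h = hash_page(dirty);
        struct cached_page *cp;

        for (cp = buckets[h % HASH_BUCKETS]; cp; cp = cp->next) {
            if (cp->hash == h && memcmp(cp->data, dirty, PAGE_SIZE) == 0)
                return cp->data;        /* dedupe hit: skip the write */
        }
        return NULL;
    }

    int main(void)
    {
        static unsigned char a[PAGE_SIZE], b[PAGE_SIZE], c[PAGE_SIZE];
        static struct cached_page pa;

        memset(a, 0xAA, PAGE_SIZE);
        memset(b, 0xAA, PAGE_SIZE);     /* duplicate of a */
        memset(c, 0xBB, PAGE_SIZE);

        index_page(&pa, a);

        printf("b duplicate? %s\n", find_duplicate(b) ? "yes" : "no");
        printf("c duplicate? %s\n", find_duplicate(c) ? "yes" : "no");
        return 0;
    }

This also makes the ZFS trade-off mentioned above visible: the index
only covers pages the fs currently has in memory, so its RAM cost is
bounded, unlike a dedupe table that hashes every block on disk.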
