On 12/23/2018 1:16 AM, Adam Borowski wrote:
On Sun, Dec 23, 2018 at 12:24:02AM +0000, Paul Jones wrote:
IMHO the more pertinent question is:
If a file has portions which are not easily compressible, does that imply all
future writes are also incompressible? IMO no, so I think the prudent thing
to do is remove FORCE_COMPRESS altogether and make the code act as if it's
always on.
Any opinions?
That is a good idea. If I turn on compression I would expect everything
to be compressed, except in cases where there is no size benefit.
I expect that the vast majority of files consist of blocks of similar
compressibility. Thus, finding a block that fails to compress strongly
suggests other blocks are either incompressible as well or compress only
minimally. Refusing to waste time, electricity and fragmentation in such
a case is a good default, I think.
But, if you believe this should be changed, there's an easy experiment you
can try: for all files on your filesystem, chop every file into 128KB pieces
and compress each of them with your chosen algorithm. Noting the compressed
size of every block in a file that had at least one block fail to compress
would give us some data.
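
A rough sketch of that experiment in Python, for anyone who wants to try
it. This is only an approximation under stated assumptions: it uses zlib
from the Python standard library as a stand-in for whichever algorithm you
actually mount with, it treats a block as having "failed" when the
compressed output is no smaller than the input (which is not necessarily
the exact check BTRFS applies), and the names `block_sizes`, `failed` and
`survey` are made up for illustration.

```python
import os
import sys
import zlib

BLOCK_SIZE = 128 * 1024  # 128KB pieces, as suggested above


def block_sizes(path, block_size=BLOCK_SIZE, level=3):
    """Return a list of (raw_len, compressed_len), one per piece of the file."""
    sizes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            sizes.append((len(block), len(zlib.compress(block, level))))
    return sizes


def failed(raw_len, comp_len):
    # Assumption: a block "fails" when compressing it saves no space at all.
    return comp_len >= raw_len


def survey(root):
    """Yield (path, sizes) for files with at least one block that failed."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and the like.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                sizes = block_sizes(path)
            except OSError:
                continue  # unreadable file, ignore it
            if any(failed(raw, comp) for raw, comp in sizes):
                yield path, sizes


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, sizes in survey(root):
        print(path)
        for i, (raw, comp) in enumerate(sizes):
            print(f"  block {i}: {raw} -> {comp}")
```

Run it with a directory as the only argument. For every file that has at
least one incompressible piece, it lists the raw and compressed size of
each piece, so you can see whether the other pieces in the same file would
have been worth compressing.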
I would suggest looking at Windows DLL files installed as part of a Wine
setup as a potential candidate for this. They tend to have very long
runs of null bytes scattered seemingly randomly throughout the file
(because hot patching, except you can't hot-patch DLLs reliably on
Windows) and use UTF-16 strings. As a result, the actual machine code
generally doesn't compress well, but most of the rest of the file does.
Fixed-size preallocated VM disk images would be another good candidate;
just wipe the free space with zeroes from inside the VM before testing them.
Realistically though, I see a couple of issues with the default behavior:
* There's no way for a regular user to figure out if a file actually is
transparently compressed or not.
* Without editing the filesystem directly, there's no way to
preemptively set the bit in metadata that tells BTRFS to not try to
compress a file, and there's no way to reset it either.
* The default behavior happens to be what `chattr +c` honors, which
sometimes leads to unexpected behavior (I, and most people I know,
would expect `chattr +c` to behave like `compress-force`, not
`compress`).