RFC: Compression - calculate entropy for data set

Hi, for the last several days I have been working on an entropy calculation
that could be usable in the btrfs compression code (to detect poorly
compressible data).

I've implemented:
- byte value averaging (has accuracy problems)
- Shannon entropy (a rough sketch follows below)
- Shannon entropy using only integer arithmetic (accuracy within ±0.5% of
the float version; a sketch follows after the repository link)
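
For reference, here is a minimal sketch of the plain floating-point version;
the function name and interface are mine and not necessarily what the
repository uses (link with -lm):

#include <math.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Shannon entropy of a buffer in bits per byte (0.0 .. 8.0).
 * Low values mean the data should compress well; values close to 8
 * mean it is probably not worth compressing.
 */
static double shannon_entropy(const uint8_t *buf, size_t len)
{
	size_t counts[256] = { 0 };
	double entropy = 0.0;
	size_t i;

	if (!len)
		return 0.0;

	for (i = 0; i < len; i++)
		counts[buf[i]]++;

	for (i = 0; i < 256; i++) {
		double p;

		if (!counts[i])
			continue;
		p = (double)counts[i] / (double)len;
		entropy -= p * log2(p);
	}

	return entropy;
}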

Everything is written in C with some C++ inserts and can easily be ported
to kernel code if needed.
The repository is here:
https://github.com/Nefelim4ag/Entropy_Calculation
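
For anyone who would rather not open the repository, the integer-only
variant could look roughly like the sketch below. This is not the code from
the repo, just an illustration of the general technique: a Q16.16 fixed-point
log2 computed by normalize-and-square. Integer-only arithmetic is also what
a kernel port would need, since floating point is avoided in kernel code.

#include <stddef.h>
#include <stdint.h>

#define FP_SHIFT 16	/* Q16.16 fixed point */

/*
 * log2 of an integer x >= 1, returned in Q16.16 fixed point.
 * Integer part is the position of the highest set bit; fractional
 * bits come from the classic normalize-and-square method.
 * x is assumed to be a byte count / buffer length, so x << 16 fits
 * comfortably in 64 bits.
 */
static uint64_t log2_q16(uint64_t x)
{
	uint64_t msb = 0, y, frac = 0;
	unsigned int i;

	while ((x >> msb) > 1)
		msb++;

	/* normalize x into [1, 2) as a Q16.16 value */
	y = (x << FP_SHIFT) >> msb;

	for (i = 0; i < FP_SHIFT; i++) {
		y = (y * y) >> FP_SHIFT;
		frac <<= 1;
		if (y >= (2ULL << FP_SHIFT)) {
			frac |= 1;
			y >>= 1;
		}
	}

	return (msb << FP_SHIFT) | frac;
}

/*
 * Shannon entropy in bits per byte, Q16.16 fixed point (0 .. 8<<16).
 * entropy = sum(c/len * log2(len/c)) = sum(c * (log2(len) - log2(c))) / len
 */
static uint64_t shannon_entropy_q16(const uint8_t *buf, size_t len)
{
	uint64_t counts[256] = { 0 };
	uint64_t log2_len, sum = 0;
	size_t i;

	if (!len)
		return 0;

	for (i = 0; i < len; i++)
		counts[buf[i]]++;

	log2_len = log2_q16(len);

	for (i = 0; i < 256; i++) {
		uint64_t lc;

		if (!counts[i])
			continue;
		lc = log2_q16(counts[i]);
		/* guard: truncation error can make log2_q16(counts[i])
		 * exceed log2_q16(len) by a few ULPs when counts[i] is
		 * very close to len */
		if (lc > log2_len)
			lc = log2_len;
		sum += counts[i] * (log2_len - lc);
	}

	return sum / len;
}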

It would be great if someone is interested in profiling and running
performance tests on it.

My quick-and-dirty tests with ~$ time <binary> and 8 MB of test data
show that lzo at levels 1-6 is the fastest compression-based way to detect
whether data is compressible, and that the integer Shannon entropy is about
5 times faster than any gzip level.
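
If anyone wants to reproduce that kind of rough timing, a trivial driver
along the lines below can be compiled together with the shannon_entropy_q16()
sketch above and run under time; the file handling and output format are
arbitrary, purely for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/*
 * Reads a whole file into memory and prints its integer Shannon entropy
 * (bits per byte, Q16.16), so the run can be timed with `time`.
 */
int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	FILE *f = fopen(argv[1], "rb");
	if (!f) {
		perror("fopen");
		return 1;
	}

	fseek(f, 0, SEEK_END);
	long len = ftell(f);
	fseek(f, 0, SEEK_SET);

	uint8_t *buf = malloc((size_t)len + 1);
	if (!buf || fread(buf, 1, len, f) != (size_t)len) {
		fprintf(stderr, "read failed\n");
		return 1;
	}
	fclose(f);

	uint64_t e = shannon_entropy_q16(buf, len);
	printf("entropy: %llu.%04llu bits/byte\n",
	       (unsigned long long)(e >> 16),
	       (unsigned long long)(((e & 0xffff) * 10000) >> 16));

	free(buf);
	return 0;
}

Built with something like gcc -O2 -o entropy entropy.c and run as
time ./entropy <test-file>.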

Thanks!

P.S.
I got this idea from:
https://btrfs.wiki.kernel.org/index.php/Project_ideas
 - Compression enhancements
    - heuristics -- try to learn in a simple way how well the file
data compress, or not

-- 
Have a nice day,
Timofey.