Re: [PATCH v3 0/3] Btrfs: populate heuristic with detection logic

2017-07-29 16:36 GMT+03:00 Timofey Titovets <nefelim4ag@xxxxxxxxx>:
> Based on kdave for-next.
> As the heuristic skeleton is already merged,
> populate the heuristic with basic code.
>
> First patch: add simple sampling code.
> It takes 16-byte samples at 256-byte shifts
> over the input data and collects info about how many
> different bytes (symbols) have been found in the sample data.
>
> Second patch: add code to calculate
> how many unique bytes have been
> found in the sample data.
> That can quickly detect easily compressible data.
>
> Third patch: add code to calculate the byte core set size,
> i.e. how many unique bytes make up 90% of the sample data.
> That code requires the numbers in the bucket to be sorted.
> It can detect easily compressible data with many repeated bytes.
> It can detect incompressible data with evenly distributed bytes.
>
> Changes v1 -> v2:
>   - Change input data iterator shift 512 -> 256
>   - Replace magic macro numbers with direct values
>   - Drop useless symbol population in bucket,
>     as no one cares about where and what symbol is stored
>     in the bucket for now
>
> Changes v2 -> v3 (only patch #3 updated):
>   - Fix u64 division problem by using u32 for input_size
>   - Fix input size calculation start - end -> end - start
>   - Add missing sort.h header
>
> Timofey Titovets (3):
>   Btrfs: heuristic add simple sampling logic
>   Btrfs: heuristic add byte set calculation
>   Btrfs: heuristic add byte core set calculation
>
>  fs/btrfs/compression.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/compression.h |  13 ++++++
>  2 files changed, 120 insertions(+), 2 deletions(-)
>
> --
> 2.13.3
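
For readers who want the three steps above in one place, here is a
minimal userspace sketch. It is not the patch code: the function names,
the u32 counters, the use of qsort() and the exact 90% threshold
arithmetic are assumptions made for illustration. Only the 16-byte
sample, the 256-byte shift and the 90% core set come from the cover
letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SAMPLE_LEN   16   /* bytes read per sample (from the cover letter) */
#define SAMPLE_SHIFT 256  /* distance between sample starts (from the cover letter) */
#define BUCKET_SIZE  256  /* one counter per possible byte value */

/* Step 1: count byte occurrences in 16-byte samples taken every 256 bytes.
 * Returns the total number of sampled bytes. Hypothetical helper name. */
static uint32_t sample_input(const uint8_t *in, size_t len, uint32_t *bucket)
{
    uint32_t sampled = 0;

    memset(bucket, 0, BUCKET_SIZE * sizeof(*bucket));
    for (size_t off = 0; off + SAMPLE_LEN <= len; off += SAMPLE_SHIFT) {
        for (size_t i = 0; i < SAMPLE_LEN; i++)
            bucket[in[off + i]]++;
        sampled += SAMPLE_LEN;
    }
    return sampled;
}

/* Step 2: byte set size, i.e. how many distinct byte values were seen. */
static uint32_t byte_set_size(const uint32_t *bucket)
{
    uint32_t n = 0;

    for (int i = 0; i < BUCKET_SIZE; i++)
        if (bucket[i])
            n++;
    return n;
}

/* Comparator for qsort(): larger counters first. */
static int cmp_desc(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;

    return (x < y) - (x > y);
}

/* Step 3: byte core set size, i.e. how many of the most frequent byte
 * values are needed to cover 90% of the sampled bytes.
 * Sorts the bucket in place, so symbol identity is lost afterwards. */
static uint32_t byte_core_set_size(uint32_t *bucket, uint32_t sampled)
{
    uint64_t threshold = (uint64_t)sampled * 90 / 100;
    uint64_t covered = 0;
    uint32_t n = 0;

    assert(sampled > 0);
    qsort(bucket, BUCKET_SIZE, sizeof(*bucket), cmp_desc);
    while (n < BUCKET_SIZE && covered < threshold)
        covered += bucket[n++];
    return n;
}
```

With a 4KiB buffer of zeros this sketch samples 256 bytes and reports a
byte set and core set of 1. A small core set suggests easily
compressible data, while a core set close to 256 suggests evenly
distributed bytes.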

Hi, any thoughts on these patches? (I know you are busy.)

---
Slightly off-topic:
I think the bucket item will change in the future,
from:
struct heuristic_bucket_item {
        u8  padding;
        u8  symbol;
        u16 count;
};

to:
struct heuristic_bucket_item {
        u32 symbol;
        u32 count;
};

This will cause some memory overhead (1024 bytes -> 2048 bytes, of
which 768 bytes are unused), but it allows support for *big* samples.
Right now the maximum sample size is 2^16-1 bytes, so the heuristic is
usable only over the 4KiB <-> 1MiB-256 bytes input range (that is of
course enough for the 128KiB btrfs compression window).
That is also needed for aligned memory access =\
(if that is needed right now, of course).
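
The numbers above can be checked with a short userspace sketch. The
struct and function names here are hypothetical, and the limit assumes
the worst case where every sampled byte is the same symbol, with 16
bytes sampled per 256 bytes of input:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_LEN   16   /* bytes read per sample */
#define SAMPLE_SHIFT 256  /* distance between sample starts */
#define BUCKET_SIZE  256  /* one item per possible byte value */

/* Current layout: 4 bytes per item (hypothetical userspace copy). */
struct bucket_item_small {
    uint8_t  padding;
    uint8_t  symbol;
    uint16_t count;
};

/* Proposed layout: 8 bytes per item, 3 bytes of symbol are never used. */
struct bucket_item_wide {
    uint32_t symbol;
    uint32_t count;
};

/* Largest input (in bytes) whose samples cannot overflow a counter with
 * the given maximum, assuming all sampled bytes hit the same counter. */
static uint64_t max_input_bytes(uint64_t count_max)
{
    assert(count_max >= SAMPLE_LEN);
    return count_max / SAMPLE_LEN * SAMPLE_SHIFT;
}
```

Here max_input_bytes(UINT16_MAX) gives 1048320, i.e. 1MiB-256 bytes,
and the two bucket arrays take 1024 and 2048 bytes.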

Also, maybe the heuristic should use the btrfs_compression workspaces?
I of course can't predict the performance difference between
find_workspace() and kcalloc(), and the heuristic is safe to fail on
memory allocation.
IMHO, to use a compression workspace (if I understand the code
correctly), the heuristic code must move to a separate file
(heuristic.c?) to avoid name clashes with struct workspace etc.
And maybe, to avoid misunderstanding of the code, the workspace code
needs some renaming, because it was created for compression workspaces
and the heuristic itself is not compression.

Thanks!

-- 
Have a nice day,
Timofey.