On Sat, Oct 07, 2017 at 03:08:26AM +0300, Timofey Titovets wrote:
> > I hear the compiler scream, trying to optimize all the ifs. And I think
> > the CPU branch prediction would have a hard time, there's practically no
> > expected pattern and the performance could be worse compared to the
> > formula calculation.
> >
> > What's the reason you opencode it like that? Could you please post how
> > it would be implemented in C? There are potentially optimized functions
> > to do integer log2 (ilog2).
>
> That is just:
> log2_lshift4(n) { return ilog2(pow(n, 16)); }
> But that will cause an overflow.
>
> log2(8192) -> 13, i.e. the value is 13 bits long
> With pow(n, 16) that would need 13 * 16 = 208 bits, but with pow(n, 4):
> 13 * 4 = 52, and 52 < 64, so it still fits in 64 bits
>
> That also restricts the sample size to 15 bits
>
> So we can get approximately the same behaviour
> by doing something like ilog2(pow(n, 4));
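For reference, a minimal userspace sketch of that approximation (the helper
names are mine, not from the patch): log2_lshift4(n) is meant to be a
fixed-point log2(n) with 4 fractional bits, i.e. roughly log2(n) * 16.
Computing it as ilog2(pow(n, 16)) overflows 64 bits, while pow(n, 4) stays
within 60 bits for 15-bit samples and gives 2 fractional bits that can be
scaled up afterwards:

#include <stdint.h>

static int ilog2_u64(uint64_t v)
{
	int l = -1;

	while (v) {
		v >>= 1;
		l++;
	}
	return l;
}

/* ~log2(n) * 4, valid while n fits in 15 bits so n^4 fits in 64 bits */
static int log2_lshift2(uint64_t n)
{
	return ilog2_u64(n * n * n * n);
}

/* ~log2(n) * 16, the two low fractional bits are lost */
static int log2_lshift4_approx(uint64_t n)
{
	return log2_lshift2(n) << 2;
}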
>
> https://pastebin.com/VryvqkCy - some numbers
> TL;DR version:
> pow(n, 4)
> - count      100      7   28300
> - error %   0.0%   1.1%    3.0%
> pow(n, 16)
> - count    18843  11483      14
> - error %   0.0%   1.0%    2.8%
>
> (I sampled a Win10.iso image as just a big file with varied data.)
> I generate the error value by:
> if (shannon_e_f > 0)
>         error = 100 - (shannon_e_i * 1.0 * 100 / shannon_e_f);
>
> In fact that causes errors in the entropy level like:
> True -> Integer
> 83 -> 80
> 87 -> 84
> 32 -> 28
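To make that comparison concrete, here is a self-contained userspace sketch
(my own helper names; shannon_e_i is whatever the integer heuristic code
returned, which is not shown here) of how such error figures can be produced:
compute the reference Shannon entropy in floating point, scale it to percent,
and compare:

#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* reference entropy of the buffer, scaled to 0..100 (8 bits/byte == 100) */
static double shannon_entropy_pct(const uint8_t *buf, size_t len)
{
	uint32_t hist[256] = { 0 };
	double entropy = 0.0;
	size_t i;

	for (i = 0; i < len; i++)
		hist[buf[i]]++;

	for (i = 0; i < 256; i++) {
		double p;

		if (!hist[i])
			continue;
		p = (double)hist[i] / len;
		entropy -= p * log2(p);
	}
	return entropy * 100.0 / 8.0;
}

/* relative underestimation of the integer result, in percent */
static double entropy_error_pct(double shannon_e_f, int shannon_e_i)
{
	if (shannon_e_f > 0)
		return 100.0 - (shannon_e_i * 100.0 / shannon_e_f);
	return 0.0;
}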
>
> In theory it's possible to just make the entropy level markers more aggressive,
> like: 70 -> 65 and 85 -> 80
>
> That will cover the precision errors.
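For illustration, a rough sketch of that threshold shift (the macro and
function names here are made up, not taken from the patch):

/* lowered by ~5 to absorb the underestimation of the integer approximation */
#define ENTROPY_LVL_ACCEPTABLE	65	/* was 70 */
#define ENTROPY_LVL_HIGH	80	/* was 85 */

static int entropy_verdict(int entropy_pct)
{
	if (entropy_pct <= ENTROPY_LVL_ACCEPTABLE)
		return 1;	/* clearly worth compressing */
	if (entropy_pct <= ENTROPY_LVL_HIGH)
		return 2;	/* borderline, needs further checks */
	return 0;		/* skip compression */
}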
With the v2 patch and direct calculation, I'm ok with leaving some
imprecision there. This is fine-tuning that can be done in parallel.
> or make a byte-level BigInt in the kernel (bigint_log2, bigint_Lshift, etc.) %)
> (But this seems like overkill to me, plus I'm unsure about the performance
> and worried about big/little endian handling.)
I think we need to keep it fast and not anything too complex to
calculate, unless there's a significant win in the heuristic results.
> > Code like that could be microbenchmarked so we can see actual numbers,
> > so far I'm just guessing.
>
> I mentioned it in the cover letter: https://github.com/Nefelim4ag/companalyze
> (it implements the heuristic with no optimizations; all code paths run, to
> get the raw counters)
> Over in-memory poorly compressible data:
> 8188. BSize: 131072, RepPattern: 0, BSet: 256, BCSet: 220, ShanEi%:
> 99| 99 ~0.0%, RndDist: 0, out: -3
>
> With -O0 - ~1100 MiB/s
> With -O2 - ~2500 MiB/s
>
> With this (I found it in the kernel and added the pow(n, 4) step):
>
> static int ilog2(uint64_t v)
> {
>         int l = 0;
>
>         v = v*v*v*v;
>         while ((1UL << l) < v)
>                 l++;
The while() implementation is generic and works on all architectures,
but x86_64 has an instruction for that and will be even faster:
ilog2 ->
http://elixir.free-electrons.com/linux/latest/source/include/linux/log2.h#L26
fls (find last set bit) ->
http://elixir.free-electrons.com/linux/latest/source/arch/x86/include/asm/bitops.h#L450
bsrl
>         return l;
> }
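For comparison, a sketch of the same helper on top of the in-kernel
primitives (assuming n >= 1 and n fits in 15 bits so n^4 fits in 64 bits;
note that the open-coded loop above rounds up to the next power of two
while the kernel's ilog2() rounds down, so the results can differ by one
for non-power-of-two values):

#include <linux/log2.h>
#include <linux/types.h>

static int ilog2_pow4(u64 n)
{
	/* n^4 needs at most 4 * 15 = 60 bits */
	n = n * n * n * n;
	/* ilog2() resolves to fls64() - 1 here, a single bsr on x86_64 */
	return ilog2(n);
}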
>
> With -O0 - ~1150 MiB/s
> With -O2 - ~2500 MiB/s
>
> So my solution looks worse in code than it does in performance
> (it may be that most requests for poorly compressible data go
> directly to the lookup table)
Ok, thanks for collecting the numbers.