RE: controlling erasure code chunk size

If you want a 4k stripe_size, you have to configure the cauchy plugin with w=8 and packetsize=128 for a k=4 configuration.

For w=(multiple of 8) we could probably skip the (*sizeof(int)) factor and bring the chunksize down by a factor of 4. Loic, we should check whether this is OK with the Jerasure implementation. I wonder whether 'packetsize' should be a plugin parameter, or whether we should just adjust the packetsize based on the desired chunk_size to get close to it.

Cheers Andreas.
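
[Editorial sketch, not part of the original message.] To make the effect of the sizeof(int) factor concrete, here is a minimal C++ sketch of the alignment rule quoted further down in this thread. The function name stripe_alignment and the word_bytes parameter are hypothetical; word_bytes stands in for sizeof(int) (assumed to be 4), and LARGEST_VECTOR_WORDSIZE is assumed to be 16.

```cpp
#include <cassert>

// Assumption: mirrors LARGEST_VECTOR_WORDSIZE in the Jerasure plugin.
const unsigned kLargestVectorWordsize = 16;

// Stripe alignment following the get_alignment() logic quoted in this thread.
// word_bytes stands in for sizeof(int): pass 4 for the current behaviour,
// or 1 to model dropping the sizeof(int) factor.
inline unsigned stripe_alignment(unsigned k, unsigned w,
                                 unsigned packetsize, unsigned word_bytes) {
  assert(k > 0 && w > 0 && packetsize > 0 && word_bytes > 0);
  unsigned alignment = k * w * packetsize * word_bytes;
  // Fall back to the vector word size when the per-chunk unit is not
  // already a multiple of it.
  if ((w * packetsize * word_bytes) % kLargestVectorWordsize)
    alignment = k * w * packetsize * kLargestVectorWordsize;
  return alignment;
}
```

Under these assumptions, stripe_alignment(4, 8, 128, 4) is 16384 bytes (4096 bytes per data chunk), while stripe_alignment(4, 8, 128, 1) is 4096 bytes, which is the factor of 4 mentioned above.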
________________________________________
From: Samuel Just [sam.just@xxxxxxxxxxx]
Sent: 02 February 2014 23:45
To: Andreas Joachim Peters
Cc: Loic Dachary; Ceph Development
Subject: Re: controlling erasure code chunk size

I assume we will use get_chunksize(desired_chunksize) *
get_data_chunk_count() on the mon to define the stripe width (the size
of the buffer which will be presented to the plugin for encoding) for
the pool.  At the moment, get_chunksize(4*(2<<10)) *
get_data_chunk_count() = 393216 using the jerasure plugin where
get_data_chunk_count() = 4.  This seems a bit big?
-Sam
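
[Editorial sketch, not part of the original message.] The 393216 figure can be reproduced with a small C++ model of the padding step. This assumes get_chunksize() rounds its argument up to the plugin alignment and divides by k, that w=8, that packetsize is the default of 3072 mentioned later in the thread, that sizeof(int) is 4, and that LARGEST_VECTOR_WORDSIZE is 16. The function names below are hypothetical stand-ins, not the plugin's API.

```cpp
#include <cassert>

const unsigned kLargestVectorWordsize = 16;  // assumption
const unsigned kWordBytes = 4;               // assumes sizeof(int) == 4

// Alignment rule as quoted from ErasureCodeJerasure.cc in this thread.
inline unsigned model_alignment(unsigned k, unsigned w, unsigned packetsize) {
  unsigned alignment = k * w * packetsize * kWordBytes;
  if ((w * packetsize * kWordBytes) % kLargestVectorWordsize)
    alignment = k * w * packetsize * kLargestVectorWordsize;
  return alignment;
}

// Assumed behaviour: pad the requested size up to the alignment,
// then split it evenly across the k data chunks.
inline unsigned model_chunk_size(unsigned size, unsigned k, unsigned w,
                                 unsigned packetsize) {
  const unsigned alignment = model_alignment(k, w, packetsize);
  const unsigned tail = size % alignment;
  const unsigned padded = size + (tail ? alignment - tail : 0);
  assert(padded % k == 0);
  return padded / k;
}

// Stripe width as described above: chunk size times data chunk count.
inline unsigned model_stripe_width(unsigned size, unsigned k, unsigned w,
                                   unsigned packetsize) {
  return model_chunk_size(size, k, w, packetsize) * k;
}
```

With these assumptions, model_stripe_width(4 * (2 << 10), 4, 8, 3072) evaluates to 393216: the requested 8192 bytes (2 << 10 is 2048) are padded up to a single alignment unit of 4 * 8 * 3072 * 4 bytes. With packetsize=128 the same request pads to 16384 bytes.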

On Sun, Feb 2, 2014 at 8:18 AM, Andreas Joachim Peters
<Andreas.Joachim.Peters@xxxxxxx> wrote:
> Hi Loic et al.,
>
> I think there is now some confusion about chunk_size, alignment, packetsize and the stripe_size to be used upstream.
>
> Algorithms with a bit-matrix require that the size per device is a multiple of (packetsize*w). Moreover, the size per device and packetsize itself must each be a multiple of sizeof(long/int). For other algorithms you can assume the same with packetsize=1.
>
> packetsize and w influence the performance. On top of that, a stripe_size that is too small will hurt performance because of the bufferlist preparation, the internal buffer checks and the larger number of loops executed for the same amount of data. We can also do some measurements for this, but the current benchmark would probably not reflect it, since it measures the algorithmic part and not the bufferlist preparation part.
>
> If you want to define a stripe_size, it has to be a multiple of the value returned by get_chunksize, possibly a large multiple, but in total not larger than the processor caches. The plugin cannot define the stripe_size; it defines only the alignment to be used for stripe_size. The stripe_size itself is defined outside the plugin, which may complicate the understanding. We should carefully check the Jerasure alignment requirements and our current implementation once more.
>
> To get rid of the platform dependency we could add a generic alignment requirement that chunksize also has to be 64-byte aligned.
>
> Cheers Andreas.
>
>
>
>
> ________________________________________
> From: Loic Dachary [loic@xxxxxxxxxxx]
> Sent: 02 February 2014 16:15
> To: Samuel Just
> Cc: Ceph Development; Andreas Joachim Peters
> Subject: controlling erasure code chunk size
>
> [cc' ceph-devel]
>
> Hi Sam,
>
> Here is how chunks are expected to be aligned:
>
> https://github.com/ceph/ceph/blob/4c4e1d0d470beba7690d1c0e39bfd1146a25f465/src/osd/ErasureCodePluginJerasure/ErasureCodeJerasure.cc#L365
>
>   unsigned alignment = k*w*packetsize*sizeof(int);
>   if ( ((w*packetsize*sizeof(int))%LARGEST_VECTOR_WORDSIZE) )
>     alignment = k*w*packetsize*LARGEST_VECTOR_WORDSIZE;
>   return alignment;
>
> If you are going to encode small objects, a large packetsize may very well lead to oversized chunks. At the moment the default is 3072:
>
> https://github.com/ceph/ceph/blob/4c4e1d0d470beba7690d1c0e39bfd1146a25f465/src/common/config_opts.h#L406
>
> A value I picked when experimenting with 1MB objects encoding ( http://dachary.org/?p=2594 ).
>
> I'm not entirely sure why the alignment is calculated the way it is. Andreas certainly has a better understanding of this topic.
>
> Cheers
>
> --
> Loïc Dachary, Artisan Logiciel Libre
>