On 3/28/12, Sebastian Huber <sebastian.huber@xxxxxxxxxxxxxxxxxx> wrote:
> I have a small test program for the new <stdatomic.h> features
> which should implement a simple spin lock:
>
> #include <stdatomic.h>
>
> void lock_int(atomic_int *x)
> {
> int expected = 0;
> _Bool success;
>
> do {
> success = atomic_compare_exchange_weak_explicit(
> x,
> &expected,
> 1,
> memory_order_acquire,
> memory_order_relaxed
> );
> } while (!success);
> }
The normal idiom for using compare-exchange, using your variable
names and memory orders, is:
expected = atomic_load_explicit( x, memory_order_relaxed );
do {
desired = function(expected);
} while (!atomic_compare_exchange_weak_explicit(x, &expected,
desired, memory_order_acquire, memory_order_relaxed));
The key thing to note is that when the comparison fails,
compare-exchange updates the value of *expected with the value found.
This update enables efficient and concise expression of computation
of the new value.
So, if in your function, the compare-exchange finds *x with a 1
the first iteration, it changes expected to 1, and it will likely
find it 1 on the second iteration. Then on the second iteration,
the compare-exchange will pass, even though you have not acquired
the lock. You need to reset expected on each iteration if you use
compare-exchange this way.
Some compilers can eliminate that reassignment.
I recommend that you do not code spin loops, they can cause really
bad behavior in all but very tightly controled environments.
>
> void unlock_int(atomic_int *x)
> {
> atomic_store_explicit(x, 0, memory_order_release);
> }
>
> void lock_flag(atomic_flag *x)
> {
> _Bool flag;
>
> do {
> flag = atomic_flag_test_and_set_explicit(
> x,
> memory_order_acquire
> );
> } while (flag);
> }
>
> void unlock_flag(atomic_flag *x)
> {
> atomic_flag_clear_explicit(x, memory_order_release);
> }
>
> Here is the generated assembler for PowerPC:
>
> .file "atomic.c"
> .section ".text"
> .align 2
> .globl lock_int
> .type lock_int, @function
> lock_int:
> stwu 1,-24(1)
> li 9,0
> li 8,1
> stw 9,8(1)
> .L3:
> lwarx 9,0,3
> lwz 10,8(1)
> cmpw 0,9,10
> bne- 0,.L2
> stwcx. 8,0,3
> isync
> .L2:
> stw 9,8(1)
> bne- 0,.L3
> addi 1,1,24
> blr
> .size lock_int, .-lock_int
> .align 2
> .globl unlock_int
> .type unlock_int, @function
> unlock_int:
> li 9,0
> sync
> stw 9,0(3)
> blr
> .size unlock_int, .-unlock_int
> .align 2
> .globl lock_flag
> .type lock_flag, @function
> lock_flag:
> rlwinm 5,3,3,27,28
> li 8,255
> xori 5,5,24
> li 11,1
> rlwinm 3,3,0,0,29
> li 4,0
> slw 8,8,5
> slw 11,11,5
> .L11:
> lbz 7,0(4)
> slw 7,7,5
> .L9:
> lwarx 9,0,3
> and 6,9,8
> andc 10,9,8
> cmpw 0,6,7
> or 10,10,11
> bne- 0,.L10
> stwcx. 10,0,3
> bne- 0,.L9
> .L10:
> isync
> srw 9,9,5
> stb 9,0(4)
> beq+ 0,.L11
> blr
> .size lock_flag, .-lock_flag
> .align 2
> .globl unlock_flag
> .type unlock_flag, @function
> unlock_flag:
> li 9,0
> sync
> stb 9,0(3)
> blr
> .size unlock_flag, .-unlock_flag
> .ident "GCC: (GNU) 4.7.0 20120307 (prerelease)"
>
> The unlock_int() and unlock_flag() look quite nice. Also
> lock_int() is quite good, only the lwz 10,8(1) inside the loop
> seems to be a bit suboptimal.
>
> The lock_flag() is surprisingly complicated. The problem seems
> to be that atomic_flag is a byte and word operations are used
> to load and store it (lwarx and stwcx). This requires a lot
> of bit operations. Why not use int as the type for atomic_flag
> on PowerPC?
I do not know why the implementors chose to do it that way, but
there is a time/space trade off.
--
Lawrence Crowl
[Linux C Programming]
[Linux Kernel]
[eCos]
[Fedora Development]
[Fedora Announce]
[Autoconf]
[The DWARVES Debugging Tools]
[Yosemite Campsites]
[Yosemite News]
[Linux GCC]