conceptually, a_spin needs to be at least a compiler barrier, so the
compiler will not optimize out loops (and the load on each iteration)
while spinning. it should also be a memory barrier, or the spinning
thread might keep spinning without noticing stores from other threads,
thus delaying for longer than it should.

ideally, an optimal a_spin implementation that avoids unnecessary
cache/memory contention should be chosen for each arch, but for now,
the easiest thing is to perform a useless a_cas on the calling
thread's stack.
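
as a rough illustration of the idea above (not the code in this patch), the sketch below uses C11 <stdatomic.h> as a stand-in for a_cas. the names spin_hint and wait_for_flag are hypothetical and exist only for this example. a sequentially consistent compare-and-swap on a dummy stack object is intended to serve as both a compiler barrier and a memory barrier. note that a compiler is in principle permitted to elide an atomic operation on a local whose address never escapes, so a real implementation would more likely rely on inline asm.

```c
#include <stdatomic.h>

/* hypothetical stand-in for a_spin: a useless compare-and-swap on a
 * dummy object living on the calling thread's stack. the value never
 * changes (0 is replaced by 0); only the barrier side effect of the
 * atomic operation is wanted. */
static inline void spin_hint(void)
{
	atomic_int dummy = 0;
	int expected = 0;
	atomic_compare_exchange_strong(&dummy, &expected, 0);
}

/* hypothetical caller: spin until *flag becomes nonzero, invoking
 * spin_hint on each iteration so that the load of *flag is redone
 * and stores from other threads are noticed. */
static inline void wait_for_flag(volatile int *flag)
{
	while (!*flag)
		spin_hint();
}
```

because the dummy object is private to the calling thread, the compare-and-swap does not contend with other threads for a shared cache line, which is what makes it a cheap (if not optimal) placeholder.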
static inline void a_spin()
{
+ __k_cas(&(int){0}, 0, 0);
}
static inline void a_crash()
static inline void a_spin()
{
+ a_cas(&(int){0}, 0, 0);
}
static inline void a_crash()
static inline void a_spin()
{
+ a_cas(&(int){0}, 0, 0);
}
static inline void a_crash()
static inline void a_spin()
{
+ a_cas(&(int){0}, 0, 0);
}
static inline void a_crash()
static inline void a_spin()
{
+ a_cas(&(int){0}, 0, 0);
}
static inline void a_crash()
static inline void a_spin()
{
+ a_cas(&(int){0}, 0, 0);
}
static inline void a_crash()