cycle-level benchmark on an atom cpu showed a typical
pthread_mutex_lock call dropping from ~120 cycles to ~90 cycles with
this change. the benefit may vary with compiler options and version,
but this optimization is very cheap to make and should always help
somewhat.
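
for illustration only, the fast-path idea can be sketched with c11 atomics. this is a minimal sketch under stated assumptions, not the library's code: struct sketch_mutex, the SKETCH_* constants, sketch_lock and sketch_lock_slow are hypothetical names, atomic_exchange stands in for a_swap, and the slow path is a stub that returns EBUSY instead of waiting.

```c
#include <stdatomic.h>
#include <errno.h>

/* hypothetical stand-ins, not the real pthread_mutex_t layout */
enum { SKETCH_NORMAL = 0, SKETCH_OTHER = 1 };

struct sketch_mutex {
	int type;        /* assumed: 0 means a plain (normal) mutex */
	atomic_int lock; /* 0 = unlocked, nonzero = held */
};

/* stub for the general path; a real implementation would handle
 * other mutex types and wait rather than return EBUSY */
static int sketch_lock_slow(struct sketch_mutex *m)
{
	int expected = 0;
	if (atomic_compare_exchange_strong(&m->lock, &expected, 1))
		return 0;
	return EBUSY;
}

static int sketch_lock(struct sketch_mutex *m)
{
	/* fast path: for a normal mutex, one atomic swap takes the
	 * lock if the old value was 0, skipping the general path */
	if (m->type == SKETCH_NORMAL && !atomic_exchange(&m->lock, 1))
		return 0;

	return sketch_lock_slow(m);
}
```

the point of the sketch is that an uncontended lock on a normal mutex costs one type check and one atomic exchange, with no call into the general path.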
 int pthread_mutex_lock(pthread_mutex_t *m)
 {
 	int r;
+
+	if (m->_m_type == PTHREAD_MUTEX_NORMAL && !a_swap(&m->_m_lock, 1))
+		return 0;
+
 	while ((r=pthread_mutex_trylock(m)) == EBUSY) {
 		if (!(r=m->_m_lock) || (r&0x40000000)) continue;
 		if ((m->_m_type&3) == PTHREAD_MUTEX_ERRORCHECK