stdcxx-dev mailing list archives

From Martin Sebor <>
Subject Re: atomic tests timing out on Windows
Date Thu, 26 Jul 2007 16:49:46 GMT
Farid Zaripov wrote:
>> -----Original Message-----
>> From: Martin Sebor [] On Behalf Of Martin Sebor
>> Sent: Thursday, July 26, 2007 6:17 AM
>> To:
>> Subject: Re: atomic tests timing out on Windows
>>>   The test works fine, but slowly. This is because a critical section
>>> is used for synchronization for all types except signed/unsigned int
>>> and long, for which the InterlockedXXX functions are used.
>> That doesn't explain why the test runs so much faster in other builds
>> on the same architecture (x86). Even with gcc on Cygwin it runs to
>> completion, as well as with MSVC on Windows 2003.
>> All of these complete in under 30 seconds.
>   The test on gcc uses pthreads, which could be implemented without
> using system critical sections.
>>>   For example, on my computer one execution of run_test<> takes
>>> about 25 seconds when a critical section is used and only 3.5 seconds
>>> when the InterlockedXXX functions are used. The full test takes 445
>>> seconds. Another strange thing is that the CPU load is only ~40%
>>> during the test.
>> Is it a 2 CPU or dual core machine? If so, that might explain (some of)
>> it. The CPU must wait for the other one to update the variable.
>   It's a 1.5 CPU machine :) (Pentium4 with HT).
>   And it seems that the timing out problem occurs when HT is enabled.
>   I have played with the atomic_xchg test:
>                                                    Test1   Test2   Test3
> HT disabled:                                        9735    9765    9765
> HT enabled, process affinity mask = 3 (default):  202250       -       -
> HT enabled, process affinity mask = 1:             10625   10750   10782
> HT enabled, process affinity mask = 2:             10062   10047   10047


As I understand the technology, the benefit of hyperthreading is in
the processor's ability to make use of its idle circuits while other
circuits are busy doing things. But because an HT processor does not
actually duplicate most of an ordinary processor's circuits (I've seen
5% being tossed around as the increase in the number of transistors
between an ordinary CPU and one with HT) it means that while one
thread that does FP processing can execute in parallel with another
that does integer arithmetic, two threads that are doing the same
thing cannot actually run simultaneously.

In our case, since both threads do exactly the same thing, trying
to run them on their own virtual "processors" (emulated by HT) must
make them run essentially serially just as they would on an ordinary
uniprocessor, and the scheduling overhead involved in the OS and CPU
switching between the two threads must actually account for most of
the time spent by the process.
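
One way to check this would be to time the same contended exchange
loop with one thread and with two. What follows is only a rough sketch
and not the actual atomic_xchg test: it uses std::atomic and
std::thread as stand-ins for the library's own primitives, and the
names shared_value, exchange_loop and time_exchanges are made up for
illustration.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the variable the test threads fight over.
static std::atomic<long> shared_value(0);

// Each thread repeatedly exchanges the shared variable, so every
// iteration contends with the other threads for the same location.
static void exchange_loop(long iterations)
{
    for (long i = 0; i != iterations; ++i)
        shared_value.exchange(i);
}

// Runs nthreads threads, each doing `iterations` exchanges, and
// returns the elapsed wall-clock time in milliseconds.
long long time_exchanges(int nthreads, long iterations)
{
    typedef std::chrono::steady_clock clock;
    const clock::time_point start = clock::now();

    std::vector<std::thread> threads;
    for (int i = 0; i != nthreads; ++i)
        threads.push_back(std::thread(exchange_loop, iterations));

    for (std::size_t i = 0; i != threads.size(); ++i)
        threads[i].join();

    return std::chrono::duration_cast<std::chrono::milliseconds>(
        clock::now() - start).count();
}
```

Comparing time_exchanges(1, n) against time_exchanges(2, n) with HT
enabled and disabled should show whether the second logical CPU helps
or hurts for this kind of workload.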

If my analysis is correct, we should avoid scheduling the threads
on multiple logical CPUs on HT systems with a single physical CPU.
Do you agree?
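
To make the idea concrete, here is a minimal sketch of restricting the
process to a single logical CPU, which corresponds to the affinity
mask = 1 and mask = 2 rows in Farid's table. It assumes Windows
(GetProcessAffinityMask and SetProcessAffinityMask); the helper names
single_cpu_mask and pin_to_single_logical_cpu are hypothetical, and
the sketch does not attempt to detect whether the machine actually has
HT with a single physical CPU.

```cpp
#include <cstdint>

#if defined(_WIN32)
#  include <windows.h>
#endif

// Returns a mask with only the lowest set bit of process_mask,
// i.e. the first logical CPU the process may run on (0 if none).
std::uintptr_t single_cpu_mask(std::uintptr_t process_mask)
{
    return process_mask & (~process_mask + 1);
}

// Restricts the current process to one logical CPU.
// Returns true on success, false on failure or on platforms
// this sketch does not cover.
bool pin_to_single_logical_cpu()
{
#if defined(_WIN32)
    DWORD_PTR process_mask = 0;
    DWORD_PTR system_mask = 0;
    const HANDLE self = GetCurrentProcess();

    if (!GetProcessAffinityMask(self, &process_mask, &system_mask))
        return false;

    const DWORD_PTR mask =
        static_cast<DWORD_PTR>(single_cpu_mask(process_mask));

    return mask != 0 && SetProcessAffinityMask(self, mask) != 0;
#else
    return false;   // other platforms are left out of this sketch
#endif
}
```

With the default mask of 3 this selects mask 1. The test driver would
call pin_to_single_logical_cpu() before starting its threads, and only
on systems where the extra logical CPUs are known to be HT siblings.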


>   The numbers are times in milliseconds.
>   And if you look at the nightly test results: all platforms where the
> atomic_xxx tests timed out have a Pentium4 processor with HT.
> Farid.
> Farid.
