stdcxx-dev mailing list archives

From Andrew Black <abl...@roguewave.com>
Subject Re: incorrect runtimes
Date Thu, 21 Sep 2006 16:58:55 GMT
Greetings all.

During an offline brainstorming session, Martin and I determined that 
the cause of the observed problem is a race condition that occurs when 
a child (monitored) process creates grandchild processes.  If the child 
process terminates before a grandchild does, the call to getrusage() can 
occur before the grandchild has terminated.  The time consumed by the 
grandchild is then deferred to the next call to getrusage(), where it is 
merged with the time of the following process.
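
To illustrate the underlying accounting rule (this is not the code in 
exec.cpp, just a minimal sketch), the snippet below shows that 
getrusage(RUSAGE_CHILDREN) only reports time for descendants that have 
terminated and been waited for, so a reading taken too early sees 
nothing and the time surfaces in a later reading.  The names 
children_cpu_seconds, burn_cpu, Deltas and measure_around_reap are 
hypothetical, and the sketch uses a direct child rather than a 
grandchild to keep it short.

```cpp
#include <cassert>
#include <ctime>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// User + system CPU seconds charged so far for terminated, waited-for
// descendants of the calling process; -1.0 on error.
static double children_cpu_seconds()
{
    struct rusage ru;
    if (getrusage(RUSAGE_CHILDREN, &ru) != 0)
        return -1.0;
    return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
         + ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

// Hypothetical helper: spin until roughly `seconds` of CPU time is used.
static void burn_cpu(double seconds)
{
    assert(seconds >= 0.0);
    const std::clock_t start = std::clock();
    while (double(std::clock() - start) / CLOCKS_PER_SEC < seconds) { }
}

// CPU time attributed to children, sampled before and after the reap.
struct Deltas { double before_reap; double after_reap; };

// Forks a child that burns CPU, then samples RUSAGE_CHILDREN once
// before waitpid() and once after it.  Returns {-1, -1} on failure.
static Deltas measure_around_reap(double burn_seconds)
{
    const Deltas failed = { -1.0, -1.0 };
    const double base = children_cpu_seconds();

    const pid_t child = fork();
    if (child < 0)
        return failed;
    if (child == 0) {
        burn_cpu(burn_seconds);
        _exit(0);
    }

    Deltas d;
    // Too early: the child has not been waited for, so none of its
    // time has been added to our RUSAGE_CHILDREN totals yet.
    d.before_reap = children_cpu_seconds() - base;

    int status = 0;
    if (waitpid(child, &status, 0) != child)
        return failed;

    // After the reap the child's time shows up in the totals.
    d.after_reap = children_cpu_seconds() - base;
    return d;
}
```

Calling measure_around_reap(0.2) should give a before_reap value near 
zero and an after_reap value near the time burned.  In a harness that 
computes per-process times as differences between successive readings, 
an early reading like the first one would push the missing time into 
the next process's numbers.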

Attached are some possible solutions I've put together for this issue.
With solution A, I receive a usable status code on Linux, along with an 
account of the amount of user/system time spent in the child process 
(but not the grandchild processes).  On Solaris, I receive an ERROR 
status code, which may be caused by an inability of the Solaris waitpid 
syscall to wait on process groups.

With solution B, the status code on Linux is inaccurate, though the user 
and system times appear to reflect those of the child process (again 
excluding the grandchild processes).  On Solaris, I observe the same 
behavior as with solution A, for the same probable reason.

With solution C, the status code on Linux is accurate, though the user 
and system times appear to be inaccurate.  On Solaris, the status and 
user/system times appear to be correct.

At this point, I believe that solution C is the best choice.  If one of 
these patches is to be applied, the following change log could be used.

Log:
	* exec.cpp [!_WIN32 && !_WIN64] (wait_for_child): Resolve race
	condition resulting in incorrect child process times.

--Andrew Black

Martin Sebor wrote:
> Yo Andrew,
> 
> I have another issue with the reported times (this one seems real).
> The times reported for a killed process are all zero and the time
> seems to be added to the next process that runs after it. I noticed
> it while running the locale tests:
> 
> $ make run
> NAME                      STATUS ASSERTS FAILED PERCNT    USER     SYS
> sanity_test.sh                 0      46      0   100%   0.380   0.880
> af_ZA.ISO-8859-1.sh            0       7      0   100%  39.570   1.110
> ar_AE.ISO-8859-6.sh            0       7      0   100%  39.000   1.250
> ar_BH.ISO-8859-6.sh            0       7      0   100%  39.010   1.300
> ar_DZ.ISO-8859-6.sh            0       7      0   100%  38.970   1.220
> ar_EG.ISO-8859-6.sh            0       7      0   100%  39.010   1.350
> ar_IN.UTF-8.sh               HUP                         0.000   0.010
> ar_IQ.ISO-8859-6.sh            0       7      0   100% 217.890   2.200
> 
> Could you see what's going on?
> 
> Thanks
> Martin
