apr-dev mailing list archives

From: Aaron Bannert <aa...@ebuilt.com>
Subject: Re: lock benchmarks on Solaris 8/sparc uniprocessor
Date: Tue, 31 Jul 2001 21:32:48 GMT

I've run some more tests with much higher concurrency (so far only on
my uniprocessor Solaris 8/sparc machine, but preliminary results from
Ian's 8-way Sun box suggest things only get worse with more CPUs). I've
tried to match the usage pattern of each of our major MPMs, described
here:

- prefork: one listener/worker per process, many processes.

- threaded: multiple listeners/workers per process, many processes but
            many fewer than prefork.

- worker: single listener, multiple workers per process, similar number of
          processes to threaded.


I ran these three tests, each with 50 concurrent {threads,processes} that
each contend for a lock, increment a counter, and unlock, exiting after
the counter reaches 1 million:

pthread_mutex across threads:        18.5 sec
    -- applicable to threaded and worker

pthread_mutex across processes:      18.0 sec
    -- applicable to threaded, prefork, and worker

fcntl() across processes:            2790.2 sec (46.5 minutes!!)
    -- applicable to threaded, prefork, and worker

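For reference, each worker in the pthread_mutex-across-threads test does
roughly the following (a minimal sketch, not my actual harness; the
thread count and counter limit are just the numbers quoted above):

/*
 * Sketch: 50 threads contend for one mutex; each iteration locks,
 * increments a shared counter, and unlocks, until the counter hits
 * 1 million.  Build with: cc -o bench bench.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 50
#define LIMIT    1000000L

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

static void *worker(void *arg)
{
    for (;;) {
        pthread_mutex_lock(&lock);       /* contend for the lock      */
        if (counter >= LIMIT) {          /* done? release and get out */
            pthread_mutex_unlock(&lock);
            break;
        }
        counter++;                       /* increment the counter     */
        pthread_mutex_unlock(&lock);     /* unlock                    */
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];
    int i;

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);

    printf("final count: %ld\n", counter);
    return 0;
}
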

My interpretation is that the overhead of acquiring and releasing a
lock 1 million times is roughly 2 orders of magnitude higher with
fcntl() than with a posix mutex.  At first glance this may seem like an
extreme case, but under a high request load there will be on the order
of n LWPs waiting on the same accept lock in both the prefork and
threaded MPMs (where n is the number of processes * workers/process).

Given these results, it is clear to me that we should attempt to use
posix mutexes whenever possible (even more so on large n-way machines,
as the growth in fcntl() overhead appears to get steeper with each
additional processor).  This may only be true for Solaris (8/sparc),
but I think that in order to properly evaluate other platforms we'll
need to run similar tests.

Would it be prudent for APR to provide a shared-memory implementation of
posix mutexes? It seems to me that we don't have to rely on PROCESS_SHARED
being available on a particular platform if we handle our own shared
memory allocation. Are there any known caveats to this type of
implementation?
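
For what it's worth, on platforms where the pshared attribute is
supported, a mutex placed in shared memory that we allocate ourselves
looks roughly like this (a sketch, not APR code; error handling
omitted, and MAP_ANONYMOUS is spelled MAP_ANON on some systems):

/*
 * Sketch: a pthread mutex and counter in anonymous shared memory,
 * inherited across fork() and usable by all children.
 */
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared {
    pthread_mutex_t lock;
    long counter;
};

int main(void)
{
    pthread_mutexattr_t attr;
    struct shared *shm;
    int i;

    /* one chunk of anonymous shared memory, visible to parent and children */
    shm = mmap(NULL, sizeof(*shm), PROT_READ | PROT_WRITE,
               MAP_ANONYMOUS | MAP_SHARED, -1, 0);

    /* mark the mutex as shareable between processes */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&shm->lock, &attr);
    shm->counter = 0;

    for (i = 0; i < 4; i++) {
        if (fork() == 0) {               /* child: bump the counter */
            pthread_mutex_lock(&shm->lock);
            shm->counter++;
            pthread_mutex_unlock(&shm->lock);
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;
    printf("counter = %ld\n", shm->counter);
    return 0;
}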

-aaron

