Date: Fri, 14 Sep 2001 18:33:47 -0700
From: Justin Erenkrantz
To: Aaron Bannert
Cc: dev@apr.apache.org
Subject: Re: [proposal] apr_thread_setconcurrency()

On Fri, Sep 14, 2001 at 04:21:51PM -0700, Aaron Bannert wrote:
> Why would this circumvent the OS scheduler at all? In all cases it
> is a *hint*. Please be more precise.
>
> I think I showed you an example awhile ago where compute-bound threads
> behave drastically different depending on the operating system. In
> the case of solaris, a computationally intensive thread that makes no
> system calls* will not automatically yield a mutex when entering/exiting
> a critical section, unless pthread_setconcurrency() is called.

That statement isn't necessarily correct. What actually happens is that the user scheduler in Solaris never gets executed, because none of the entry points defined by the OS (i.e., system calls) are hit during the compute-bound function call, so nothing triggers the user scheduler's activity. It isn't that the thread doesn't yield the mutex - it is that there is no other thread to yield to, because the scheduler on Solaris gives the running thread a chance to run *before* launching the next thread. This is a conscious decision on Sun's part in designing their scheduler for Solaris (up to, but not including, 9).

If you create too many LWPs, you lose a lot of optimizations that are present in Solaris (e.g. handover of a mutex to another thread in the same LWP, as discussed with bpane on dev@httpd recently). If you don't create enough LWPs, you may enter a condition where the scheduler refuses to balance the processes correctly (the value also acts as a ceiling). Passing 0 lets the OS determine the concurrency (on Solaris). By setting any other value, you are attempting to circumvent the OS scheduler. If you ask Solaris to set the concurrency, it *will* create enough LWPs to match that concurrency (as you create threads to be paired with the LWPs). This is not a hint, but a command. (Yes, the Solaris man page says it is a hint, but the implementation treats it as a command.)

Talking about OSes other than Solaris is moot, because they don't implement an M*N scheduling strategy. With a bound (1:1) thread implementation, pthread_setconcurrency is a no-op (what else could it do?). It can only be effective with an LWP-like strategy that multiplexes user threads over kernel threads.
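To make the case concrete, here is a rough, untested sketch of the kind of compute-bound loop in question (the file name, the USE_SETCONCURRENCY ifdef and the constants are purely illustrative, not from any real code). Two threads hammer a mutex without ever entering a system call, which is exactly the situation where the M*N library's user scheduler never gets a chance to run unless extra LWPs exist:

    /* spin.c - illustrative only; compile with something like
     *   cc -o spin spin.c -lpthread
     */
    #define _XOPEN_SOURCE 500  /* for pthread_setconcurrency() on some platforms */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static volatile unsigned long counter;

    static void *spin(void *arg)
    {
        int i;
        for (i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&lock);
            counter++;                  /* pure CPU work, no system calls */
            pthread_mutex_unlock(&lock);
        }
        return arg;
    }

    int main(void)
    {
        pthread_t t1, t2;

    #ifdef USE_SETCONCURRENCY
        /* The "hint" under discussion: on the M*N library this makes
         * Solaris create roughly this many LWPs; on a bound (1:1)
         * library it is effectively a no-op. */
        pthread_setconcurrency(2);
    #endif

        pthread_create(&t1, NULL, spin, NULL);
        pthread_create(&t2, NULL, spin, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        printf("counter = %lu\n", counter);
        return 0;
    }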
Furthermore, I think any value you might pass to pthread_setconcurrency is inherently wrong. What would you use to set it? The number of threads? The number of CPUs? Let the programmer decide? Let the user decide? IMHO, all of these are bad choices:

- Use the number of threads. With the Solaris M*N scheduler this is horrific, because we have now lost the optimizations and may have created too many LWPs. When you use a bound thread library on Solaris, the overhead of the (now) useless optimizations doesn't occur. So, if you want to use the number of threads on Solaris, use the bound thread library instead of the LWP thread library - that obviates the need for pthread_setconcurrency, since by definition all threads are kernel threads.

- Use the number of CPUs. How would you get this number? (One possibility, sysconf(), is sketched at the end of this mail.) It is also a bit of a red herring: it is not a good number, because your application may be sharing resources with other processes. And if you are primarily I/O-bound, you have just created too many LWPs and must pay their overhead even though most of the time the threads will be idle waiting for I/O.

- Let the programmer decide. An awfully bad choice. Who knows how the system is set up? What are you optimizing for?

- Let the user decide via a configuration option (like an MPM directive). I don't think we can expect the user to fully understand the meaning of this value. More often than not, they will set it to one of the wrong values described above.

So, what do I think the correct solution is? Let the OS decide (exactly what it does now). The OS has access to much better information for making these decisions (e.g. load averages, I/O wait, other processes, number of CPUs, etc.). The goal of the OS is to balance competing processes. Circumventing the OS scheduler by forcing it to create too many or too few LWPs is the wrong thing to do. The compute-bound thread case merely falls into a specific trap on a specific OS with a specific thread model, and it typically shows up in benchmarks, not in the real world. Most applications will enter a system call at *some* point.

> In a practical sense, when I was playing with the worker MPM I noticed
> that under high load (maxing out the CPU) it took on the order of 10
> seconds** for the number of LWPs to stablize.

I'll live with that - this is due to inherent OS scheduler characteristics. After 10 seconds, the system stabilizes - the OS has performed its job. Is there any evidence that the value it stabilized at is incorrect? What formula would you have used to set that number? Any "hint" that we give it may end up backfiring rather than helping.

In fact, the best solution may be to provide a configure-time option to help the user select the "right" thread model on Solaris (i.e. /usr/lib/libthread.so or /usr/lib/lwp/libthread.so). You can recommend the "alternative" thread model for certain types of compute-bound applications. (However, be careful on Solaris 9, as they are reversed.)

-- 
justin
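(Sketch referenced above, purely illustrative and untested: one way to ask the system for the number of online processors and for the current concurrency setting. As argued above, the CPU count is not necessarily a value you should feed back into pthread_setconcurrency().)

    #define _XOPEN_SOURCE 500  /* for pthread_getconcurrency() on some platforms */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* _SC_NPROCESSORS_ONLN is a common (Solaris/glibc) extension,
         * not strictly portable. */
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

        /* 0 means "let the implementation decide" - the default. */
        int level = pthread_getconcurrency();

        printf("online CPUs: %ld, current concurrency level: %d\n",
               ncpus, level);
        return 0;
    }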