Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@httpd.apache.org
Message-Id: <5.1.0.14.2.20020610193114.0449bd08@localhost>
Date: Mon, 10 Jun 2002 19:52:42 +0300
To: Aaron Bannert <aaron@clove.org>
From: Zeev Suraski <zeev@zend.com>
Subject: Re: [PHP-DEV] RE: PHP profiling results under 2.0.37  Re:
  Performance of Apache 2.0 Filter
Cc: dev@httpd.apache.org,php-dev@lists.php.net
In-Reply-To: <20020610092958.R21255@clove.org>
References: <5.1.0.14.2.20020610113455.0480d5c8@localhost>
 <Pine.LNX.4.10.10206081449080.27619-100000@mail.zend.com>
 <5.1.0.14.2.20020610113455.0480d5c8@localhost>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed

At 07:29 PM 6/10/2002, Aaron Bannert wrote:
>On Mon, Jun 10, 2002 at 11:46:46AM +0300, Zeev Suraski wrote:
> > What we need for efficient thread-safe operation is a mechanism like the
> > Win32 heaps - mutexless heaps, that provide malloc and free services on a
> > (preferably) contiguous pre-allocated block of memory.  The question is
> > whether the APR allocators fall into that category:
> >
> > 1.  Can you make them mutexless completely?  I.e., will they never call
> > malloc()?
>
>APR's pools only use malloc() as a portable way to retrieve large
>chunks of heapspace that are never returned. I don't know of any
>other portable way to do this.

There probably isn't.  Win32 heaps take advantage of the virtual memory 
functions (they pre-allocate the max heap size, but commit as necessary), 
and there's probably no portable way of doing that.

>In any case, at some level you will always have a mutex. Either you
>are mapping new segments in to the memory space of the process, or
>you are dealing with freelists in userspace.

I'm not sure whether VirtualAlloc(..., MEM_COMMIT) takes a mutex; it 
probably just doesn't context-switch until it's done.  But you may be right.

> > 3.  As far as I can tell, they don't use a contiguous block of memory,
> > which means more fragmentation...
>
>I'm not sure how contiguity relates to fragmentation. With a pool
>you can do mallocs all day long, slowly accumulating more 8K blocks
>(which may or may not be contiguous). At the end of the pool lifetime
>(let's say, for example, at the end of a request) then those blocks
>are placed on a freelist, and the sub-partitions within those blocks
>are simply forgotten. On the next request, the process starts over again.

The fragmentation-related advantage of using a contiguous block is that PHP 
always ends up freeing ALL of the data in the heap at the end of every 
request.  So, you get to start with the same completely-free, contiguous 
block on every request.  However, if you don't have a contiguous block, and 
you use malloc() calls to satisfy certain allocation requests, any 
persistent malloc() which occurs during the request may end up being in the 
same area as your per-request allocations.  Then, even once you free all of 
the per-request blocks, you may no longer be able to allocate large chunks 
- because some persistent malloc()'s may be stuck in the middle.  Am I 
missing something here?
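
To make the point concrete, here is a minimal sketch in C of a per-request 
heap carved out of one contiguous block.  All names (request_heap and 
friends) are made up for illustration, and this is neither PHP's allocator 
nor the Win32 heap API.  It is a plain bump allocator, so individual free() 
is left out to keep it short; the only thing it is meant to show is that a 
reset hands back the same completely-free block on every request, no matter 
what persistent malloc()s happened elsewhere in the meantime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-request heap: one contiguous block, reserved up front. */
typedef struct {
    unsigned char *base;  /* start of the contiguous block */
    size_t size;          /* total bytes in the block */
    size_t used;          /* bytes handed out during the current request */
} request_heap;

/* Reserve the block. This is the only call into the system allocator. */
static int request_heap_init(request_heap *h, size_t size)
{
    h->base = malloc(size);
    h->size = size;
    h->used = 0;
    return h->base != NULL;
}

/* Carve n bytes (rounded up to 16) off the block; NULL when it is full. */
static void *request_heap_alloc(request_heap *h, size_t n)
{
    size_t aligned = (n + 15) & ~(size_t)15;
    void *p;

    assert(h->base != NULL);
    if (aligned < n || aligned > h->size - h->used)
        return NULL;
    p = h->base + h->used;
    h->used += aligned;
    return p;
}

/* End of request: the whole block is free and contiguous again. */
static void request_heap_reset(request_heap *h)
{
    h->used = 0;
}

/* Server shutdown: give the block back. */
static void request_heap_destroy(request_heap *h)
{
    free(h->base);
    h->base = NULL;
    h->size = h->used = 0;
}
```

Persistent allocations would go straight to malloc() and never touch the 
block, so they cannot end up stuck in the middle of it.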

>I think to properly abstract a memory allocation scheme that can be
>implemented in a way that is optimized for the particular SAPI module,
>we'll have to abstract out a few concepts. This list is not exhaustive,
>but is just a quick sketch based on my understanding of Win32 heaps
>and APR pools:
>
>    - creation (called once per server lifetime)
>    - malloc (called many times per request)
>    - free (called many times per request)
>    - end-of-request (called many times per request)

(happens once per request)

>    - destruction (called once per server lifetime)
>
>Does this cover all our bases?

There are also some persistent malloc's that happen during a request, and 
do not get freed at the end of the request.  But generally yes.
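
A rough sketch in C of what such an abstraction could look like, with the 
persistent case added as a flag on alloc and release.  Everything here is 
hypothetical (sapi_allocator, malloc_backend_create, the field names); it is 
not an existing PHP or APR interface.  The backend shown just wraps malloc() 
and tracks per-request blocks in a list so that end-of-request can sweep 
whatever was not freed explicitly; a SAPI module could plug in a Win32 heap 
or something pool-based behind the same five entry points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator interface; creation is the backend's constructor. */
typedef struct sapi_allocator sapi_allocator;
struct sapi_allocator {
    void *(*alloc)(sapi_allocator *self, size_t n, int persistent);
    void  (*release)(sapi_allocator *self, void *p, int persistent);
    void  (*end_request)(sapi_allocator *self);
    void  (*destroy)(sapi_allocator *self);
    void  *state;
};

/* Header in front of every per-request block (keeps payload alignment). */
typedef union block_hdr {
    struct { union block_hdr *prev, *next; } link;
    max_align_t align;
} block_hdr;

/* Circular list of live per-request blocks, with a sentinel head. */
typedef struct { block_hdr head; size_t live; } malloc_state;

static void *mb_alloc(sapi_allocator *self, size_t n, int persistent)
{
    malloc_state *st = self->state;
    block_hdr *b;

    if (persistent)
        return malloc(n);              /* survives end_request */
    if (n > SIZE_MAX - sizeof *b)
        return NULL;
    b = malloc(sizeof *b + n);
    if (b == NULL)
        return NULL;
    b->link.next = st->head.link.next; /* insert right after the sentinel */
    b->link.prev = &st->head;
    st->head.link.next->link.prev = b;
    st->head.link.next = b;
    st->live++;
    return b + 1;
}

static void mb_release(sapi_allocator *self, void *p, int persistent)
{
    malloc_state *st = self->state;
    block_hdr *b;

    if (p == NULL)
        return;
    if (persistent) { free(p); return; }
    b = (block_hdr *)p - 1;
    b->link.prev->link.next = b->link.next; /* unlink, then free at once */
    b->link.next->link.prev = b->link.prev;
    st->live--;
    free(b);
}

/* Sweep every per-request block that was not released explicitly. */
static void mb_end_request(sapi_allocator *self)
{
    malloc_state *st = self->state;
    while (st->head.link.next != &st->head)
        mb_release(self, st->head.link.next + 1, 0);
}

static void mb_destroy(sapi_allocator *self)
{
    mb_end_request(self);
    free(self->state);
    free(self);
}

/* Creation: called once per server lifetime. */
sapi_allocator *malloc_backend_create(void)
{
    sapi_allocator *a = malloc(sizeof *a);
    malloc_state *st = malloc(sizeof *st);

    if (a == NULL || st == NULL) { free(a); free(st); return NULL; }
    st->head.link.next = st->head.link.prev = &st->head;
    st->live = 0;
    a->alloc = mb_alloc;
    a->release = mb_release;
    a->end_request = mb_end_request;
    a->destroy = mb_destroy;
    a->state = st;
    assert(st->head.link.next == &st->head);
    return a;
}
```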


>  For example, when using pools, the
>free() call would do nothing, and the end-of-request call would simply
>call apr_pool_clear(). Note that this only applies to dynamically
>allocated memory required for the lifetime of a request. For memory
>with longer lifetimes we could make the creation and destruction
>routines more generic.

I know, but that's really not an option :)  This is how PHP/FI 2 used to 
work, and it had horrible memory performance.  We allocate and free *a lot* 
during a request.  We've worked very hard on freeing data as soon as we 
possibly can, so using the pool approach of cleaning everything at the 
end of a request will reduce our memory performance radically.  What we 
currently have is a memory allocator that is quite suitable for the job - 
it supports malloc and free and it caches small blocks for reuse.  Still, 
under Windows, moving to a heap made a big difference - it eliminated the 
mutex overhead and reduced fragmentation.  The solution under UNIX may very 
well be increasing the block cache to work with larger blocks as well, and 
a larger number of blocks of each size - but this will definitely increase 
fragmentation big time :(
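
For reference, the general shape of a small-block cache, as a simplified 
sketch in C.  This is not the actual PHP allocator; block_cache, the size 
classes and the constants are assumptions picked for illustration.  Freed 
blocks up to a size limit are parked on a per-size free list instead of 
going back to free(), and the next request for that size pops one off 
without touching malloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CLASS_GRANULE 16  /* assumed: size classes are multiples of 16 bytes */
#define CACHE_CLASSES 16  /* assumed: blocks up to 16 * 16 = 256 bytes cached */
#define CACHE_DEPTH    4  /* assumed: at most 4 cached blocks per class */

/* Header in front of every block; remembers the rounded size. */
typedef union cached_hdr {
    struct { size_t size; union cached_hdr *next; } info;
    max_align_t align;
} cached_hdr;

typedef struct {
    cached_hdr   *free_list[CACHE_CLASSES];
    unsigned      count[CACHE_CLASSES];
    unsigned long hits, misses;
} block_cache;  /* zero-initialise before first use */

static void *cache_alloc(block_cache *c, size_t n)
{
    cached_hdr *h;

    assert(c != NULL);
    if (n == 0)
        n = 1;
    if (n <= CACHE_CLASSES * CLASS_GRANULE) {
        size_t idx = (n - 1) / CLASS_GRANULE;
        n = (idx + 1) * CLASS_GRANULE;     /* round up to the class size */
        h = c->free_list[idx];
        if (h != NULL) {                   /* cache hit: no malloc() call */
            c->free_list[idx] = h->info.next;
            c->count[idx]--;
            c->hits++;
            return h + 1;
        }
    }
    if (n > SIZE_MAX - sizeof *h)
        return NULL;
    h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->info.size = n;
    c->misses++;
    return h + 1;
}

static void cache_free(block_cache *c, void *p)
{
    cached_hdr *h;
    size_t n;

    if (p == NULL)
        return;
    h = (cached_hdr *)p - 1;
    n = h->info.size;
    if (n <= CACHE_CLASSES * CLASS_GRANULE) {
        size_t idx = n / CLASS_GRANULE - 1;
        if (c->count[idx] < CACHE_DEPTH) { /* park the block for reuse */
            h->info.next = c->free_list[idx];
            c->free_list[idx] = h;
            c->count[idx]++;
            return;
        }
    }
    free(h);                               /* too big, or the class is full */
}

/* Hand every cached block back to the system allocator. */
static void cache_drain(block_cache *c)
{
    size_t i;
    for (i = 0; i < CACHE_CLASSES; i++) {
        while (c->free_list[i] != NULL) {
            cached_hdr *h = c->free_list[i];
            c->free_list[i] = h->info.next;
            free(h);
        }
        c->count[i] = 0;
    }
}
```

Raising CACHE_CLASSES or CACHE_DEPTH means fewer trips to malloc(), but 
every parked block is memory that stays pinned wherever malloc() happened 
to put it, which is the fragmentation cost mentioned above.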

Zeev