perl-modperl mailing list archives

From Elizabeth Mattijsen <>
Subject Re: ithreads with modperl
Date Fri, 09 Jan 2004 22:29:54 GMT
At 13:26 -0800 1/9/04, Stas Bekman wrote:
>Elizabeth Mattijsen wrote:
>>I'm sure you know my PerlMonks article "Things you need to know 
>>before programming Perl ithreads" ( 
>> ).
>>So yes, in general I think you can say that the data copied for 
>>each thread quickly dwarfs whatever optrees are shared.
>How is this different from fork? When you fork, the OS shares all memory 
>pages between the parent and the child. As variables are modified, 
>memory pages become dirty and unshared. With forking, mutable (vars) 
>and non-mutable (OpCodes) data share the same memory pages, so once a 
>mutable variable changes, the opcodes allocated from the same memory 
>page get unshared too. So you get more and more memory unshared as 
>you go. In the long run (unless you use size limiting tools) all the 
>memory gets unshared.

Well, yes.  But you forget that when you load module A, usually 
modules B..Z are loaded as well, hidden from your direct view.  And 
Perl has always taken the approach of using more memory rather than 
more CPU.  So most modules are actually optimized by their authors to 
store intermediate results in maybe not so intermediate variables. 
Not to mention, many modules build up internal data-structures that 
may never be altered.  Even compile time constants need to have a CV 
in the stash where they exist, even though they're optimized away in 
the optree at compile time.  And a CV qualifies as "data" as far as 
threads are concerned.
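
A minimal sketch of the constant case, assuming a stock perl and the core "constant" pragma (the name ANSWER is made up for illustration). The call is folded into the optree at compile time, yet the stash still carries an entry for it. Newer perls may store that entry in a lighter-weight form than a full CV, so treat this as an illustration of the principle rather than a memory measurement.

```perl
use strict;
use warnings;
use constant ANSWER => 42;    # hypothetical constant, for illustration only

# The call to ANSWER is folded to the literal 42 at compile time ...
my $folded = ANSWER + 1;

# ... yet the package stash still has an entry named ANSWER,
# and it can still be called as a subroutine.
my $in_stash = exists $main::{ANSWER} ? 1 : 0;
my $callable = defined &main::ANSWER  ? 1 : 0;

print "in stash: $in_stash, callable: $callable, folded: $folded\n";
```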

>With ithreads, the opcode tree is always shared and mutable data is 
>copied at the very beginning. So your memory consumption should be 
>exactly the same after the first request and after the 1000th request 
>(assuming that you don't allocate any memory at run time). Here you 
>get more memory consumed at the beginning of the spawned thread, but 
>it stays the same.

Well, I see it this way: With threads, you're going to get the hit 
for everything possible at the beginning.  With fork, you get hit 
whenever anything _actually_ changes.  And spread out over time.  I 
would take fork() anytime over that.
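
A small sketch of the copying behaviour, assuming a perl built with ithreads (the script checks for this and does nothing thread-related otherwise). The variable @data is made up for illustration. It shows that a new thread gets its own copy of the data: a change made inside the thread is not visible in the parent. It does not measure memory.

```perl
use strict;
use warnings;
use Config;

my @data   = (1 .. 5);        # illustrative data, cloned into each new thread
my $result = 'no ithreads';

if ($Config{useithreads}) {
    require threads;

    # The thread works on its own private copy of @data.
    my $thr       = threads->create(sub { push @data, 6; return scalar @data });
    my $in_thread = $thr->join;

    $result = "thread saw $in_thread, parent has " . scalar(@data);
}

print "$result\n";
```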

>So let's say you have an 8MB Opcode Tree and 4MB of mutable data. The 
>process totals 12MB. Using fork you will start off with all 
>12MB shared and get memory unshared as you go. With threads, you 
>will start off with 4MB upfront memory consumption and it'll stay 
>the same.

But if you start 100 threads, you'll need 400 MByte, whereas if you fork 
100 times, you'll start off with basically 12 MByte and a bit.  It's the 
_memory_ usage that is causing the problem.

On top of that, I think you will find quite the opposite in the 
amount of OpTree and mutable data usage.  A typical case would more 
likely be something like 4MB of optree and 8MB of mutable data.
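
The arithmetic behind both scenarios, as a back-of-the-envelope sketch. It uses only the figures quoted above (100 workers, 8MB/4MB and 4MB/8MB splits) and ignores per-thread and per-process overhead, as well as pages that become unshared after fork():

```perl
use strict;
use warnings;

my $workers = 100;

# [ label, optree MB (shared by threads), mutable data MB (copied per thread) ]
my @cases = (
    [ 'quoted example', 8, 4 ],
    [ 'reversed split', 4, 8 ],
);

my @copied;    # MB copied up front by ithreads, one entry per case
for my $case (@cases) {
    my ($label, $optree_mb, $data_mb) = @$case;

    my $threads_mb = $data_mb * $workers;      # one private copy per thread
    my $fork_mb    = $optree_mb + $data_mb;    # all pages shared right after fork()

    push @copied, $threads_mb;
    printf "%-15s threads: %4d MB copied up front, fork: %2d MB shared\n",
        $label, $threads_mb, $fork_mb;
}
```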

To prove my point, I have taken my Benchmark::Thread::Size module 
(available from CPAN) and tested the behaviour of POSIX with and 
without anything exported.
