lucene-dev mailing list archives

From Chuck Williams
Subject Re: ParallelMultiSearcher reimplementation
Date Tue, 14 Nov 2006 02:26:05 GMT

Doug Cutting wrote on 11/13/2006 10:50 AM:
> Chuck Williams wrote:
>> I followed this same logic in ParallelWriter and got burned.  My first
>> implementation (still the version submitted as a patch in jira) used
>> dynamic threads to add the subdocuments to the parallel subindexes
>> simultaneously.  This hit a problem with abnormal native heap OOM's in
>> the jvm.  At first I thought it was simply a thread stack size / java
>> heap size configuration issue, but adjusting these did not resolve the
>> issue.  This was on linux.  ps -L showed large numbers of defunct
>> threads.  jconsole showed enormous growing total-ever-allocated thread
>> counts.  I switched to a thread pool and the issue went away with the
>> same config settings.
> Can you demonstrate the problem with a standalone program?
> Way back in the 90's I implemented a system at Excite that spawned one
> or more Java threads per request, and it ran for days on end, handling
> 20 or more requests per second.  The thread spawning overhead was
> insignificant.  That was JDK 1.2 on Solaris.  Have things gotten that
> much worse in the interim?  Today Hadoop's RPC allocates a thread per
> connection, and we see good performance.  So I certainly have
> counterexamples.

Are you pushing memory to the limit?  In my case, we need a maximally
sized Java heap (about 2.5 GB on Linux), so we carefully minimize the
thread stack and perm space sizes.  My suspicion is that it takes a
while after a thread becomes defunct before all of its resources are
reclaimed.  We are hitting our server with 50 simultaneous indexing
threads, each of which writes its 6 parallel subindexes, each in a
separate thread.  This yields hundreds of threads created per second
within tight total thread stack space, and the process continually
bumped over the native heap limit.  With the change to thread pools,
and therefore no dynamic creation and destruction of thread stacks,
all works fine.
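
The pooled approach can be sketched roughly as follows.  This is a
minimal illustration using java.util.concurrent, not the actual
ParallelWriter code: the class PooledSubindexWriter, the method
writeSubdocument, and the pool size are hypothetical names and values
chosen only for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names are Lucene or ParallelWriter APIs.
class PooledSubindexWriter {
    private final ExecutorService pool;
    private final int subindexCount;

    PooledSubindexWriter(int subindexCount, int poolSize) {
        this.subindexCount = subindexCount;
        // Fixed pool: worker threads (and their stacks) are created once
        // and reused, instead of being created and destroyed per document.
        this.pool = Executors.newFixedThreadPool(poolSize);
    }

    // Submits one task per subindex and waits for all of them to finish.
    // Returns the number of subdocuments written.
    int addDocument(final String doc) {
        List<Callable<Integer>> tasks = new ArrayList<Callable<Integer>>();
        for (int i = 0; i < subindexCount; i++) {
            final int subindex = i;
            tasks.add(new Callable<Integer>() {
                public Integer call() {
                    return writeSubdocument(subindex, doc);
                }
            });
        }
        int written = 0;
        try {
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                written += f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return written;
    }

    // Placeholder for the real per-subindex write (assumption for the sketch).
    static int writeSubdocument(int subindex, String doc) {
        return 1;
    }

    void close() {
        pool.shutdown();
    }
}
```

The pool is shared across documents, so the number of live thread
stacks is bounded by the pool size no matter how many documents per
second are added.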

Unless you are running with a maximal Java heap, you are unlikely to
hit this issue: there is plenty of space left over for the native
heap, so a delay in thread stack reclamation would yield a larger
average process size but would not cause OOMs.
