lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: TermInfosReader lazy term index reading
Date Fri, 02 Feb 2007 21:49:44 GMT
FYI,

For a process that is mostly CPU bound (which is the case with Lucene
if the index is in the OS cache), having so many "active" threads
will actually hurt performance due to context switching and
synchronization. Better to use a request queue / thread pool. (I
think I read somewhere that a good rule of thumb for the pool size is
2x the number of processors.)
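
A minimal sketch of the request queue / thread pool idea, assuming Java 8 or
later and java.util.concurrent. SearchPool, poolSize, fakeSearch and runAll
are hypothetical names for illustration; fakeSearch stands in for a CPU-bound
search and is not a Lucene call. The 2x sizing is only the rule of thumb
mentioned above, not a measured optimum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SearchPool {
    // Rule of thumb from the message: about 2x the number of processors.
    static int poolSize() {
        return 2 * Runtime.getRuntime().availableProcessors();
    }

    // Hypothetical stand-in for a CPU-bound search; NOT a Lucene API.
    static int fakeSearch(int queryId) {
        int hits = 0;
        for (int i = 0; i < 1000; i++) {
            if ((i + queryId) % 7 == 0) {
                hits++;
            }
        }
        return hits;
    }

    // Submits all requests to one fixed-size pool. Requests beyond the pool
    // size wait in the executor's internal queue instead of each getting
    // its own active thread.
    static List<Integer> runAll(int requests) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int q = 0; q < requests; q++) {
                final int id = q;
                Callable<Integer> task = () -> fakeSearch(id);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // blocks until that request finishes
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, hundreds of incoming requests queue up behind a small number
of worker threads, which should limit both context switching and the number
of threads competing for any one lock.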

If most of the searches are IO bound, having so many disparate
requests will hurt performance as well, since the disk heads will be
seeking all over the place and losing any locality of data that
Lucene provides (postings, sequential term reads, etc.).

There are some excellent academic papers I just came across on
high-performance parallel disk-based sorting, and many of the
techniques/concerns apply to Lucene.

Robert


On Feb 2, 2007, at 3:38 PM, Yonik Seeley wrote:

> On 2/2/07, Doug Cutting <cutting@apache.org> wrote:
>> Yonik Seeley wrote:
>> > I ran across a situation where a great number of threads were  
>> blocked on
>> > ensureIndexIsRead(), even after it had already been loaded.
>>
>> That sounds bizarre.  A sync block that tests a field for non-null
>> shouldn't tie things up much, I wouldn't think.
>
> There were hundreds of threads all blocked on the same lock.
> I think synchronization can become expensive under heavy contention,
> regardless of how lightweight the code inside.
>
> It's obviously not the root cause of the problem... the query
> structure was very expensive (a range query covering most documents
> that didn't get pulled out into a Filter), but it still could be an
> area of improvement.
>
> I'm going to try and see if I can duplicate it, then see what effect
> removing the synchronization has.
>
>>   Are you sure that one
>> of the threads wasn't actually reading the index?
>
> Yep.  We've seen the same thing with older versions of Lucene when
> multiple threads tried to sort on the same field and there was massive
> contention from everyone trying to generate the same entry.
>
>> Or perhaps some other
>> method also synchronizes on the same object?
>
> Good question... I only checked TermInfosReader itself.
>
> -Yonik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
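
For reference, the kind of contention discussed in the quoted thread (every
caller taking a lock just to test a field for non-null) can be avoided with a
volatile field and a double-checked load. This is only a sketch under stated
assumptions: it requires the Java 5+ memory model, LazyTermIndex, indexTerms,
loadIndex and loads are hypothetical names, and it is not the actual
TermInfosReader code.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LazyTermIndex {
    // volatile so a fully built array is safely published to other threads
    // (relies on the Java 5+ memory model).
    private volatile String[] indexTerms;

    // Counts how many times the load actually ran; for demonstration only.
    final AtomicInteger loads = new AtomicInteger();

    // Hypothetical loader standing in for reading the term index from disk.
    private String[] loadIndex() {
        loads.incrementAndGet();
        return new String[] {"apple", "banana", "cherry"};
    }

    String[] ensureIndexIsRead() {
        String[] local = indexTerms;   // lock-free fast path once loaded
        if (local == null) {
            synchronized (this) {
                local = indexTerms;    // re-check while holding the lock
                if (local == null) {
                    local = loadIndex();
                    indexTerms = local;
                }
            }
        }
        return local;
    }
}
```

After the first load, callers only perform a volatile read and never touch
the monitor, so hundreds of threads no longer queue on the same lock.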
