lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: FW: Lucene Search has poor cpu utilization on a 4-CPU machine
Date Wed, 14 Jul 2004 20:44:31 GMT
Aviran wrote:
> The next bottleneck is not very clear. There are two candidates which appear
> frequently in the thread dump.
> 
> The first one, which appears more frequently than the others, is using
> java.lang.StrictMath.log which is used in
> org.apache.lucene.search.DefaultSimilarity.idf. Definitely spending a lot of
> time there. (I don't know if there is anything we can do about it)

We could add an idf cache for small values of docFreq, e.g., 0-32, 
represented as a float[].  If you're doing a lot of range or wildcard 
queries then this should have very high hit rates.  The cache should be 
on the searcher, which determines the numDocs parameter.  It could be 
accessed through a new Searchable method, idf(Term).
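
A minimal sketch of such a cache might look like the following. The class name IdfCache and its methods are hypothetical, not existing Lucene API, and the formula in compute() is assumed to match what DefaultSimilarity.idf(docFreq, numDocs) does. The table is filled once in the constructor for a fixed numDocs and never modified afterwards, so searching threads can read it without synchronization.

```java
/**
 * Hypothetical per-searcher idf cache (not part of Lucene's API).
 * Precomputes idf for docFreq 0..32 for one fixed numDocs value.
 */
final class IdfCache {
    private static final int CACHE_SIZE = 33; // covers docFreq 0..32 inclusive

    private final int numDocs;
    private final float[] cached = new float[CACHE_SIZE];

    IdfCache(int numDocs) {
        this.numDocs = numDocs;
        // Fill the table up front; it is read-only after construction.
        for (int docFreq = 0; docFreq < CACHE_SIZE; docFreq++) {
            cached[docFreq] = compute(docFreq, numDocs);
        }
    }

    /** Assumed to mirror DefaultSimilarity.idf(docFreq, numDocs). */
    static float compute(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /** Table lookup for small docFreq, direct computation otherwise. */
    float idf(int docFreq) {
        if (docFreq >= 0 && docFreq < CACHE_SIZE) {
            return cached[docFreq];
        }
        return compute(docFreq, numDocs);
    }
}
```

A searcher would build one IdfCache when it is opened and discard it with the reader, since numDocs is fixed for the lifetime of that searcher.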

> The second one is on
> org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:318) which is
> a synchronized method thus causing locks. I guess the synchronization is
> done for a good reason, but you probably know the answer better than me.

I'm surprised this is showing up.  Can you tell more about the size of 
your index and the nature of your queries?  If you're, e.g., doing lots 
of range or wildcard queries, then I can maybe see this showing up a 
little.  What is your benchmark like?

Are you "warming the cache" when you're performing these benchmarks?  In 
other words, are you first sending a few queries at a low rate before 
you start slamming it with high traffic?  If you're not, and/or you have 
a lot of fields, or you re-open searchers a lot, then this could show up 
too.
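
By warming I mean something like the sketch below. The SearcherWarmer class and its warm() method are hypothetical helpers, not Lucene API, and the search callback stands in for whatever call your benchmark makes against the searcher. The idea is to run a handful of representative queries one at a time, with a pause between them, so that anything loaded on first use is already in memory before the concurrent load starts.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical warm-up helper for a benchmark harness (not Lucene API). */
final class SearcherWarmer {

    /**
     * Runs each warm-up query once, sequentially, sleeping pauseMillis
     * between queries. Returns the number of queries actually executed.
     */
    static <Q> int warm(List<Q> warmupQueries, Consumer<? super Q> search, long pauseMillis) {
        int executed = 0;
        for (Q query : warmupQueries) {
            search.accept(query); // single-threaded on purpose: low rate
            executed++;
            if (pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt for the caller and stop early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return executed;
    }
}
```

The warm-up set should touch every field the real queries search on, and it needs to be repeated whenever a searcher is re-opened.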

Doug


