lucene-java-user mailing list archives

From Justus Pendleton <>
Subject Re: Performance of never optimizing
Date Mon, 03 Nov 2008 05:49:32 GMT
On 03/11/2008, at 4:27 PM, Otis Gospodnetic wrote:
> Why are you optimizing?  Trying to make the search faster?  I would  
> try to avoid optimizing during high usage periods.

I assume that the original, long-ago, decision to optimize was made to  
improve searching performance.

> One thing that you might not have tried is the constant re-opening  
> of the IndexReader, which you'll need to do if you want to see index  
> changes instantly.

We do keep track of when the index has been updated and re-open  
IndexReaders so that they see the updates instantly.

> So you indexed once and then measured search performance?  Or did  
> you measure indexing performance?  I can't quite tell from your email.
> And in one case you optimized before searching and in the other you  
> did not optimize?

Yes, I indexed once and then measured search performance. (The actual  
algorithm used can be seen at

  For my current purposes I don't care about indexing performance.

>> 1. Why does the merge factor of 4 appear to be faster than the  
>> merge factor of
>> 2?
> Faster for indexing or searching?  If indexing, then it's because 4  
> means fewer segment merges than 2.  If searching, then I don't know,  
> unless you had indexing and searching happening in parallel, which  
> then means less IO for 4.

For searching. Indexing and searching should not have been happening  
in parallel. However, multiple searches were running in parallel.
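
To make the quoted indexing point concrete (a merge factor of 4 means
fewer segment merges than 2), here is a minimal sketch of a simplified
level-based merge model. It is not Lucene's actual merge policy: the
class name MergeFactorSketch, the simulate method, and the rule "merge
as soon as mergeFactor segments sit at one level" are assumptions made
for illustration. It says nothing about why searching would be faster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; not Lucene's real merge policy.
public class MergeFactorSketch {

    // Returns {number of merges performed, number of segments left}
    // after `flushes` new segments have been written.
    static int[] simulate(int flushes, int mergeFactor) {
        // levels.get(i) = number of segments currently at level i
        List<Integer> levels = new ArrayList<>();
        int merges = 0;
        for (int f = 0; f < flushes; f++) {
            int level = 0;
            while (true) {
                if (levels.size() <= level) {
                    levels.add(0);
                }
                int count = levels.get(level) + 1;
                if (count < mergeFactor) {
                    // Room at this level: keep the segment here.
                    levels.set(level, count);
                    break;
                }
                // mergeFactor segments at this level: merge them into
                // one segment and carry it up to the next level.
                levels.set(level, 0);
                merges++;
                level++;
            }
        }
        int segments = 0;
        for (int c : levels) {
            segments += c;
        }
        return new int[] { merges, segments };
    }

    public static void main(String[] args) {
        for (int mf : new int[] { 2, 4 }) {
            int[] r = simulate(1000, mf);
            System.out.println("mergeFactor=" + mf
                + " merges=" + r[0] + " segments=" + r[1]);
        }
    }
}
```

Under this model the higher merge factor performs fewer merges while
indexing but can leave more segments behind for a search to visit,
which is why it does not by itself explain the search numbers.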

> Did your index fit in RAM, by the way?

The machine has, I believe, 4 GB of RAM and the benchmark suite  
reports that 700 MB were used, so it does appear to have fit into RAM.

>> 2. Why does non-optimized searching appear to be faster than  
>> optimized searching
>> once the index hits ~500,000 documents?
> Not sure without seeing the index/machine.

The machine is an 8-core Mac Pro. If you'd like, I can provide the  
indexes online somewhere. Or if you can provide pointers on what to  
look for, I'm more than happy to investigate this myself.

> It sounds like you were measuring search performance while at the  
> same time increasing the index size by incrementally adding more docs?

No documents were being added to the index while the searching was  
being performed. I was trying to measure only the search performance.

> 20 reqs/sec sounds very low.  How large is your index, how much RAM,  
> and how about heap size?
> What were your queries like? random?  from log?

The queries were generated by the ReutersQueryMaker. I am not sure  
what heap sizes were used at the various stages. (I ran the benchmarks  
over the weekend; they took several days.)

> I'm confused by what exactly you did and measured, but it could just  
> be that I'm tired.

My apologies for not being clearer in my initial email. I appreciate  
the help,

