lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: Performance of never optimizing
Date Mon, 03 Nov 2008 12:07:24 GMT
Am I missing your benchmark algorithm somewhere? We need it. Something 
doesn't make sense.

- Mark


Justus Pendleton wrote:
> Howdy,
>
> I have a couple of questions regarding some Lucene benchmarking and 
> what the results mean[3]. (Skip to the numbered list at the end if you 
> don't want to read the lengthy exegesis :)
>
> I'm a developer for JIRA[1]. We are currently trying to get a better 
> understanding of Lucene, and our use of it, to cope with the needs of 
> our larger customers. These "large" indexes are only a couple hundred 
> thousand documents but our problem is compounded by the fact that they 
> have a relatively high rate of modification (=delete+insert of new 
> document) and our users expect these modification to show up in query 
> results pretty much instantly.
>
> Our current default behaviour is a merge factor of 4. We perform an 
> optimization on the index every 4000 additions. We also perform an 
> optimize at midnight. Our fundamental problem is that these 
> optimizations are locking the index for unacceptably long periods of 
> time, something that we want to resolve for our next major release, 
> hopefully without undermining search performance too badly.
>
> In the Lucene javadoc there is a comment, and a link to a mailing list 
> discussion[2], that suggests applications such as JIRA should never 
> perform optimize but should instead set their merge factor very low.
>
> In an attempt to understand the impact of a) lowering the merge factor 
> from 4 to 2 and b) never, ever optimizing on an index (over the course 
> of years and millions of additions/updates) I wanted to try to 
> benchmark Lucene.
>
> I used the contrib/benchmark framework and wrote a small algorithm 
> that adds documents to an index (using the Reuters doc generator), 
> does a search, does an optimize, then does another search. All the 
> pretty pictures can be seen at:
>
>   http://confluence.atlassian.com/display/JIRACOM/Lucene+graphs
>
> I have several questions, hopefully they aren't overwhelming in their 
> quantity :-/
>
> 1. Why does the merge factor of 4 appear to be faster than the merge 
> factor of 2?
>
> 2. Why does non-optimized searching appear to be faster than optimized 
> searching once the index hits ~500,000 documents?
>
> 3. There appears to be a fairly sizable performance drop across the 
> board around 450,000 documents. Why is that?
>
> 4. Searching performance appears to decrease towards a fairly 
> pessimistic 20 searches per second (for a relatively simple search). 
> Is this really what we should expect long-term from Lucene?
>
> 5. Does my benchmark even make sense? I am far from an expert on 
> benchmarking so it is possible I'm not measuring what I think I am 
> measuring.
>
> Thanks in advance for any insight you can provide. This is an area 
> that we very much want to understand better as Lucene is a key part of 
> JIRA's success,
>
> Cheers,
> Justus
> JIRA Developer
>
> [1]: http://www.atlassian.com
> [2]: http://www.gossamer-threads.com/lists/lucene/java-dev/47895
> [3]: http://confluence.atlassian.com/display/JIRACOM/Lucene+graphs
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message