lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <>
Subject [jira] Commented: (LUCENE-1195) Performance improvement for TermInfosReader
Date Wed, 27 Feb 2008 21:27:51 GMT


Yonik Seeley commented on LUCENE-1195:

There's higher level synchronization too (ensuring that two different threads don't generate
the same cache entry at the same time), and I agree that should not be done in this case.

Just use Collections.synchronizedMap(): it will be the same speed, more readable, and
easy to replace later anyway.
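
For illustration only, here is a minimal sketch of what that suggestion could look like. It is not taken from the attached patch. The class name SynchronizedLruCacheSketch and the helper newLruCache are hypothetical; the sketch assumes an LRU map built on java.util.LinkedHashMap in access order and wrapped with Collections.synchronizedMap().

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class SynchronizedLruCacheSketch {

    /** Hypothetical helper: a thread-safe LRU map holding at most maxEntries entries. */
    static <K, V> Map<K, V> newLruCache(final int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently accessed.
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each insertion; returning true evicts the eldest entry.
                return size() > maxEntries;
            }
        };
        // Every call on the returned map is guarded by a single lock.
        return Collections.synchronizedMap(lru);
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = newLruCache(2);
        cache.put("lucene", 1);
        cache.put("index", 2);
        cache.get("lucene");     // touch "lucene" so "index" becomes the eldest entry
        cache.put("search", 3);  // capacity exceeded, so "index" is evicted
        System.out.println(cache.containsKey("index"));   // false
        System.out.println(cache.containsKey("lucene"));  // true
    }
}
```

The wrapper only makes individual map calls atomic. Two threads that miss on the same key at the same time may both compute and store the entry, which is the higher-level synchronization the comment above says is not needed here.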

> Performance improvement for TermInfosReader
> -------------------------------------------
>                 Key: LUCENE-1195
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 2.4
>         Attachments: lucene-1195.patch
> Currently we have a bottleneck for multi-term queries: the dictionary lookup is being done
> twice for each term. The first time in Similarity.idf(), where searcher.docFreq() is called.
> The second time when the posting list is opened (TermDocs or TermPositions).
> The dictionary lookup is not cheap, that's why a significant performance improvement is
> possible here if we avoid the second lookup. An easy way to do this is to add a small LRU
> cache to TermInfosReader.
> I ran some performance experiments with an LRU cache size of 20, and a mid-size index of
> 500,000 documents from wikipedia. Here are some test results:
> 50,000 AND queries with 3 terms each:
> old:                  152 secs
> new (with LRU cache): 112 secs (26% faster)
> 50,000 OR queries with 3 terms each:
> old:                  175 secs
> new (with LRU cache): 133 secs (24% faster)
> For bigger indexes this patch will probably have less impact, for smaller ones more.
> I will attach a patch soon.
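
The effect described in the issue can be sketched with a toy model. This is an assumption-laden illustration, not Lucene code: TermLookupCacheSketch, lookup, and expensiveDictionaryLookup are hypothetical names, and the "dictionary lookup" is a placeholder that only increments a counter. Each term is looked up once for the idf computation and once when its postings are opened.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TermLookupCacheSketch {
    int dictionaryLookups = 0;  // number of simulated expensive lookups performed
    private final Map<String, Integer> cache;

    TermLookupCacheSketch(final int cacheSize) {
        // Access-ordered LinkedHashMap used as a small LRU cache (not thread-safe).
        cache = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > cacheSize;  // cacheSize 0 effectively disables caching
            }
        };
    }

    /** Placeholder for the costly term dictionary lookup. */
    private int expensiveDictionaryLookup(String term) {
        dictionaryLookups++;
        return term.length();  // stand-in for the real term info
    }

    /** Returns the term info, consulting the cache before the dictionary. */
    int lookup(String term) {
        Integer cached = cache.get(term);
        if (cached != null) {
            return cached;
        }
        int info = expensiveDictionaryLookup(term);
        cache.put(term, info);
        return info;
    }

    /** Simulates one query: one pass for idf, one pass for opening postings. */
    static int lookupsForQuery(int cacheSize, String... terms) {
        TermLookupCacheSketch reader = new TermLookupCacheSketch(cacheSize);
        for (String t : terms) reader.lookup(t);  // first lookup (docFreq for idf)
        for (String t : terms) reader.lookup(t);  // second lookup (postings)
        return reader.dictionaryLookups;
    }

    public static void main(String[] args) {
        System.out.println(lookupsForQuery(0, "apache", "lucene", "index"));   // 6
        System.out.println(lookupsForQuery(20, "apache", "lucene", "index"));  // 3
    }
}
```

With a cache of 20 entries, the second pass over a three-term query is served entirely from the cache, so the simulated dictionary is hit three times instead of six.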
