lucene-java-user mailing list archives

From testn <te...@doramail.com>
Subject Re: Performance Questions
Date Tue, 11 Sep 2007 02:13:20 GMT

- A Searcher itself doesn't cost much. The cost comes from the construction
of the TermInfosReader inside IndexReader
- This means you can construct a number of searchers based on different
combinations of indices.
- If I were you, I would build several indices, split by how fresh their
data has to be.
    - Reopen the indices where stale data is not an option very often, using
IndexReader.isCurrent() to help you
    - Reopen the indices where stale data is OK less often, say every 10-20
minutes
- Then when you want to search, you just need to construct an IndexSearcher
using a MultiReader/ParallelReader over those indices
- Make sure you close the stale readers once you have opened the updated
ones
- If that's not enough and running HEAD code doesn't scare you, you can
try the HEAD code, which has LUCENE-743 implemented.
- For frequently updated data, it is probably better to use a database,
especially if you don't need scoring and keyword analysis, since it's
pretty costly to reopen an IndexReader every time the data has been
updated.
- IndexSearcher doesn't support refreshing because it relies on IndexReader
to do the work. The caching of terms is done inside
IndexReader/TermInfosReader. So if you want to update an IndexSearcher, you
need to re-create it with a more up-to-date IndexReader.
- To get the best performance, you should really query just the data you
need.
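
The per-index refresh policy in the list above could be sketched roughly as
follows. This is a minimal sketch and not Lucene code: RefreshingHolder and
FakeReader are hypothetical names, and the opener, currency check and closer
are passed in as plain functions so that no Lucene signature is assumed. In
real code they would presumably wrap opening an IndexReader, calling
IndexReader.isCurrent(), and closing the old reader.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not part of Lucene. R stands in for an index reader.
final class RefreshingHolder<R> {
    private final Supplier<R> opener;      // opens a reader on the latest index
    private final Predicate<R> isCurrent;  // true if the reader still matches the index
    private final Consumer<R> closer;      // releases a reader that has been replaced
    private final long minIntervalMillis;  // 0 means "check on every call"
    private final LongSupplier clock;      // injected so the policy can be tested
    private R current;
    private long lastCheckMillis;

    RefreshingHolder(Supplier<R> opener, Predicate<R> isCurrent, Consumer<R> closer,
                     long minIntervalMillis, LongSupplier clock) {
        this.opener = opener;
        this.isCurrent = isCurrent;
        this.closer = closer;
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
        this.current = opener.get();
        this.lastCheckMillis = clock.getAsLong();
    }

    // Returns the reader to search with, replacing it only when the check
    // interval has elapsed and the reader is no longer current.
    synchronized R get() {
        long now = clock.getAsLong();
        if (now - lastCheckMillis >= minIntervalMillis) {
            lastCheckMillis = now;
            if (!isCurrent.test(current)) {
                R stale = current;
                current = opener.get();  // open the updated reader first
                closer.accept(stale);    // then close the stale one
            }
        }
        return current;
    }
}

// Hypothetical stand-in for a reader, used only to exercise the policy.
final class FakeReader {
    final int version;
    boolean closed;

    FakeReader(int version) {
        this.version = version;
    }
}
```

An index that can never serve stale data would use an interval of 0, one
that tolerates staleness something like 15 * 60 * 1000. Note that the sketch
closes the stale reader immediately, which is only safe if no search is
still running against it; a real version would need to track in-flight
searches first.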

moshe wrote:
> 
> I have a couple of questions regarding the performance of Lucene. First
> off, my environment:
> 
> Data
> 1-10M Documents
> 5 - 30 fields < 10B
> 1-3 Fields 1KB - 500KB
> 
> I have three types of queries:
> 
> Query 1 : 85% usage 
> 1-2  phrase terms i.e. +id:"651" +id2:"241"
> sorting by an arbitrary field normally the date
> 5-20 security terms
> 5k-1M results
> can never return stale data
> 
> Query 2:  13%
> 10 full wildcard terms i.e. *search*
> sorting is optional
> 0-200 results
> 20-200 security terms
> can return slightly stale data
> 
> Query 3: 2%
> 1-20 mixed terms
> sorting is optional
> 0-200 results
> 20-200 security terms
> can return slightly stale data
> 
> 1) Does re-opening an IndexSearcher flush all of the caches (filter and
> sort) ? 
> 
> 2) What is the overhead of opening an IndexSearcher? What does it depend
> on?
> 
> 3) What is the recommended approach for updating and refreshing the index
> where there is 1 update for every 5 queries? 
> 
> 4) Is query 1 better off done using a database, as I would have to re-open
> the IndexSearcher every couple of queries?
> 
> 5) What would perform better, Solr or Lucene? When is it better to use one
> or the other?
> 
> 6) What else should I look out for?
> 
> 7) Why is refreshing an IndexSearcher not supported? 
> 
> 
> Any help is greatly appreciated 
> Thanks
> Moshe 


