lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Lucene Optimizations
Date Wed, 04 Apr 2007 05:33:24 GMT
Hi Nilesh,

I have a few queries regarding optimizing lucene for search performance.

1. We index around 50 million text documents with index size greater
than 40GB, and hence runtime performance is curcial. Our system has
only simple keyword queries. Each search returns an object of type
Hits which contains all matching results. Since usually only top-10
results are displayed to the user, it does not make sense to search
for all matching hundreds of thousand documents. Is there a way to
specify the maximum number of search results that I want to be
returned (and possibly have peroformance benefit)?

OG: Not with Hits.  You can use this, though:
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/IndexSearcher.html#search(org.apache.lucene.search.Weight,%20org.apache.lucene.search.Filter,%20int)


2. We many a timees use the index to only get a count on number of
matching documents. Only the size of answer set of the search query is
required, and not the answer set itself. Is there a way to specify
this while searching for some possible performance benefit.

OG:
Write your own HitCollector, say CountingHitCollector, and just count++ inside it.
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/IndexSearcher.html#search(org.apache.lucene.search.Weight,%20org.apache.lucene.search.Filter,%20org.apache.lucene.search.HitCollector)

If you are dealing with just Terms, you could use:
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/IndexSearcher.html#docFreq(org.apache.lucene.index.Term)
or
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Searcher.html#docFreqs(org.apache.lucene.index.Term[])

3. We have really simple conjunctive keyword queries. No phrases, no
fuzzy matching, and no advanced search predicates. Is there a way that
we can specify these restrictions on our querying language to have
some possible performance benefits.

OG: You can write your own QueryParser and disable types of queries that could be expensive
if run, but that is it.

Otis




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message