lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Search Performance Problem 16 sec for 250K docs
Date Sat, 19 Aug 2006 15:22:38 GMT
This is a loooonnnnnggg time, I think you're right, it's excessive.

What are you timing? The time to complete the search (i.e. get a Hits object
back) or the total time to assemble the response? Why I ask is that the Hits
object is designed to return the fir st100 or so docs efficiently. Every 100
docs or so, it re-executes the query. So, if you're returning a large result
set, the using the Hits object to iterate over them, this could account for
your time. Use a HitCollector instead... But do note this from the javadoc
for hitcollector

----
. For good search performance, implementations of this method should not
call Searcher.doc(int)<file:///C:/lucene_1.9.1/docs/api/org/apache/lucene/search/Searcher.html#doc%28int%29>or
IndexReader.document(int)<file:///C:/lucene_1.9.1/docs/api/org/apache/lucene/index/IndexReader.html#document%28int%29>on
every document number encountered. Doing so can slow searches by an
order
of magnitude or more.
-----

FWIW, I have indexes larger that 1G that return in far less time than you
are indicating, through three layers and constructing web pages in the
meantime. It contains over 800K documents and the response time is around a
second (haven't timed it lately). This includes 5-way sorts.

You might also either get a copy of Luke and have it explain exactly what
the parse does or use one of the query exlain calls (sorry, don't remember
them off the top of my head) to see what query is *actually* submitted and
whether it's what you expect.

Are you using wildcards? They also have an effect on query speed.

If none of this applies, perhaps you could post the query and the how the
index is constructed. If you haven't already gotten a copy of Luke, I
heartily recommend it....

Hope this helps
Erick

On 8/19/06, M A <geneticflyer@googlemail.com> wrote:
>
> Hi there,
>
> I have an index with about 250K document, to be indexed full text.
>
> there are 2 types of searches carried out, 1. using 1 field, the other
> using
> 4 .. for a query string ...
>
> given the nature of the queries required, all stop words are maintained in
> the index, thereby allowing for phrasal queries, (this is a requirement)
> ..
>
> So search I am using the following ..
>
> if(fldArray.length > 1)
>     {
>       // use four fields
>       BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD,
> BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD,
> BooleanClause.Occur.SHOULD};
>       query = MultiFieldQueryParser.parse(queryString, fldArray, flags,
> analyzer); //parse the
>     }
>     else
>     {
>       //use only 1 field
>       query = new QueryParser("tags", analyzer).parse(queryString);
>     }
>
>
> When i search on the 4 fields the average search time is 16 sec ..
> When i search on the 1 field the average search time is 9 secs ...
>
> The Analyzer used for both searching and indexing is
> Analyzer analyzer = new StandardAnalyzer(new String[]{});
>
> The index size is about a 1GB ..
>
> The documents vary in size some are less than 1K max size is about 5k
>
>
>
> Is there anything I can do to make this faster.... 16 secs is just not
> acceptable ..
>
> Machine : 512MB, celeron 2600 ...  Lucene 2.0
>
> I could go for a bigger machine but wanted to make sure that the problem
> was
> not something I was doing, given 250K is not that large a figure ..
>
> Please Help
>
> Thanx
>
> Mo
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message