lucene-java-user mailing list archives

From "Somnath Banerjee" <somnath.baner...@gmail.com>
Subject Long Query Performance
Date Mon, 22 Jan 2007 11:34:42 GMT
Hi All,

    I have created an 8 GB index of almost 2 million documents. My
requirement is to run about 0.72 million queries against this index. Each
query consists of 200-400 words, and I build a BooleanQuery by ORing these
words together. Each query takes roughly 5-10 seconds to execute (2.78 GHz
CPU, 1.5 GB RAM). That means the entire batch of 0.72M queries would take
roughly 40 to 85 days to run. Is this expected, or is there a way to improve
the performance? From earlier posts I gathered that complex queries are
expected to take more time, but this much?
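
A quick sanity check of that batch estimate, as a minimal sketch: it simply
multiplies the 0.72 million queries by the stated 5-10 seconds per query. The
class and method names are made up for illustration.

```java
import java.util.Locale;

public class BatchEstimate {
    // Total wall-clock days for a sequential batch: queries * seconds each / seconds per day.
    static double days(long queries, double secondsPerQuery) {
        return queries * secondsPerQuery / 86_400.0;
    }

    public static void main(String[] args) {
        long queries = 720_000L; // 0.72 million queries, as stated in the post
        System.out.printf(Locale.ROOT, "at 5 s/query:  %.1f days%n", days(queries, 5.0));
        System.out.printf(Locale.ROOT, "at 10 s/query: %.1f days%n", days(queries, 10.0));
        // prints 41.7 days and 83.3 days
    }
}
```

So a sequential run lands somewhere between about six weeks and twelve weeks,
depending on where in the 5-10 second range the average query falls.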

    I have tried some of the improvements mentioned in other posts (e.g.
increasing the JVM heap size) without much benefit. Please let me know if
you can think of any optimization, given that my requirement is to execute
all these queries in a batch run (additional hardware is not an option for
me). Also, I only need the top 150-200 results for each query. Can that be
used to speed up the process?
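
To make the "top 150-200 only" idea concrete, here is a minimal sketch of
bounded top-k selection with a min-heap. It is plain Java, not Lucene code,
and the class and method names are hypothetical. It only shows that keeping
the best k scores never requires holding or sorting all hits; every candidate
score still has to be computed, so this alone would not remove the cost of
scoring the matching documents.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class TopK {
    // Returns the k largest scores in descending order, holding at most k values at a time.
    static double[] topK(double[] scores, int k) {
        if (k <= 0) {
            return new double[0];
        }
        PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap: smallest kept score on top
        for (double s : scores) {
            if (heap.size() < k) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll(); // evict the current smallest to make room for a better score
                heap.add(s);
            }
        }
        double[] out = new double[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = heap.poll(); // poll yields smallest first, so fill from the back
        }
        return out;
    }

    public static void main(String[] args) {
        double[] scores = {0.3, 0.9, 0.1, 0.7, 0.5}; // made-up scores for illustration
        System.out.println(Arrays.toString(topK(scores, 3))); // prints [0.9, 0.7, 0.5]
    }
}
```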

    In case I'm doing something wrong, below is the way I'm constructing
the query, followed by a few lines of logs.

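    // Open the index once and reuse the searcher for every query.
    IndexSearcher sh = new IndexSearcher("IndexPath");

    for each query {
        BooleanQuery bq = new BooleanQuery();
        for each word in the query text {
            // tktext is the text of the current word; SHOULD makes the clause optional (OR).
            bq.add(new TermQuery(new Term("text", tktext)), BooleanClause.Occur.SHOULD);
        }
        sh.search(bq);
    }
    sh.close();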


Performance log (query length = number of words; time in milliseconds):
Query Length: 332 Time Taken: 8609
Query Length: 276 Time Taken: 5172
Query Length: 345 Time Taken: 9313
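
For reading these log lines, a small sketch that sums the word counts and
times and reports the average cost per word. The class name and the regular
expression are assumptions based only on the three lines shown above, and
three samples are far too few to draw firm conclusions from.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogStats {
    // Matches lines such as "Query Length: 332 Time Taken: 8609".
    private static final Pattern LINE =
        Pattern.compile("Query Length:\\s*(\\d+)\\s+Time Taken:\\s*(\\d+)");

    // Returns {totalWords, totalMillis} summed over all lines that match the log format.
    static long[] totals(String[] lines) {
        long words = 0;
        long millis = 0;
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                words += Long.parseLong(m.group(1));
                millis += Long.parseLong(m.group(2));
            }
        }
        return new long[] {words, millis};
    }

    public static void main(String[] args) {
        String[] log = {
            "Query Length: 332 Time Taken: 8609",
            "Query Length: 276 Time Taken: 5172",
            "Query Length: 345 Time Taken: 9313"
        };
        long[] t = totals(log);
        System.out.printf(Locale.ROOT, "%d words, %d ms, %.2f ms per word%n",
            t[0], t[1], (double) t[1] / t[0]);
        // prints: 953 words, 23094 ms, 24.23 ms per word
    }
}
```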

Thanks in advance,
Somnath
