lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael D. Curtin" <m...@curtin.com>
Subject Re: Long Query Performance
Date Mon, 22 Jan 2007 13:07:03 GMT
Somnath Banerjee wrote:

>            I have created a 8GB index of almost 2 million documents. My
> requirement is to run nearly 0.72 million query on this index. Each query
> consists of 200 - 400 words. I have created a Boolean Query by ORing these
> words. But each query is taking nearly 5 - 10 seconds to execute ( 2.78 
> GHz,
> 1.5 GB RAM). That's mean the entire batch of 0.72M query will take more 
> than
> 70 days to execute. Is it expected or there is a way to improve the
> performance? From earlier posts I gathered that complex query is 
> expected to
> take more time (this much???).

A back of the envelope calculation:
     8GB / 2M docs = 4KB per doc, on avg
                   / 5 B per word, on avg = 800 words per doc, on avg

So, each query is a quarter to half the size of the average document.  I 
suspect that just about every query is hitting almost every document in 
the db, i.e. the queries are not very selective at all.  That's going to 
be slow, no two ways about it.

Could you tell us a bit more about the db and what your application is 
looking for in it, at a higher level of abstraction?

--MDC

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message