lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Somnath Banerjee" <somnath.baner...@gmail.com>
Subject Re: Long Query Performance
Date Mon, 22 Jan 2007 13:28:20 GMT
Thanks for the reply. Good guess I think.

DB (Index) is basically a collection of encyclopedia documents. Queries are
also a collection of documents but of various domains. My task is to find
out for each "query document" top 100 matching encyclopedia contents.

I tried by using only the title of (5-8 words) the query documents instead
of full text of the document. But that is also taking 0.5-1 sec for each
query. That's mean it will also take nearly 6 and half days to run
0.72Mqueries (and expectedly the precision will suffer).

Thanks,
Somnath

On 1/22/07, Michael D. Curtin <mike@curtin.com> wrote:
>
> Somnath Banerjee wrote:
>
> >            I have created a 8GB index of almost 2 million documents. My
> > requirement is to run nearly 0.72 million query on this index. Each
> query
> > consists of 200 - 400 words. I have created a Boolean Query by ORing
> these
> > words. But each query is taking nearly 5 - 10 seconds to execute ( 2.78
> > GHz,
> > 1.5 GB RAM). That's mean the entire batch of 0.72M query will take more
> > than
> > 70 days to execute. Is it expected or there is a way to improve the
> > performance? From earlier posts I gathered that complex query is
> > expected to
> > take more time (this much???).
>
> A back of the envelope calculation:
>      8GB / 2M docs = 4KB per doc, on avg
>                    / 5 B per word, on avg = 800 words per doc, on avg
>
> So, each query is a quarter to half the size of the average document.  I
> suspect that just about every query is hitting almost every document in
> the db, i.e. the queries are not very selective at all.  That's going to
> be slow, no two ways about it.
>
> Could you tell us a bit more about the db and what your application is
> looking for in it, at a higher level of abstraction?
>
> --MDC
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message