lucene-dev mailing list archives

From "Robert Engels" <>
Subject RE: Lucene Optimized Query Broken?
Date Wed, 07 Jan 2004 17:31:22 GMT
I thought of another case, probably more appropriate...

For example, the search is on 'printer', but the user only wants to see
documents from the last 7 days. There may be thousands of 'printer' documents,
but very few from the last 7 days. Why search the entire index when the date
range will probably restrict the documents (depending on how often documents
are added to the index) to a much greater degree?

Maybe I am having this trouble because the documents I've indexed have
common terms that occur very frequently but really can't be thrown out. For
example, "ibm printer" is far different than "ibm computer".


-----Original Message-----
From: Doug Cutting []
Sent: Wednesday, January 07, 2004 10:56 AM
To: Lucene Developers List
Subject: Re: Lucene Optimized Query Broken?

Robert Engels wrote:
> I have an index with documents that have only 2 fields. The first (unique) is
> 'very unique', in that most documents have at least somewhat varying terms;
> the second is a boolean that contains only 'true' or 'false'. The
> index contains 100,000,000+ documents.
> If I perform the following search, "+unique:somevalue +boolean:true",
> it starts with a search on the first term, returning very few documents, but
> then it will search the second term, returning possibly a million+ documents,
> then it will intersect the lists and return 'hits' of only a few documents.

First, this is not the sort of query that Lucene is designed to
efficiently handle.  Rather, this is the sort of thing that a relational
database is designed for.  Lucene is primarily designed to support text
searching, where field values are natural language text and query terms
are words describing a user's interest.  You can implement full text
search with a relational database, but it will be slow.  Similarly, you
can search tabular data with Lucene, but it may be slow.

That said, I'm currently working on an optimization that will make such
queries substantially faster in Lucene.  The heart of it is to add data
to the index so that TermDocs.skipTo() is much faster.  Then the search
algorithms are modified to call TermDocs.skipTo().  This should make
conjunctive queries (ANDs and phrases) significantly faster when one
term occurs much less frequently than others.  I hope to check this in
in the next week or so.
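
As a rough illustration of the idea described above (not the actual Lucene
change), a conjunction can be driven by the rarer term, with each list
skipping forward to the other's current document instead of scanning every
entry. The Postings class and its skipTo() below are hypothetical stand-ins
for TermDocs, and a binary search over an in-memory array stands in for the
skip data that would be added to the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkipIntersectionSketch {

    /** Hypothetical stand-in for a postings iterator; not Lucene's TermDocs. */
    static final class Postings {
        private final int[] docs; // document ids, sorted ascending, no duplicates
        private int pos = -1;     // index of the current entry, -1 before the first

        Postings(int... docs) { this.docs = docs; }

        /** Current document id; only valid after next() or skipTo() returned true. */
        int doc() { return docs[pos]; }

        /** Moves to the next entry; returns false when the list is exhausted. */
        boolean next() { return ++pos < docs.length; }

        /**
         * Moves to the first entry at or after the current one whose id is
         * >= target. Assumption: binary search plays the role of skip data.
         */
        boolean skipTo(int target) {
            int from = Math.max(pos, 0);
            int i = Arrays.binarySearch(docs, from, docs.length, target);
            pos = (i >= 0) ? i : -i - 1; // a miss yields the insertion point
            return pos < docs.length;
        }
    }

    /** Intersects two lists by letting each one skip to the other's position. */
    static List<Integer> intersect(Postings rare, Postings common) {
        List<Integer> hits = new ArrayList<>();
        if (!rare.next()) return hits;
        while (true) {
            if (!common.skipTo(rare.doc())) break;
            if (common.doc() == rare.doc()) {
                hits.add(rare.doc());          // both lists agree: a hit
                if (!rare.next()) break;
            } else if (!rare.skipTo(common.doc())) {
                break;                         // rare list ran out
            }
        }
        return hits;
    }
}
```

With a rare list of a few entries and a common list of a million, the loop
above touches the common list only a handful of times, which is the case
where the speedup should be largest.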


To unsubscribe, e-mail:
For additional commands, e-mail:
