lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Lucene Optimized Query Broken?
Date Wed, 07 Jan 2004 16:56:27 GMT
Robert Engels wrote:
> I have an index with documents that have only 2 fields. The first (unique) is
> 'very unique', in that most documents have at least somewhat varying terms;
> the second is a boolean that contains only 'true' or 'false'. The
> index contains 100,000,000+ documents.
> 
> If I perform the following search, "+unique:somevalue +boolean:true", Lucene
> will search on the first term, returning very few documents, but then it
> will search the second term, returning possibly a million+ documents, then
> it will intersect the lists and return 'hits' of only a few documents.

First, this is not the sort of query that Lucene is designed to 
efficiently handle.  Rather, this is the sort of thing that a relational 
database is designed for.  Lucene is primarily designed to support text 
searching, where field values are natural language text and query terms 
are words describing a user's interest.  You can implement full text 
search with a relational database, but it will be slow.  Similarly, you 
can search tabular data with Lucene, but it may be slow.

That said, I'm currently working on an optimization that will make such 
queries substantially faster in Lucene.  The heart of it is to add data 
to the index so that TermDocs.skipTo() is much faster.  Then the search 
algorithms are modified to call TermDocs.skipTo().  This should make 
conjunctive queries (ANDs and phrases) significantly faster when one 
term occurs much less frequently than others.  I hope to check this in 
within the next week or so.
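
To illustrate the idea, here is a minimal Java sketch of a conjunction 
driven by skipTo().  It is not the actual Lucene change: the Postings 
class, the intersect() method, and the galloping search that stands in 
for the extra skip data in the index are all hypothetical, and a sorted 
int array is assumed in place of a real posting list.  The steps counter 
exists only to show how few entries of the frequent term get examined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipToSketch {

    /** Hypothetical stand-in for one term's posting list: sorted doc ids. */
    static final class Postings {
        private final int[] docs;
        private int pos = -1;   // index of the current entry, -1 before the first
        int steps = 0;          // entries examined so far (illustration only)

        Postings(int[] sortedDocs) { this.docs = sortedDocs; }

        int doc() { return docs[pos]; }

        /** Moves to the next entry; returns false when the list is exhausted. */
        boolean next() {
            steps++;
            return ++pos < docs.length;
        }

        /**
         * Moves to the first entry beyond the current one whose doc id is
         * >= target; returns false if there is none. Galloping plus binary
         * search is an assumed substitute for skip data stored in the index.
         */
        boolean skipTo(int target) {
            int lo = pos + 1;
            int hi = lo;
            int step = 1;
            // Gallop forward, doubling the stride, while entries are too small.
            while (hi < docs.length && docs[hi] < target) {
                steps++;
                lo = hi + 1;
                hi += step;
                step *= 2;
            }
            // Binary search the remaining window for the first entry >= target.
            int end = Math.min(hi, docs.length);
            while (lo < end) {
                int mid = (lo + end) >>> 1;
                steps++;
                if (docs[mid] < target) lo = mid + 1; else end = mid;
            }
            pos = lo;
            return pos < docs.length;
        }
    }

    /** AND of two posting lists: whichever side is behind skips to the other. */
    static int[] intersect(Postings a, Postings b) {
        List<Integer> out = new ArrayList<>();
        if (a.next() && b.next()) {
            while (true) {
                if (a.doc() == b.doc()) {
                    out.add(a.doc());
                    if (!a.next() || !b.next()) break;
                } else if (a.doc() < b.doc()) {
                    if (!a.skipTo(b.doc())) break;
                } else {
                    if (!b.skipTo(a.doc())) break;
                }
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // A rare term (3 docs) ANDed with a frequent term (50,000 docs).
        int[] common = new int[50000];
        for (int i = 0; i < common.length; i++) common[i] = 2 * i;
        Postings rare = new Postings(new int[] {5, 1000, 70000});
        Postings frequent = new Postings(common);
        System.out.println(Arrays.toString(intersect(rare, frequent)));
        System.out.println("frequent-term entries examined: " + frequent.steps);
    }
}
```

In this sketch the rare term decides where the frequent term has to look, 
so the cost depends mostly on the length of the shorter list rather than 
on the million-plus entries of the longer one.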

Doug



