lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Post processing to get around TooManyClauses?
Date Fri, 07 Dec 2007 14:43:06 GMT
Have you looked at Filters? Essentially, you construct a bitmap where each
bit corresponds to a document and pass that along into your search.
Constructing
a filter is surprisingly speedy.

What you lose is relevancy for the filtered part of the query. See
ConstantScoreQuery.

Also, search the mail archive for wildcard and you'll find a wealth of
information.
See the thread "I just don't get wildcards at all" for some very worthwhile
discussion.

Best
Erick

On Dec 7, 2007 6:33 AM, d33mb33 <david.balzan@entity.co.uk> wrote:

>
> I have developed a fuzzy search application over a database of books
> (titles,
> authors etc) and it works really well.  (I use Lucene.Net but read the
> JavaDocs and forums for java Lucene)
>
> However I've got an interesting use case with "TooManyClauses" and need
> some
> help in solving it.
>
> My users accept that silly title queries like "m*" are going to return too
> many results to be useful but they want to combine the wildcard searches
> with other search terms.
>
> For example:
>
> Use Case 1
> A user wants to search for books by "Charles Dickens"
> This works fine using a term query and about 500 results are returned
>
> Use Case 2
> A user wants to search for books by "Charles Dickens" where the title
> starts
> with M
> This throws a TooManyClauses exceptions (or eats a huge amount of RAM)
> because, I guess, Lucene treats the two as independent queries and M* is
> expanded across the whole index and not just the books by Charles Dickens.
> User's don't understand why the Use Case 1 works but Use Case 2 doesn't.
> Use Case 2 as actually being a more restrictive query and will return
> better
> results than Use Case 1.
>
> I've thought a bit about how to solve this but none of them seem very
> elegant or efficient.
> One solution could be too eliminate one or two character wildcards from
> the
> inital search and then loop through the results doing a String.contains or
> something horrible.
> Another solution could be through clever use of the QueryFilter classes
> but
> I don't quite understand how they work yet.
>
> Any suggestions would be welcome
>
> --
> View this message in context:
> http://www.nabble.com/Post-processing-to-get-around-TooManyClauses--tf4961564.html#a14210833
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message