lucene-java-user mailing list archives

From "Uwe Schindler" <>
Subject RE: best practice: 1.4 billions documents
Date Thu, 25 Nov 2010 09:24:45 GMT
You are in trouble if you use MultiTermQuery subclasses as a negative clause in a BooleanQuery,
e.g. a range like "-[A TO B]", or even NumericRanges or Wildcards. The query will then return
incorrect results.

Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen

> -----Original Message-----
> From: Ganesh []
> Sent: Thursday, November 25, 2010 9:55 AM
> To:
> Subject: Re: best practice: 1.4 billions documents
> Thanks for the input.
> My results are sorted by date and I am not much bothered about the score. Will I
> still be in trouble?
> Regards
> Ganesh
> ----- Original Message -----
> From: "Robert Muir" <>
> To: <>
> Sent: Thursday, November 25, 2010 1:45 PM
> Subject: Re: best practice: 1.4 billions documents
> On Thu, Nov 25, 2010 at 2:58 AM, Uwe Schindler <> wrote:
> > ParallelMultiSearcher, as a subclass of MultiSearcher, has the same problems.
> > These are not crashes; rather, some queries do not return correctly scored
> > results. This especially affects all MultiTermQueries (TermRange, Fuzzy,
> > NumericRange, Wildcard, Prefix) if they are used in a negative fashion
> > (using MUST_NOT or "-" in QueryParser). For all of those queries except
> > Fuzzy, you are safe if you use CONSTANT_SCORE_REWRITE_METHOD (using
> > setRewriteMethod). The same problems apply to span queries. For *all* Fuzzy
> > queries (negative or not), the scores are simply wrong, so scoring is
> > broken with (Parallel)MultiSearcher; wrong results are only returned with
> > negative clauses!
> >
> you can use constant score rewrite method with fuzzy, too. then it
> will work "correctly" (even negative) with multisearcher too. but it
> will be slow, with unbounded number of results, and the fuzziness will
> not affect the scoring. (this is what constant score rewrite implies)
> the reason i say "correctly" is that for all of these queries,
> constant score rewrite is just a general workaround, and might still
> be incorrect. This is because many queries often have special cases
> where they rewrite to simpler things and in general the MultiSearcher
> combine() logic is broken here, so there might be more problems.
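
To picture what "the fuzziness will not affect the scoring" means, here is a toy model in plain Java. It is not Lucene code: the class FuzzyRewriteToy, its methods, and the similarity formula are all made up for illustration. It only contrasts a rewrite that keeps a per-term weight with one that flattens every matching term to the same weight.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these names exist in Lucene.
public class FuzzyRewriteToy {

    // Plain Levenshtein edit distance (two-row dynamic programming).
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity in [0, 1]; not Lucene's exact formula.
    public static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / maxLen;
    }

    // Scoring-style rewrite: each matching term keeps its similarity as its weight.
    public static Map<String, Double> scoringRewrite(String target, List<String> terms, double minSim) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (String t : terms) {
            double s = similarity(target, t);
            if (s >= minSim) out.put(t, s);
        }
        return out;
    }

    // Constant-score-style rewrite: the same terms match, but all get weight 1.0.
    public static Map<String, Double> constantScoreRewrite(String target, List<String> terms, double minSim) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (String t : scoringRewrite(target, terms, minSim).keySet()) out.put(t, 1.0);
        return out;
    }
}
```

Both methods select the same set of terms; only the weights differ. That matches the trade-off described above: matching stays the same, but a close term and a distant term can no longer be told apart by score.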
> > A new class, ParallelIndexSearcher, could help with that by parallelizing
> > across multiple segments; this is still in the planning phase. The difference
> > from ParallelMultiSearcher would be that it takes a "single" IndexReader
> > (e.g. a MultiReader in your case) and parallelizes per segment or bunch of
> > segments.
> >
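
The per-segment idea can be sketched in plain Java, without any Lucene classes. Everything here is hypothetical (PerSegmentSearchSketch, Doc, the in-memory "segments"); it is not the planned ParallelIndexSearcher, only the general shape: one query, applied to each segment on a thread pool, with the per-segment top hits merged and sorted by date.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Illustrative sketch only; these names are not part of Lucene.
public class PerSegmentSearchSketch {

    // Hypothetical document: an id plus a date used as the sort key.
    public static final class Doc {
        public final int id;
        public final long date;
        public Doc(int id, long date) { this.id = id; this.date = date; }
    }

    // Newest first; ties broken by id so the order is deterministic.
    static final Comparator<Doc> NEWEST_FIRST =
            Comparator.comparingLong((Doc d) -> d.date).reversed().thenComparingInt(d -> d.id);

    // Runs the same predicate over every segment in parallel, then merges the top hits.
    public static List<Doc> search(List<List<Doc>> segments, Predicate<Doc> matches,
                                   int topN, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Doc>>> futures = new ArrayList<>();
            for (List<Doc> segment : segments) {
                Callable<List<Doc>> task = () -> {
                    List<Doc> hits = new ArrayList<>();
                    for (Doc d : segment) {
                        if (matches.test(d)) hits.add(d);
                    }
                    hits.sort(NEWEST_FIRST);
                    // Each segment only needs to report its own top N.
                    return new ArrayList<>(hits.subList(0, Math.min(topN, hits.size())));
                };
                futures.add(pool.submit(task));
            }
            List<Doc> merged = new ArrayList<>();
            for (Future<List<Doc>> f : futures) merged.addAll(f.get());
            merged.sort(NEWEST_FIRST);
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("segment search failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every segment sees the identical predicate, there is no step where separately prepared queries have to be combined afterwards; only the hit lists are merged.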
> Besides the inherited broken-ness from multisearcher,
> parallelmultisearcher is broken further because it requires you to
> organize your index structure in a special way to get concurrency.
> This is all pretty silly though, since ParallelMultiSearcher on a
> single machine isn't going to increase QPS, so how useful really is it
> in general???
> we should deprecate both the broken Multi & ParallelMulti Searchers
> and never look back.
