lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ganesh" <>
Subject Re: best practice: 1.4 billions documents
Date Thu, 25 Nov 2010 08:55:05 GMT
Thanks for the input.

My results are sorted by date and i am not much bothered about score. Will i still be in trouble?


----- Original Message ----- 
From: "Robert Muir" <>
To: <>
Sent: Thursday, November 25, 2010 1:45 PM
Subject: Re: best practice: 1.4 billions documents

On Thu, Nov 25, 2010 at 2:58 AM, Uwe Schindler <> wrote:
> ParallelMultiSearcher as subclass of MultiSearcher has the same problems. These are not
crashes, but more that some queries do not return correct scored results for some queries.
This effects especially all MultiTermQueries (TermRange, Fuzzy, NumericRange, Wildcard, Prefix)
if they are used in a negative fashion (using MUST_NOT resp. "-" in QueryParser). For all
of those queries except Fuzzy, you are safe if you use CONSTANT_SCORE_REWRITE_METHOD (using
setRewriteMethod). The same problems apply for span queries. For *all* Fuzzy Queries (negative
or not), the scores are simply wrong and so scoring is broken with (Parallel)MultiSearcher;
wrong results are only returned when negative clauses!

you can use constant score rewrite method with fuzzy, too. then it
will work "correctly" (even negative) with multisearcher too. but it
will be slow, with unbounded number of results, and the fuzziness will
not affect the scoring. (this is what constant score rewrite implies)

the reason i say "correctly" is that for all of these queries,
constant score rewrite is just a general workaround, and might still
be incorrect. This is because many queries often have special cases
where they rewrite to simpler things and in general the MultiSearcher
combine() logic is broken here, so there might be more problems.

> A new class ParallelIndexSearcher could help with that, when it parallelizes multiple
segments, this is still in planning phase. The difference to ParallelMultiSearcher would be
that it takes a "single" IndexReader (e.g. a MultiReader in your case) and parallelizes per
segment/segment bunches.

Besides the inherited broken-ness from multisearcher,
parallelmultisearcher is broken further because it requires you to
organize your index structure in a special way to get concurrency.

This is all pretty silly though, since ParallelMultiSearcher on a
single machine isn't going to increase QPS, so how useful really is it
in general???

we should deprecate both the broken Multi & ParallelMulti Searchers
and never look back.

To unsubscribe, e-mail:
For additional commands, e-mail:

Send free SMS to your Friends on Mobile from your Yahoo! Messenger. Download Now!

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message