lucene-java-user mailing list archives

From "Uwe Schindler" <>
Subject RE: best practice: 1.4 billions documents
Date Mon, 22 Nov 2010 17:49:07 GMT
Hi Yonik,

Can we do the same for Lucene? The problem there is combining the rewritten
queries using the broken method in Query.

As far as I know, the problem is that MultiTermQueries (MTQs), for example,
rewrite *per searcher*, so each searcher uses a different rewritten query
(with different terms). The scores are therefore totally different, even with
a tf-idf patch (fuzzy scores on MultiSearcher and Solr are totally wrong
because each shard uses a different rewritten query). To work around that,
the Query class has a broken, broken, broken, broken, broken method to
combine queries, which violates De Morgan's laws when there are, e.g.,
negative clauses. And this method cannot be fixed to work with all queries.
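
To make the negative-clause case concrete, here is a minimal sketch in plain
Java. It is a simplified model and not Lucene code: the class name, the helper
methods, the example query "+a -pre*" and the two shard term dictionaries are
all assumptions made up for illustration. It only shows why OR-ing the
per-shard rewrites of a query with a negative clause gives a different result
than excluding the union of all expansions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CombineSketch {

    // Hypothetical stand-in for a per-searcher rewrite: the prefix expands
    // only to the terms that exist in that shard's term dictionary.
    static Set<String> expandPrefix(String prefix, Set<String> shardTerms) {
        Set<String> expanded = new TreeSet<>();
        for (String term : shardTerms) {
            if (term.startsWith(prefix)) {
                expanded.add(term);
            }
        }
        return expanded;
    }

    // Evaluates "+required -(excluded terms)" against one document's terms.
    static boolean matches(Set<String> doc, String required, Set<String> excluded) {
        if (!doc.contains(required)) {
            return false;
        }
        for (String term : excluded) {
            if (doc.contains(term)) {
                return false;
            }
        }
        return true;
    }

    // Naive combination (assumed model): OR the per-shard rewrites together.
    // (+a -x) OR (+a -y) equals +a AND NOT (x AND y), not +a AND NOT (x OR y).
    static boolean matchesCombined(Set<String> doc, String required,
                                   List<Set<String>> perShardExcluded) {
        for (Set<String> excluded : perShardExcluded) {
            if (matches(doc, required, excluded)) {
                return true;
            }
        }
        return false;
    }

    // Intended semantics: exclude every expansion found on any shard.
    static boolean matchesIntended(Set<String> doc, String required,
                                   List<Set<String>> perShardExcluded) {
        Set<String> allExcluded = new HashSet<>();
        for (Set<String> excluded : perShardExcluded) {
            allExcluded.addAll(excluded);
        }
        return matches(doc, required, allExcluded);
    }

    public static void main(String[] args) {
        // Two shards whose term dictionaries differ (made-up data).
        Set<String> shard1Terms = new HashSet<>(Arrays.asList("a", "pre1"));
        Set<String> shard2Terms = new HashSet<>(Arrays.asList("a", "pre2"));

        // "+a -pre*" rewritten per shard: -pre1 on shard 1, -pre2 on shard 2.
        List<Set<String>> rewritten = new ArrayList<>();
        rewritten.add(expandPrefix("pre", shard1Terms));
        rewritten.add(expandPrefix("pre", shard2Terms));

        // A document that should be excluded because it contains "pre1".
        Set<String> doc = new HashSet<>(Arrays.asList("a", "pre1"));

        System.out.println("combined: " + matchesCombined(doc, "a", rewritten)); // true
        System.out.println("intended: " + matchesIntended(doc, "a", rewritten)); // false
    }
}
```

In this model the document slips through the combined query because the
rewrite from shard 2 never saw the term "pre1", so its negative clause does
not exclude it.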

Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen

> -----Original Message-----
> From: [] On Behalf Of Yonik
> Seeley
> Sent: Monday, November 22, 2010 6:29 PM
> To:
> Subject: Re: best practice: 1.4 billions documents
> On Mon, Nov 22, 2010 at 12:17 PM, Uwe Schindler <> wrote:
> > The latest discussion was more about MultiReader vs. MultiSearcher.
> >
> > But you are right, 1.4 B documents is not easy to handle, especially
> > when your index grows and you reach the 2.1 B mark; then no
> > MultiSearcher or anything else helps.
> >
> > On the other hand, even distributed Solr has the same problems as
> > MultiSearcher: scoring MultiTermQueries (Fuzzy) doesn't work correctly
> Are you referring to the idf being local to the shard instead of global to
> the whole collection?
> Andrzej has a patch in the works, but it's not committed yet.
> > negative MTQ clauses may produce wrong results if the query rewriting
> > is done like in MultiSearcher (which is unsolvably broken and broken
> > and broken and again broken for some queries as Boolean clauses - see
> > De Morgan's laws).
> I don't think this is a problem for Solr.  Queries are executed on each
> shard as normal (no difference from a non-distributed query).
> -Yonik