lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: best practice: 1.4 billions documents
Date Mon, 22 Nov 2010 17:49:07 GMT
Hi Yonik,

Can we do the same for Lucene, the problem is combining the rewritten
queries using the broken method in Query?

As far as I know, the problem is that e.g. MTQs rewrite *per searcher* so
each searcher uses a different rewritten query (with different terms). So
the scores are totally different even with a tf-idf patch (Fuzzy scores on
MultiSearcher and Solr are totally wrong because each shard uses another
rewritten query). To work around that, the Query class has a broken, broken,
broken, broken, broken method to combine queries, which violates DeMorgans
laws when there are e.g. negative clauses. And this method cannot be fixed
to work with all queries
(https://issues.apache.org/jira/browse/LUCENE-2756).

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
> Seeley
> Sent: Monday, November 22, 2010 6:29 PM
> To: java-user@lucene.apache.org
> Subject: Re: best practice: 1.4 billions documents
> 
> On Mon, Nov 22, 2010 at 12:17 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > The latest discussion was more about MultiReader vs. MultiSearcher.
> >
> > But you are right, 1.4 B documents is not easy to go, especially when
> > you index grows and you get to the 2.1 B marker, then no MultiSearcher
> > or whatever helps.
> >
> > On the other hand even distributed Solr has the same problems like
> > MultiSearcher: scoring MultiTermQueries (Fuzzy) doesn't work correctly
> 
> Are you referring to the idf being local to the shard instead of global to
the
> whole colleciton?
> Andrzej has a patch in the works, but it's not committed yet.
> 
> > negative MTQ clauses may produce wrong results if the query rewriting
> > is done like in MultiSearcher (which is unsolveable broken and broken
> > and broken and again broken for some queries as Boolean clauses - see
> > DeMorgan laws).
> 
> I don't think this is a problem for Solr.  Queries are executed on each
shard as
> normal (no difference from a non-distributed query).
> 
> -Yonik
> http://www.lucidimagination.com
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message