lucene-dev mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Whither Query Norm?
Date Wed, 25 Nov 2009 05:55:33 GMT
On Tue, Nov 24, 2009 at 9:31 PM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> Hello,
>
> Regarding that monstrous term->idf map.
> Is this something that one could use to adjust the scores in
> http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations
> scenario?  Say a map like that was created periodically for each shard and
> distributed to all other nodes (so in the end each node has all maps
> locally).  Couldn't the local scorer in the Solr instance (and in
> distributed Lucene setup) consult idfs for relevant terms in all those maps
> and adjust the scores of local scores before returning results?
>
>
Why would you want all nodes to have all maps?  Why not merge them into one
map and then redistribute that out to all nodes?  It would be far smaller
than many maps anyway.  Then yes, the scoring can be done locally using this
big idfMap for idf instead of reader.docFreq(); that's what I do.  But then
what are you implying should be done?  Just rescale the top scores based on
the idfs before returning your top results?  You'd need to know exactly
which terms hit those top-scoring documents, right?  Which implies basically
the cost of explain(), doesn't it?
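
As a rough sketch of the merge-then-redistribute idea above: sum the
per-shard document frequencies into one map, then derive idf from the merged
counts.  Everything here is illustrative and not code from this thread: the
class and method names (GlobalIdf, mergeDocFreqs, toIdf) are hypothetical, the
per-shard maps are assumed to be plain term -> docFreq maps, and the idf
formula is assumed to be the classic 1 + ln(numDocs / (docFreq + 1)) form.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene or Solr.
class GlobalIdf {

    /** Sums per-shard document frequencies into one term -> docFreq map. */
    static Map<String, Integer> mergeDocFreqs(List<Map<String, Integer>> shardDocFreqs) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> shard : shardDocFreqs) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    /**
     * Converts merged document frequencies into idf values.
     * Assumes idf = 1 + ln(totalDocs / (docFreq + 1)), where totalDocs is
     * the sum of the document counts of all shards.
     */
    static Map<String, Float> toIdf(Map<String, Integer> docFreqs, long totalDocs) {
        Map<String, Float> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            double ratio = totalDocs / (double) (e.getValue() + 1);
            idf.put(e.getKey(), (float) (Math.log(ratio) + 1.0));
        }
        return idf;
    }

    public static void main(String[] args) {
        // Two made-up shards with their local document frequencies.
        Map<String, Integer> shardA = Map.of("lucene", 3, "solr", 1);
        Map<String, Integer> shardB = Map.of("lucene", 2);

        Map<String, Integer> merged = mergeDocFreqs(List.of(shardA, shardB));
        Map<String, Float> idfMap = toIdf(merged, 10L); // 10 docs across both shards

        System.out.println(merged);
        System.out.println(idfMap);
    }
}
```

Each node would then look terms up in the single merged idfMap at scoring
time rather than consulting one map per shard.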

Although with per-field scoring (the thing I do to be able to train on
sub-query field match scores), this gets easier, because then you can try
to hang onto this information if the query isn't too big.  But this isn't
something normal BooleanQueries will handle for you naturally.

  -jake
