lucene-java-user mailing list archives

From "Baldwin, David" <David_Bald...@bmc.com>
Subject RE: How to properly correlate relevance in a search across multiple collections
Date Mon, 08 Sep 2014 23:14:07 GMT
I am looking at MultiSearcher, which seems to have been around for a while (at least since
3.0.3), and I am wondering whether it will do what I want.  I just looked at Lucene again, and
it says that MultiSearcher searches multiple indexes and merges the results.  I also see a lot
of similar comments about scores not being comparable from one index to another, so I am
confused.  Does anyone have any additional thoughts on MultiSearcher?  From reading Lucene in
Action, it looks like it does what I want.

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Monday, September 08, 2014 10:31 AM
To: java-user
Subject: Re: How to properly correlate relevance in a search across multiple collections

I think the point got lost in the discussion. Raw scores are simply _not_ comparable across
different collections. They aren't even comparable for different queries in the _same_ collection.
They are _only_ relevant for ranking within the same collection for a single query.

And even then raw scores don't tell you much. A score of 2 isn't "twice as good" as a score
of 1, it's just "somewhat better".

So the bottom line is that you end up resorting to some kind of clever presentation of the
different groups to the user: tabs for each collection, round-robin inclusion, or meta-analysis,
where you query the _same_ docs that exist in different indexes and try to create some
satisfactory heuristic, as atawfik suggested.

Best,
Erick

On Mon, Sep 8, 2014 at 8:59 AM, Baldwin, David <David_Baldwin@bmc.com> wrote:
> Would it be possible, and does anyone have experience with, using the raw score from
> each separate collection to order the results and then, after a merge, derive an overall
> relevancy?
>
> -----Original Message-----
> From: atawfik [mailto:contact.txlabs@gmail.com]
> Sent: Sunday, September 07, 2014 9:50 AM
> To: java-user@lucene.apache.org
> Subject: Re: How to properly correlate relevance in a search across 
> multiple collections
>
> Hi,
>
> If you have documents that might exist in multiple collections, then
> you can use techniques from meta search, that is, combining multiple
> search results from different collections. In this case, you can
> retrieve the top 100 or 1000 documents from each collection and merge
> them. You then rank the documents using an aggregation method. Summing
> the relevance scores is known to produce good results.
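
A minimal sketch of the sum-of-scores merge described above, in plain Java with no Lucene dependency. The class name `SumOfScoresMerge`, the method `merge`, and the idea of a document key shared across collections are assumptions made for illustration; they are not Lucene APIs. The code treats scores as opaque numbers and makes no claim that raw scores from different indexes are on a common scale.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class SumOfScoresMerge {

    // Each input map holds one collection's top hits: shared document key -> score.
    // Returns document keys ordered by summed score, highest first.
    static List<String> merge(List<Map<String, Double>> perCollectionHits, int topN) {
        // Add up the scores a document received across all collections.
        Map<String, Double> summed = new HashMap<>();
        for (Map<String, Double> hits : perCollectionHits) {
            for (Map.Entry<String, Double> hit : hits.entrySet()) {
                summed.merge(hit.getKey(), hit.getValue(), Double::sum);
            }
        }

        // Sort by summed score, descending; break ties by key so the order is stable.
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(summed.entrySet());
        ranked.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });

        // Keep only the keys of the best topN documents.
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Double> e : ranked.subList(0, Math.min(topN, ranked.size()))) {
            keys.add(e.getKey());
        }
        return keys;
    }
}
```

A document found in several collections accumulates score from each, so it tends to rise in the merged list; a document found in only one collection keeps its single score.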
>
> If there are no shared documents between collections, you can still use
> the same approach with a different aggregation method. One method is
> round robin: start by selecting the first-ranked document from each
> collection, then take the second-ranked document from each, and so on.
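
The round-robin interleaving described above can be sketched as follows, again in plain Java. The class name `RoundRobinMerge` and the representation of each collection's hits as a best-first list of document keys are assumptions for illustration only. Scores are not consulted at all, which is why this method does not depend on scores being comparable between collections.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class RoundRobinMerge {

    // Each inner list is one collection's hits, already ranked best-first.
    // Takes the rank-1 hit from every collection, then the rank-2 hits, and so on.
    static List<String> merge(List<List<String>> rankedPerCollection, int topN) {
        // Find the longest result list so we know how many rounds are needed.
        int longest = 0;
        for (List<String> hits : rankedPerCollection) {
            longest = Math.max(longest, hits.size());
        }

        List<String> merged = new ArrayList<>();
        for (int rank = 0; rank < longest && merged.size() < topN; rank++) {
            for (List<String> hits : rankedPerCollection) {
                // Skip collections that have run out of hits at this rank.
                if (rank < hits.size() && merged.size() < topN) {
                    merged.add(hits.get(rank));
                }
            }
        }
        return merged;
    }
}
```

The order in which the collections are listed decides which collection leads each round, so that order is itself a presentation choice.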
>
> If that does not fit your needs, you should probably search for
> "federated search" or "aggregated search" techniques. These techniques
> are used by large search engines to combine results from their
> component engines (images, video, and web). You can find a lot of
> academic resources on these topics.
>
> Regards
> Ameer
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-properly-correlate-relevance-in-a-search-across-multiple-collections-tp4157240p4157321.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>