lucene-solr-user mailing list archives

From Tanguy Moal <tanguy.m...@gmail.com>
Subject Re: Disseminate results from different sources
Date Wed, 21 Mar 2012 17:32:32 GMT
Hello Franck,

I've had the same issue in the past.

I addressed it by adding a random value to each document.
I use this value in the "bf" parameter, so that the random value
slightly alters each document's score.

This results in a natural shuffling of documents which had the same 
score before.
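
To illustrate the idea outside Solr, here is a minimal Python sketch. It assumes the boost is purely additive and small compared to the gaps between relevance scores; the field name "rand", the weight of 0.01, and the sample documents are hypothetical and not taken from my setup.

```python
import random

def add_random_field(docs, seed=0):
    """Attach a fixed random value in [0, 1) to each doc, as if done at index time."""
    rng = random.Random(seed)
    return [dict(doc, rand=rng.random()) for doc in docs]

def rank(docs, weight=0.01):
    """Order docs by relevance score plus a small additive random boost."""
    return sorted(docs, key=lambda d: d["score"] + weight * d["rand"], reverse=True)

# Hypothetical documents: three tied at 2.0, one clearly less relevant.
docs = add_random_field([
    {"id": "forum-1", "score": 2.0},
    {"id": "forum-2", "score": 2.0},
    {"id": "review-1", "score": 2.0},
    {"id": "news-1", "score": 1.0},
])
ranked = [d["id"] for d in rank(docs)]
```

Because the boost stays below 0.01, the three tied documents are shuffled among themselves, while the less relevant one keeps its place at the end.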

I think you can also use a random field (the random sort field type, see
http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html).
A random sort field gives a unique random value to each doc per
requested field name, i.e. random_1234() gives a different distribution
of random values than random_4321(). This can be helpful to give
documents a different random value without reindexing everything.
Additionally, you can change the random_call() every day to make sure
the results order changes from time to time, but not at each query :-)
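
A small sketch of the "change it every day" idea, in Python. It assumes the schema declares a dynamic field "random_*" of the random sort field type; the helper names and the use of the field as a secondary sort are illustrative assumptions, not something I have in production.

```python
from datetime import date

def daily_random_field(day, prefix="random_"):
    """Build a field name that stays stable for one day, e.g. random_20120321."""
    return f"{prefix}{day:%Y%m%d}"

def build_params(query, day):
    """Request parameters that use the daily random field only to break score ties."""
    return {
        "q": query,
        # Assumption: the random field is used as a secondary sort key here.
        "sort": f"score desc, {daily_random_field(day)} asc",
    }

params = build_params("solr", date(2012, 3, 21))
```

All queries issued on the same day share the same field name, so the order is stable within the day and changes on the next one.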

The only reason why I chose not to use random sort fields is very
personal: I needed to box the random values (using
scale(random_whatever(),0,1)) so that the random tie breaker doesn't take
precedence over the natural scoring of documents. That scale function
needs to compute the min and max random values for the selected documents,
which seemed to be costly for large sets (about 10x the query time for a
docset of about 100k docs), but I might be wrong here.
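
For reference, this is what min-max scaling does, sketched in plain Python. It is not Solr's implementation, only an illustration of why the min and max have to be known over the whole set before any single value can be scaled; the function and its defaults are my own for the example.

```python
def scale(values, lo=0.0, hi=1.0):
    """Min-max scale values into [lo, hi]; needs a pass over all values first."""
    vmin = min(values)  # full traversal to find the minimum
    vmax = max(values)  # full traversal to find the maximum
    if vmax == vmin:
        # All values are equal: map everything to the lower bound.
        return [lo for _ in values]
    span = vmax - vmin
    return [lo + (v - vmin) * (hi - lo) / span for v in values]

scaled = scale([2, 4, 6])
```

The two traversals are the part that grows with the size of the docset, which matches the slowdown I observed.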

I hope this helps,

--
Tanguy

On 21/03/2012 13:51, fbrisbart wrote:
> Hi all,
>
> I have, in my dataset, documents from different sources (forum, news,
> reviews, ...)
> And I'd like to have a mix of them in my search results.
>
>
> The problem is that, when ranking by relevance alone, the results are
> often grouped by source (e.g. 50 'forum' docs before the first 'review'
> doc).
> So, I am looking for a way to slightly disseminate the results and avoid
> this behaviour.
>
> I could run 1 search per source and manually do the mix. But, I have ~10
> different sources, and I'm afraid this will be too slow.
>
> Is there a clean & fast way to do that? I am thinking about eventually
> implementing a custom Scorer.
>
>
>
> Thanks,
> Franck
>

