lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: solr and diversification
Date Thu, 27 Sep 2018 23:29:01 GMT
Yeah, I think your plan sounds fine.

Do you have a specific use case for diversity of results? I've been
wondering if diversity of results would provide better perceived relevance.

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Sep 27, 2018 at 1:39 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
dceccarelli4@bloomberg.net> wrote:

> Yeah, I think Kmeans might be a way to implement the "top 3 stories that
> are more distant", but you can also have a more naïve (and faster) strategy,
> like:
>  - send a threshold
>  - scan the documents in order of relevance score
>  - select the top documents that have diversity > threshold.
>
> I would allow the strategy to be defined and selected from the request.
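
A minimal Python sketch of the threshold strategy listed above, not an existing Solr feature. It assumes that the "diversity" of a candidate is its smallest cosine distance to the documents already selected, and that documents arrive as sparse term-weight dicts; `select_by_threshold` and `cosine_distance` are hypothetical names.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two sparse vectors (term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: an empty vector is maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def select_by_threshold(ranked_docs, k, threshold):
    """Scan docs in relevance order; keep those far enough from all kept docs.

    ranked_docs: list of (doc_id, vector), sorted by descending relevance.
    """
    selected = []
    for doc_id, vec in ranked_docs:
        # Assumed definition of diversity: distance to every selected doc
        # must exceed the threshold sent with the request.
        if all(cosine_distance(vec, kept) > threshold for _, kept in selected):
            selected.append((doc_id, vec))
            if len(selected) == k:
                break
    return [doc_id for doc_id, _ in selected]


ranked = [
    ("a", {"solr": 1.0, "search": 1.0}),
    ("b", {"solr": 1.0, "search": 0.9}),  # near-duplicate of "a"
    ("c", {"football": 1.0}),
    ("d", {"music": 1.0, "solr": 0.1}),
]
print(select_by_threshold(ranked, 3, 0.5))  # ['a', 'c', 'd']
```

This is a single pass over the ranked list, so the top-ranked document is always kept and fewer than k documents may come back if the threshold is too strict.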
>
> From: solr-user@lucene.apache.org At: 09/27/18 18:25:43 To: Diego
> Ceccarelli (BLOOMBERG/ LONDON), solr-user@lucene.apache.org
> Subject: Re: solr and diversification
>
> I've thought about this problem a little bit. What I was considering was
> using Kmeans clustering to cluster the top 50 docs, then pulling the top
> scoring doc from each cluster as the top documents. This should be fast and
> effective at getting diversity.
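
A rough Python sketch of the clustering idea above, written against plain lists rather than any Solr API. It assumes dense, equal-length document vectors, squared Euclidean distance, and centroids seeded from the k highest-scoring documents so the result is deterministic; `kmeans_diversify` is a hypothetical name.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_diversify(docs, k, iterations=10):
    """Cluster the top docs and return the best-scoring doc of each cluster.

    docs: list of (doc_id, score, vector), sorted by descending score.
    """
    k = min(k, len(docs))
    if k == 0:
        return []
    # Assumption: seed centroids with the k top-scoring docs (deterministic).
    centroids = [list(vec) for _, _, vec in docs[:k]]
    assignment = [0] * len(docs)
    for _ in range(iterations):
        # Assignment step: attach every doc to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(vec, centroids[c]))
            for _, _, vec in docs
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [docs[i][2] for i in range(len(docs)) if assignment[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Keep the highest-scoring doc per cluster.
    best = {}
    for (doc_id, score, _), c in zip(docs, assignment):
        if c not in best or score > best[c][1]:
            best[c] = (doc_id, score)
    return [doc_id for doc_id, _ in sorted(best.values(), key=lambda p: -p[1])]


top_docs = [
    ("a", 9.0, [0.0, 0.0]),
    ("b", 8.0, [0.1, 0.0]),
    ("c", 7.0, [5.0, 5.0]),
    ("d", 6.0, [5.1, 5.0]),
]
print(kmeans_diversify(top_docs, 2))  # ['a', 'c']
```

With only about 50 documents per query, a handful of iterations like this is cheap; a cluster that ends up empty simply contributes no document.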
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Sep 27, 2018 at 1:20 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
> dceccarelli4@bloomberg.net> wrote:
>
> > Hi,
> >
> > I'm considering writing a component for diversifying the results. I know
> > that diversification can be achieved by using grouping, but I'm thinking
> > about something different and query-biased.
> > The idea is to have something that gets applied after the normal
> > retrieval and selects the k most diverse documents based on some
> > distance metric:
> >
> > Example:
> > Imagine that you are asking for 10 rows, and you set diversify.rows=3
> > diversify.metric=tfidf diversify.field=body
> >
> > Solr might retrieve the top 10 rows as usual, extract tfidf vectors
> > for the bodies, and select the 3 stories that are most distant from
> > each other according to the cosine similarity.
> > This would be different from grouping because documents will be
> > 'collapsed' or not based on the subset of documents retrieved for the
> > query.
> > Do you think it would make sense to have it as a component? Any feedback
> > / ideas?
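
A small Python sketch of the example above: build tf-idf vectors from the bodies of the retrieved rows only, then pick the most mutually distant ones by cosine distance. Finding the exact most-distant subset is combinatorial, so this swaps in a greedy farthest-first selection that starts from the top-ranked row; that choice, the whitespace tokenizer, the smoothed idf formula, and the names `tfidf_vectors` and `diversify` are all assumptions, not part of the proposal.

```python
import math
from collections import Counter


def tfidf_vectors(bodies):
    """Build sparse tf-idf vectors using only the retrieved documents."""
    tokenized = [body.lower().split() for body in bodies]
    n = len(bodies)
    # Document frequency: in how many retrieved bodies each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumption: smoothed idf, so shared terms keep a small weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def diversify(bodies, rows):
    """Return indices of up to `rows` mutually distant docs (greedy)."""
    if not bodies or rows <= 0:
        return []
    vectors = tfidf_vectors(bodies)
    selected = [0]  # always keep the most relevant document
    remaining = list(range(1, len(bodies)))
    while remaining and len(selected) < rows:
        # Pick the candidate whose nearest selected doc is farthest away;
        # ties go to the higher-ranked (earlier) candidate.
        nxt = max(
            remaining,
            key=lambda i: min(cosine_distance(vectors[i], vectors[j]) for j in selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


bodies = ["apple banana", "apple banana", "cherry date", "eel fig"]
print(diversify(bodies, 3))  # [0, 2, 3]
```

Because the idf weights come from the retrieved subset, the same two documents can look close for one query and distant for another, which is the query-biased behaviour that distinguishes this from grouping.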
> >
> >
> >
>
>
>
