lucene-solr-user mailing list archives

From Anshum Gupta <ansh...@apple.com>
Subject Re: Need help detecting Relatedness in documents
Date Thu, 26 Oct 2017 18:06:30 GMT
I would suggest you look at the mlt query parser. It lets you find documents similar
to a particular document, and also lets you specify the field to use for similarity
purposes.

https://lucene.apache.org/solr/guide/7_0/other-parsers.html#more-like-this-query-parser
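
As a rough illustration of the suggestion above, here is a minimal Python sketch that builds the request parameters for an mlt query. The field name `topics`, the document id `post-123`, and the helper `build_mlt_params` are made up for the example; `qf`, `mintf` and `mindf` are local params listed in the linked guide section. The value after the closing brace is assumed to be the unique key of the source document.

```python
from urllib.parse import urlencode


def build_mlt_params(doc_id, field, mintf=1, mindf=1, rows=10):
    """Build Solr request parameters for the mlt query parser.

    doc_id is assumed to be the uniqueKey value of the source document;
    field is the (hypothetical) field used to compute similarity.
    """
    # Produces e.g. {!mlt qf=topics mintf=1 mindf=1}post-123
    local_params = "{{!mlt qf={} mintf={} mindf={}}}".format(field, mintf, mindf)
    return {"q": local_params + str(doc_id), "rows": rows}


params = build_mlt_params("post-123", "topics")
# URL-encoded form, ready to append to a collection's /select handler
query_string = urlencode(params)
```

The thresholds are lowered from their defaults here only so that a small test index returns something; sensible values depend on the data.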

-Anshum



> On Oct 26, 2017, at 1:16 AM, Atita Arora <atitaarora@gmail.com> wrote:
> 
> Hi,
> 
> We're working on a product where the idea is to present users with the
> related documents in a particular time series.
> 
> For an overview, think of this as an application that picks up the top
> trending blog post "topics", which are picked and ingested from various
> social sites.
> Further, when you look into a topic from the trending list, it shows the
> related topics that also occur in the blog posts.
> To be marked as related, two topics must have occurred in the same blog post;
> in addition, the more such occurrences there are, the higher the relatedness
> factor.
> 
> The complexity is that the related topics change with the user-defined date range,
> which means that if x & y were the most related topics in the blog posts made in
> the last 30 days,
> it is equally possible that x would be more related to z if the user
> wanted to see related topics for the last 60 days.
> So the number of days is user defined and it affects the related topics.
> 
> For now, every blog post goes into the index as a separate document, and
> topic extraction happens alongside indexing: it extracts the topics from
> the blog posts and stores them in a different collection.
> Because of this we have a lot of duplicates in the index too; e.g. a topicname
> search for "football" returns around 80K documents, all of them with
> topicname="football".
> 
> I wonder if someone can help me with:
> 1. How to structure the documents so that the queries are more
> performant.
> 2. How we can detect the RELATED topics.
> 
> Any help on this would be highly appreciated.
> 
> Thanks in advance.
> 
> Atita
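
The relatedness rule described in the quoted message (two topics are related when they occur on the same blog post, more co-occurrences mean a stronger relation, and only posts inside a user-chosen number of days count) can be sketched in plain Python as below. This is only an illustration of the counting logic, not a Solr implementation; the `(published_date, topics)` tuple layout, the function name `related_topics`, and the sample data are assumptions made for the example.

```python
from collections import Counter
from datetime import date, timedelta


def related_topics(posts, topic, days, today):
    """Rank topics by how often they share a post with `topic`.

    posts: iterable of (published_date, list_of_topics) pairs (assumed layout).
    Only posts published within the last `days` days are counted.
    """
    cutoff = today - timedelta(days=days)
    counts = Counter()
    for published, topics in posts:
        if published >= cutoff and topic in topics:
            # Count each co-occurring topic once per post
            counts.update(t for t in set(topics) if t != topic)
    return counts.most_common()


# Hypothetical sample data: x appears with y recently, with z earlier
posts = [
    (date(2017, 10, 20), ["x", "y"]),
    (date(2017, 10, 10), ["x", "y"]),
    (date(2017, 9, 15), ["x", "z"]),
    (date(2017, 9, 10), ["x", "z"]),
    (date(2017, 9, 5), ["x", "z"]),
]
today = date(2017, 10, 26)

last_30 = related_topics(posts, "x", 30, today)  # [('y', 2)]
last_60 = related_topics(posts, "x", 60, today)  # [('z', 3), ('y', 2)]
```

With this sample data, y is the top related topic for x over 30 days, while z overtakes it over 60 days, which is the window-dependent behaviour the message describes.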

