lucene-solr-user mailing list archives

From Xavier Schepler <xavier.schep...@sciences-po.fr>
Subject Phrase similarity - "more like this" feature for small set of terms
Date Mon, 15 Feb 2010 11:39:24 GMT
Hi,


There is an indexed field in my Solr schema that stores one phrase per 
document.
I have to implement a feature that gives users "more like this" results 
based on the contents of this field.
I suspect that Solr's built-in "more like this" feature needs more terms 
than a single phrase provides to be effective, but I may be wrong.
I would like to use a custom algorithm, probably based on the Jaccard 
Index <http://en.wikipedia.org/wiki/Jaccard_index>.
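
To make the idea concrete, here is a minimal sketch of the Jaccard index 
applied to two phrases, written in Python purely for illustration. It 
assumes whitespace tokenization and lowercasing, which a real Solr 
analyzer may not match; the names tokenize and jaccard and the stop_words 
parameter are hypothetical, not part of Solr.

```python
def tokenize(phrase, stop_words=frozenset()):
    """Lowercase the phrase, split on whitespace, and drop stop words.

    Assumption: whitespace tokenization stands in for the field's analyzer.
    """
    return {term for term in phrase.lower().split() if term not in stop_words}


def jaccard(a, b):
    """Return |A intersect B| / |A union B| for two term sets.

    Returns 0.0 when both sets are empty, to avoid dividing by zero.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    left = tokenize("term1 term2 term3 term4")
    right = tokenize("term1 term2 term3 term5")
    # 3 shared terms out of 5 distinct terms overall
    print(jaccard(left, right))
```

With short phrases, each shared or missing term moves the score a lot, so 
the stop word list has a strong effect on the ranking.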

I see three options:

1 - Create a Solr plug-in that provides a custom "more like this" 
feature. That might be tricky.

2 - The quick and dirty way: send queries crafted on the client side. 
Given the phrase "term1 term2 term3 term4", the query would look 
something like this:
(term1 AND term2 AND term3) OR (term1 AND term2 AND term4) OR (term1 AND 
term3 AND term4)  OR ...
With a good stop word list and well-chosen thresholds on the number of 
terms, the queries should not become too long.

3 - Use the built-in "more like this" feature, tuned with a stop word 
list and its parameters.
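
Option 2 can be sketched as follows, again in Python for illustration 
only. The function build_query and its parameters min_match and 
stop_words are hypothetical names. The sketch assumes the terms contain 
no characters that are special in the Lucene query syntax (no escaping 
is done) and that the client splits the phrase on whitespace.

```python
from itertools import combinations


def build_query(phrase, min_match, stop_words=frozenset()):
    """Build an OR of AND-clauses, one clause per subset of min_match terms.

    Assumptions: whitespace tokenization, no query-syntax escaping.
    """
    terms = []
    for term in phrase.split():
        # Drop stop words and duplicates while keeping the original order.
        if term.lower() not in stop_words and term not in terms:
            terms.append(term)

    # Never ask for more terms per clause than the phrase actually has.
    k = min(min_match, len(terms))
    if k <= 0:
        return ""

    clauses = ["(" + " AND ".join(combo) + ")" for combo in combinations(terms, k)]
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(build_query("term1 term2 term3 term4", min_match=3))
```

The number of clauses is the binomial coefficient C(n, k) for n remaining 
terms and k terms per clause, so the stop word list and the threshold are 
what keep the query short: 4 terms with k = 3 give 4 clauses, while 10 
terms with k = 5 already give 252.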


I would have time to develop a Solr plug-in, but I don't know how hard 
it would be.


Thanks in advance for your advice,


Xavier S.
