lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shashi Kant <sk...@sloan.mit.edu>
Subject Re: Using Lucene/Solr for Plagiarism detection
Date Thu, 30 Dec 2010 16:22:13 GMT
Have you considered using document similarity metrics such as Cosine Similarity?


On Thu, Dec 30, 2010 at 6:05 AM, Amel Fraisse <amel.fraisse@gmail.com> wrote:
> Hello,
>
> I am using Lucene for plagiarism detection.
>
> The goal is that: when I have a new document, I will check on the solr index
> if there is a document that contain some common chunk.
>
> So to compute similarity between the query and a source document I would use
> this formula :
>
> Score (suspicious document, source document) = Number of common chunk
> between source document and suspicious document  / Number of total chunk in
> the suspicious document.
>
> So I have to change the scoring formula in the Similarity class.
>
> How can I change the scoring formula? ( by customizing only the Similarity
> class? or Scorer?)
>
> Do you have an Example of this use case?
>
> Thank for your help.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message