lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Amel Fraisse <>
Subject Using Lucene/Solr for Plagiarism detection
Date Thu, 30 Dec 2010 11:05:54 GMT

I am using Lucene for plagiarism detection.

The goal is that: when I have a new document, I will check on the solr index
if there is a document that contain some common chunk.

So to compute similarity between the query and a source document I would use
this formula :

Score (suspicious document, source document) = Number of common chunk
between source document and suspicious document  / Number of total chunk in
the suspicious document.

So I have to change the scoring formula in the Similarity class.

How can I change the scoring formula? ( by customizing only the Similarity
class? or Scorer?)

Do you have an Example of this use case?

Thank for your help.

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message