Hello,
I am using Lucene for plagiarism detection.
The goal is this: when I have a new document, I will check the Solr index
for any document that contains some of the same chunks.
So, to compute the similarity between the query and a source document, I would
use this formula:
Score(suspicious document, source document) = number of chunks common to
the source document and the suspicious document / total number of chunks in
the suspicious document.
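To make the formula concrete, here is a minimal sketch in plain Java of the value I want to obtain. It does not use any Lucene or Solr API. The class name ChunkOverlapScore and the method score are hypothetical names chosen for illustration, and the sketch assumes that each distinct chunk is counted once, even if it occurs several times in a document.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: computes the desired score directly
// from two lists of chunks (for example, word n-grams represented as strings).
final class ChunkOverlapScore {

    // Returns (distinct chunks shared by both documents)
    //       / (distinct chunks in the suspicious document).
    // Assumption: repeated chunks are counted once.
    static double score(List<String> suspiciousChunks, List<String> sourceChunks) {
        Set<String> suspicious = new HashSet<>(suspiciousChunks);
        if (suspicious.isEmpty()) {
            return 0.0; // avoid division by zero for an empty suspicious document
        }
        Set<String> common = new HashSet<>(suspicious);
        common.retainAll(new HashSet<>(sourceChunks)); // keep only shared chunks
        return (double) common.size() / suspicious.size();
    }
}
```

For example, if the suspicious document has four distinct chunks and two of them also appear in the source document, the score is 2 / 4 = 0.5. This is the number I would like Lucene to return as the hit score.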
So I have to change the scoring formula in the Similarity class.
How can I change the scoring formula? (By customizing only the Similarity
class, or the Scorer as well?)
Do you have an example of this use case?
Thanks for your help.