lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Amel Fraisse <amel.frai...@gmail.com>
Subject Re: Using Lucene/Solr for Plagiarism detection
Date Thu, 30 Dec 2010 16:28:26 GMT
Hello,

No I'm not using cosine similarity metrics.


2010/12/30 Shashi Kant <skant@sloan.mit.edu>

> Have you considered using document similarity metrics such as Cosine
> Similarity?
>
>
> On Thu, Dec 30, 2010 at 6:05 AM, Amel Fraisse <amel.fraisse@gmail.com>
> wrote:
> > Hello,
> >
> > I am using Lucene for plagiarism detection.
> >
> > The goal is that: when I have a new document, I will check on the solr
> index
> > if there is a document that contain some common chunk.
> >
> > So to compute similarity between the query and a source document I would
> use
> > this formula :
> >
> > Score (suspicious document, source document) = Number of common chunk
> > between source document and suspicious document  / Number of total chunk
> in
> > the suspicious document.
> >
> > So I have to change the scoring formula in the Similarity class.
> >
> > How can I change the scoring formula? ( by customizing only the
> Similarity
> > class? or Scorer?)
> >
> > Do you have an Example of this use case?
> >
> > Thank for your help.
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
--------------
Amel Fraisse

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message