lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kamal Najib <kamal.na...@mytum.de>
Subject I got the score "0.3044460713863373" for the cosine similarity of two document with the same text content !!
Date Tue, 05 May 2009 21:23:37 GMT
hi all,
i got the similarity score 0.3044460713863373 between two docs which have the same text content,
is it correct? I expected 1.0, hier is my result line:

doc:"this expression of galectin-1 in blood vessel walls was correlated with vascular"	
doc2 :"this expression of galectin-1 in blood vessel walls was correlated with vascular"	Score
:"0.3044460713863373" 
is the score correct?
my methode is :
public double getSimilarity(String v1,String  v2) throws Exception
{
	
	float result=0;
	directory = new RAMDirectory();
	Analyzer analyzer = new StandardAnalyzer();
	IndexWriter writer = new IndexWriter(directory, analyzer,
      	true, IndexWriter.MaxFieldLength.LIMITED);
	
	
	Document doc1 = new Document();
	doc1.add(new Field("term",v1, Field.Store.YES, Field.Index.TOKENIZED));
	writer.addDocument(doc1);
	writer.close();
	IndexReader ir=IndexReader.open(directory);
	IndexSearcher searcher = new IndexSearcher(directory);
	Query query=SimilarityQueries.formSimilarQuery(v2,analyzer,"term",null);
	ScoreDoc[] scoreDocs = searcher.search(query,5).scoreDocs;
	int docNum = scoreDocs[0].doc;
        result = scoreDocs[0].score;
        Document hitDoc = searcher.doc(docNum);
	System.out.println("Term 1 :"+v2+"  Term2:"+hitDoc.get("term")+"  Score :"+result);
        return result;
}
please help.
thanks in advance.
Kamal
-- 


Mime
View raw message