From: Grant Ingersoll
To: java-user@lucene.apache.org
Subject: Re: I got the score "0.3044460713863373" for the cosine similarity of two document with the same text content !!
Date: Thu, 7 May 2009 13:20:55 -0700

What does the searcher.explain() method say?
-Grant

On May 6, 2009, at 2:18 AM, Kamal Najib wrote:

> Hi,
> Thanks for the reply. See http://lucene.apache.org/java/2_4_1/api/index.html
> There you will find the similarity query I created and ran to get the
> similarity between the two strings. I did the following:
> I created a doc:
> doc.add(new Field("term","this expression of galectin-1 in blood vessel walls was correlated with vascular", Field.Store.YES,Field.Index.TOKENIZED));
> Then I indexed it and ran the following similarity query to get the
> cosine similarity:
> query=SimilarityQueries.formSimilarQuery("this expression of galectin-1 in blood vessel walls was correlated with vascular",analyzer,"term",null);
> ScoreDoc[] scoreDocs = searcher.search(query,5).scoreDocs;
> I got the score mentioned above (0.3044460713863373).
> Thanks.
> Kamal
>
> Original Message:
> > What is SimilarityQueries? I'd try the explain capabilities to see more.
> >
> > On May 5, 2009, at 2:23 PM, Kamal Najib wrote:
> >
> > > Hi all,
> > > I got the similarity score 0.3044460713863373 between two docs which
> > > have the same text content. Is that correct? I expected 1.0. Here is
> > > my result line:
> > >
> > > doc: "this expression of galectin-1 in blood vessel walls was correlated with vascular"
> > > doc2: "this expression of galectin-1 in blood vessel walls was correlated with vascular" Score: "0.3044460713863373"
> > > Is the score correct?
> > > My method is:
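> > > import org.apache.lucene.analysis.Analyzer;
> > > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> > > import org.apache.lucene.document.Document;
> > > import org.apache.lucene.document.Field;
> > > import org.apache.lucene.index.IndexWriter;
> > > import org.apache.lucene.search.IndexSearcher;
> > > import org.apache.lucene.search.Query;
> > > import org.apache.lucene.search.ScoreDoc;
> > > import org.apache.lucene.search.similar.SimilarityQueries;
> > > import org.apache.lucene.store.Directory;
> > > import org.apache.lucene.store.RAMDirectory;
> > >
> > > public double getSimilarity(String v1, String v2) throws Exception
> > > {
> > >     // Index v1 as the only document of a fresh in-memory index.
> > >     // (directory is declared locally here; the original assigned it
> > >     // without a visible declaration.)
> > >     Directory directory = new RAMDirectory();
> > >     Analyzer analyzer = new StandardAnalyzer();
> > >     IndexWriter writer = new IndexWriter(directory, analyzer,
> > >         true, IndexWriter.MaxFieldLength.LIMITED);
> > >     Document doc1 = new Document();
> > >     doc1.add(new Field("term", v1, Field.Store.YES,
> > >         Field.Index.TOKENIZED));
> > >     writer.addDocument(doc1);
> > >     writer.close();
> > >
> > >     // Build a query from the terms of v2 and score it against the index.
> > >     // (The unused IndexReader from the original was dropped; it was never closed.)
> > >     IndexSearcher searcher = new IndexSearcher(directory);
> > >     try {
> > >         Query query = SimilarityQueries.formSimilarQuery(v2, analyzer, "term", null);
> > >         ScoreDoc[] scoreDocs = searcher.search(query, 5).scoreDocs;
> > >         if (scoreDocs.length == 0) {
> > >             return 0; // no hit, so there is nothing to read a score from
> > >         }
> > >         int docNum = scoreDocs[0].doc;
> > >         float result = scoreDocs[0].score; // score of the best hit
> > >         Document hitDoc = searcher.doc(docNum);
> > >         System.out.println("Term 1 :" + v2 + " Term2:" + hitDoc.get("term")
> > >             + " Score :" + result);
> > >         return result;
> > >     } finally {
> > >         searcher.close();
> > >     }
> > > }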
> > > Please help.
> > > Thanks in advance.
> > > Kamal
> > > --
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> > using Solr/Lucene:
> > http://www.lucidimagination.com/search
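
For comparison with the 1.0 that was expected in this thread, here is a minimal sketch of a plain cosine similarity over raw term counts, written without Lucene. The class name CosineSketch, the helper termFrequencies, and the simple split-on-non-alphanumerics tokenizer are all assumptions made for illustration; they are not part of Lucene and do not reproduce StandardAnalyzer (for example, no stop words are removed). A score returned by searcher.search() is not necessarily this quantity, which is presumably why the explain() output was requested.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class CosineSketch {

    // Hypothetical helper: counts lowercase ASCII alphanumeric tokens.
    // This is NOT Lucene's StandardAnalyzer; no stop words are removed.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<String, Integer>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            Integer old = tf.get(token);
            tf.put(token, old == null ? 1 : old + 1);
        }
        return tf;
    }

    // Cosine of the angle between the two term-frequency vectors.
    static double cosine(String a, String b) {
        Map<String, Integer> ta = termFrequencies(a);
        Map<String, Integer> tb = termFrequencies(b);
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : ta.entrySet()) {
            int fa = e.getValue();
            normA += fa * fa;
            Integer fb = tb.get(e.getKey());
            if (fb != null) {
                dot += fa * fb; // only shared terms contribute
            }
        }
        for (int fb : tb.values()) {
            normB += fb * fb;
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // at least one text has no tokens
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        String text = "this expression of galectin-1 in blood vessel walls "
                + "was correlated with vascular";
        System.out.println(cosine(text, text));
    }
}
```

With this definition, two texts with identical term counts get a value of 1 (up to floating-point rounding) and two texts with no shared terms get 0, independent of how many documents are in an index.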