From: Lokeya
To: java-user@lucene.apache.org
Subject: Re: Document Similarities lucene(particularly using doc id's)
Date: Sun, 19 Aug 2007 20:19:40 -0700 (PDT)
Message-ID: <12229492.post@talk.nabble.com>
References: <12186723.post@talk.nabble.com>

Hi,

Thanks for your reply.
I can use getTermFreqVector() on IndexReader to get the term vectors. But I am wondering which API should be used to compute the similarity between two such vectors and return a score (doc-doc similarity, in essence).

Thanks.

Grant Ingersoll-6 wrote:
>
> Hi,
>
> On Aug 16, 2007, at 2:20 PM, Lokeya wrote:
>
>> Hi All,
>>
>> I have the following setup:
>> a) Indexed a set of docs.
>> b) Ran the 1st query and got the top docs.
>> c) Fetched the ids from those and stored them in a data structure.
>> d) Ran the 2nd query, got the top docs, fetched the ids, and stored
>> them in a data structure.
>>
>> Now I have 2 sets of doc ids, (set 1) and (set 2).
>>
>> I want to find the document content similarity between these 2 sets
>> (just using the doc id information which I have).
>>
>
> Not sure what you mean here. What do the doc ids have to do with the
> content?
>
>> Qn 1: Is it possible using any Lucene APIs? In that case, can you
>> point me to the appropriate APIs? I did a search at
>> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/index.html
>> but couldn't find anything.
>>
>
> It is possible if you use Term Vectors (see
> IndexReader.getTermFreqVector). You will need to store (when you
> construct your Field) and load the term vectors and then calculate
> the similarity. A common way of doing this is by calculating the
> cosine of the angle between the two vectors.
>
> -Grant
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--
View this message in context: http://www.nabble.com/Document-Similarities-lucene%28particularly-using-doc-id%27s%29-tf4281286.html#a12229492
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
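
For reference, the cosine calculation Grant describes in the quoted reply can be done by hand once the term vectors are loaded: cos(theta) = (A . B) / (|A| * |B|), where A and B are the term-frequency vectors of the two documents. Below is a minimal sketch, assuming the terms and frequencies have already been copied out of each document's term vector into parallel arrays (in Lucene 2.x these would typically come from TermFreqVector.getTerms() and getTermFrequencies()). The class TermVectorCosine and its method names are hypothetical, chosen for illustration only; they are not part of Lucene. The weighting here is raw term frequency, which is an assumption; TF-IDF weights could be substituted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; NOT a Lucene class.
class TermVectorCosine {

    // Builds a term -> frequency map from parallel arrays.
    // Assumption: terms[i] occurs freqs[i] times in the document field.
    static Map<String, Integer> toMap(String[] terms, int[] freqs) {
        if (terms.length != freqs.length) {
            throw new IllegalArgumentException("terms and freqs differ in length");
        }
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            map.merge(terms[i], freqs[i], Integer::sum);
        }
        return map;
    }

    // Cosine of the angle between two sparse term-frequency vectors.
    // Returns 0.0 if either vector is empty or all zeros.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int fa = e.getValue();
            normA += (double) fa * fa;
            Integer fb = b.get(e.getKey());
            if (fb != null) {
                // Only terms present in both documents contribute to the dot product.
                dot += (double) fa * fb;
            }
        }
        for (int fb : b.values()) {
            normB += (double) fb * fb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Convenience overload taking the parallel arrays directly.
    static double cosine(String[] termsA, int[] freqsA, String[] termsB, int[] freqsB) {
        return cosine(toMap(termsA, freqsA), toMap(termsB, freqsB));
    }
}
```

To compare the two result sets from the original question, one option would be to call cosine(...) for each pair of doc ids (one from each set) and then average or take the maximum of the scores, depending on what "set similarity" should mean for the application.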