lucene-java-user mailing list archives

From Lokeya <lok...@gmail.com>
Subject Re: Document Similarities lucene(particularly using doc id's)
Date Mon, 20 Aug 2007 03:19:40 GMT

Hi,

Thanks for your reply.

I can call getTermFreqVector() on the IndexReader to get the term vectors.
But I am wondering which API should be used to compute the similarity between
two such vectors and return a score (doc-doc similarity, in essence).

Thanks.



Grant Ingersoll-6 wrote:
> 
> Hi,
> 
> 
> On Aug 16, 2007, at 2:20 PM, Lokeya wrote:
> 
>>
>> Hi All,
>>
>> I have the following setup: a) Indexed a set of docs. b) Ran the 1st query
>> and got the top docs. c) Fetched the ids from those and stored them in a
>> data structure. d) Ran the 2nd query, got the top docs, fetched the ids and
>> stored them in a data structure.
>>
>> Now I have 2 sets of doc ids (set 1) and (set 2).
>>
>> I want to find the document content similarity between these 2 sets (using
>> just the doc id information which I have).
>>
> 
> Not sure what you mean here.  What do the doc ids have to do with the  
> content?
> 
>> Qn 1: Is this possible using any Lucene APIs? If so, can you point me to
>> the appropriate APIs? I searched at
>> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/index.html
>> but couldn't find anything.
>>
> 
> It is possible if you use Term Vectors (see  
> IndexReader.getTermFreqVector).  You will need to store (when you  
> construct your Field) and load the term vectors and then calculate  
> the similarity.  A common way of doing this is by calculating the  
> cosine of the angle between the two vectors.
> 
> -Grant
> 
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
> 
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 
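
For reference, the cosine calculation Grant describes above can be sketched
in plain Java. This is only a rough illustration, not a Lucene API: the class
and method names are made up, and it assumes each document's term vector is
available as parallel arrays of terms and raw frequencies (as far as I know,
that is the shape TermFreqVector exposes through getTerms() and
getTermFrequencies(), provided the field was indexed with term vectors
stored). It uses raw term frequencies only, with no idf weighting.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: cosine similarity between two term-frequency vectors.
// Assumption: each vector arrives as parallel arrays (terms[i] occurs freqs[i] times).
// All names here are illustrative and not part of Lucene.
class TermVectorCosine {

    // Build a term -> frequency lookup from the parallel arrays.
    static Map<String, Integer> toMap(String[] terms, int[] freqs) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            m.merge(terms[i], freqs[i], Integer::sum);
        }
        return m;
    }

    // Euclidean length of a frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between the two vectors: dot(a, b) / (|a| * |b|).
    // Returns 0.0 if either vector is empty.
    static double cosine(String[] termsA, int[] freqsA,
                         String[] termsB, int[] freqsB) {
        Map<String, Integer> a = toMap(termsA, freqsA);
        Map<String, Integer> b = toMap(termsB, freqsB);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer fb = b.get(e.getKey());
            if (fb != null) {
                dot += (double) e.getValue() * fb; // only shared terms contribute
            }
        }
        double denom = norm(a) * norm(b);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    public static void main(String[] args) {
        // Made-up vectors for two documents.
        double sim = cosine(new String[] {"lucene", "index"}, new int[] {2, 1},
                            new String[] {"lucene", "search"}, new int[] {1, 1});
        System.out.println("cosine similarity = " + sim);
    }
}
```

The score is 1.0 for vectors pointing in the same direction and 0.0 for
documents with no terms in common, so it can be compared across pairs of
doc ids taken from the two result sets.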




