lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Document Similarities lucene(particularly using doc id's)
Date Fri, 17 Aug 2007 20:22:30 GMT
Hi,


On Aug 16, 2007, at 2:20 PM, Lokeya wrote:

>
> Hi All,
>
> I have the following setup: a) Indexed a set of docs. b) Ran the 1st
> query and got the top docs. c) Fetched the ids from those and stored
> them in a data structure. d) Ran the 2nd query, got the top docs,
> fetched the ids, and stored them in a data structure.
>
> Now I have 2 sets of doc ids (set 1) and (set 2).
>
> I want to find the document content similarity between these 2 sets
> (just using the doc id information which I have).
>

Not sure what you mean here.  What do the doc ids have to do with the  
content?

> Qn 1: Is this possible using any Lucene APIs? If so, can you point me
> to the appropriate APIs? I did a search at
> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/index.html
> but couldn't find anything.
>

It is possible if you use Term Vectors (see
IndexReader.getTermFreqVector).  You will need to store the term
vectors at index time (when you construct your Field), load them for
the documents you want to compare, and then calculate the similarity.
A common way of doing this is by calculating the cosine of the angle
between the two vectors.
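
As a rough illustration of the cosine step only, here is a minimal sketch in plain Java.  It does not call Lucene at all: it assumes you have already pulled, for each document, a parallel pair of arrays (terms and their frequencies) out of the term vector returned by IndexReader.getTermFreqVector.  The class name CosineSimilarity and the helpers toMap, norm, and cosine are made up for this example, and raw term frequencies are used as weights (no idf or other weighting), which is an assumption rather than a recommendation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration; not part of Lucene.
final class CosineSimilarity {

    // Build a term -> frequency map from two parallel arrays.
    // Assumption: terms[i] occurs freqs[i] times in the document field.
    static Map<String, Integer> toMap(String[] terms, int[] freqs) {
        Map<String, Integer> vector = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            vector.merge(terms[i], freqs[i], Integer::sum);
        }
        return vector;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int freq : vector.values()) {
            sumOfSquares += (double) freq * freq;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine of the angle between two term-frequency vectors.
    // Returns 0.0 if either vector is empty, to avoid dividing by zero.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double lengthA = norm(a);
        double lengthB = norm(b);
        if (lengthA == 0.0 || lengthB == 0.0) {
            return 0.0;
        }
        return dot / (lengthA * lengthB);
    }

    public static void main(String[] args) {
        // Made-up term vectors standing in for two documents.
        Map<String, Integer> docA =
            toMap(new String[] {"index", "lucene"}, new int[] {1, 2});
        Map<String, Integer> docB =
            toMap(new String[] {"lucene", "search"}, new int[] {1, 1});
        System.out.println(cosine(docA, docB));
    }
}
```

A value near 1 means the two documents use much the same terms in similar proportions; 0 means they share no terms in that field.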

-Grant

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ




