lucene-java-user mailing list archives

From Grant Ingersoll <>
Subject Re: Document Similarities lucene(particularly using doc id's)
Date Fri, 17 Aug 2007 20:22:30 GMT

On Aug 16, 2007, at 2:20 PM, Lokeya wrote:

> Hi All,
> I have the following setup: a) indexed a set of docs; b) ran a first
> query and got the top docs; c) fetched the ids from those and stored
> them in a data structure; d) ran a second query, got the top docs,
> fetched the ids and stored them in a data structure.
> Now I have 2 sets of doc ids (set 1) and (set 2).
> I want to find out the document content similarity between these 2 sets
> (just using the doc id information I have).

Not sure what you mean here.  What do the doc ids have to do with the  

> Qn 1: Is this possible using any Lucene APIs? If so, can you point me
> to the appropriate APIs? I did a search at:
> javadoc/index.html
> but couldn't find anything.

It is possible if you use term vectors (see
IndexReader.getTermFreqVector). You will need to store the term vectors
at index time (when you construct your Field), load them at query time,
and then calculate the similarity yourself. A common way of doing this is
to calculate the cosine of the angle between the two vectors.


Grant Ingersoll

