lucene-java-user mailing list archives

From Yonik Seeley
Subject Re: How to get the number of unique terms in the inverted index
Date Fri, 28 May 2010 12:19:41 GMT
It seems like there should be a formula for estimating the total
number of unique terms given that you know the unique term counts for
each segment, and make certain assumptions like random document
distribution across segments.
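
As a rough illustration of what such an estimate might look like (a sketch, not a formula from Lucene or from this thread): if documents are randomly distributed, each segment is a random sample of the corpus, so its unique term count is one point on a vocabulary growth curve. The code below assumes that curve follows Heaps' law, unique = K * docs^beta, using document count as a proxy for size. It fits K and beta from the per-segment counts and evaluates the curve at the total document count. The function names and the (num_docs, unique_terms) input format are hypothetical. The only hard facts used are the bounds: the true total is at least the largest per-segment count and at most the sum of all of them.

```python
import math


def unique_term_bounds(segment_counts):
    """Hard bounds on the index-wide unique term count.

    The total can be no smaller than the largest segment's count and
    no larger than the sum over all segments (no overlap at all).
    """
    return max(segment_counts), sum(segment_counts)


def estimate_total_unique_terms(segments):
    """Estimate the unique term count of the whole index.

    segments: list of (num_docs, unique_terms) pairs, one per segment
    (hypothetical input format). Assumes Heaps' law and random document
    distribution across segments; needs two or more segment sizes.
    """
    # Least-squares fit of log(unique) = log(K) + beta * log(docs).
    xs = [math.log(n) for n, _ in segments]
    ys = [math.log(u) for _, u in segments]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two segments of different sizes")
    beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    log_k = mean_y - beta * mean_x

    # Evaluate the fitted curve at the total document count.
    total_docs = sum(n for n, _ in segments)
    estimate = math.exp(log_k + beta * math.log(total_docs))

    # Clamp to the bounds that hold regardless of the model.
    low, high = unique_term_bounds([u for _, u in segments])
    return min(max(estimate, low), high)
```

With segments of similar size the fit is poorly conditioned, so the bounds may be the most trustworthy part of the result.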


On Thu, May 27, 2010 at 9:17 PM, kannan chandrasekaran wrote:
> I am just trying out a few experiments to calculate similarity between terms based on
> their co-occurrences in the dataset... Basically I am trying to build contextual vectors
> and calculate similarity using a similarity measure (say, cosine similarity).
> I don't think this is an XY problem. The vectors I am trying to build are not the same
> as the TermVectors option ((term, freq) pairs per document) in Lucene (if that's what u
> Thanks
> Kannan
