lucene-java-user mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: How to get the number of unique terms in the inverted index
Date Fri, 28 May 2010 12:19:41 GMT
It seems like there should be a formula for estimating the total
number of unique terms given that you know the unique term counts for
each segment, and make certain assumptions like random document
distribution across segments.
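
One possible shape for such an estimate, offered only as a rough sketch and not as a formula from this thread: if documents are spread randomly across segments, each segment behaves like a random sample of the collection, so its unique term count should grow with segment size roughly according to Heaps' law, u ≈ K * n^beta. Fitting K and beta to the per-segment (size, unique terms) pairs and extrapolating to the combined size gives a guess for the whole index. The function name, the use of document counts as the size measure, and the clamping to the hard bounds max(u_i) <= U <= sum(u_i) are all assumptions made for illustration.

```python
import math


def estimate_unique_terms(segments):
    """Estimate total unique terms from per-segment statistics.

    segments: list of (doc_count, unique_term_count) pairs, one per segment.
    Assumes documents are randomly distributed across segments and that
    vocabulary growth follows Heaps' law, u = K * n**beta (an assumption,
    not something Lucene guarantees).
    """
    points = [(math.log(n), math.log(u)) for n, u in segments if n > 0 and u > 0]
    if len(points) < 2:
        raise ValueError("need at least two non-empty segments")

    # Ordinary least squares on the log-log points: log u = log K + beta * log n.
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("segments must differ in size to fit a growth curve")
    beta = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    log_k = mean_y - beta * mean_x

    # Extrapolate the fitted curve to the size of the whole index.
    total_docs = sum(n for n, u in segments if n > 0 and u > 0)
    estimate = math.exp(log_k + beta * math.log(total_docs))

    # The true count can never be below the largest segment's vocabulary
    # nor above the sum of all segment vocabularies.
    lower = max(u for n, u in segments if n > 0 and u > 0)
    upper = sum(u for n, u in segments if n > 0 and u > 0)
    return min(max(estimate, lower), upper)
```

The fit is only as good as the random-distribution assumption; if segments were built from different sources or time ranges, their vocabularies overlap less than the model expects and the result will be too low.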

-Yonik
http://www.lucidimagination.com

On Thu, May 27, 2010 at 9:17 PM, kannan chandrasekaran
<ckannanck@yahoo.com> wrote:
> I am just trying out a few experiments to calculate similarity between terms based on
> their co-occurrences in the dataset. Basically, I am trying to build contextual vectors
> and calculate similarity using a similarity measure (say, cosine similarity).
>
> I don't think this is an XY problem. The vectors I am trying to build are not the same
> as the TermVectors option ((term, freq) pairs per document) in Lucene (if that's what you
> meant).
>
> Thanks
> Kannan
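
For readers unfamiliar with the approach described in the quoted message, here is a minimal sketch of term-by-term contextual vectors compared with cosine similarity. It is an illustration under assumptions, not the poster's code: the helper names are hypothetical, documents are taken to be pre-tokenized lists of strings, and "co-occurrence" is taken to mean appearing in the same document at least once.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_vectors(docs):
    """Map each term to a sparse vector counting the documents it shares
    with every other term. docs is an iterable of token lists."""
    vectors = defaultdict(Counter)
    for tokens in docs:
        # Count each pair once per document, in both directions.
        for a, b in permutations(set(tokens), 2):
            vectors[a][b] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Each vector has one dimension per unique term in the index, which is presumably why the total unique term count matters for sizing these structures.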
