lucene-java-user mailing list archives

From kannan chandrasekaran <>
Subject Re: How to get the number of unique terms in the inverted index
Date Fri, 28 May 2010 01:17:03 GMT
I am just trying out a few experiments to calculate similarity between terms based on their
co-occurrences in the dataset. Basically, I am trying to build contextual vectors and calculate
similarity using a similarity measure (say, cosine similarity).
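A minimal sketch of the idea above. It assumes co-occurrence is defined by a simple sliding window over token lists and weighted by raw counts. The window size, the tokenization and the weighting are assumptions, not something stated in the thread. It uses plain Java collections rather than any Lucene API, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContextVectors {

    // Builds a sparse context vector for every term: for each occurrence of a term,
    // count every other token position within `window` positions on either side.
    static Map<String, Map<String, Integer>> build(List<List<String>> docs, int window) {
        Map<String, Map<String, Integer>> vectors = new HashMap<>();
        for (List<String> tokens : docs) {
            for (int i = 0; i < tokens.size(); i++) {
                Map<String, Integer> vec =
                        vectors.computeIfAbsent(tokens.get(i), t -> new HashMap<>());
                int lo = Math.max(0, i - window);
                int hi = Math.min(tokens.size() - 1, i + window);
                for (int j = lo; j <= hi; j++) {
                    if (j != i) {
                        vec.merge(tokens.get(j), 1, Integer::sum);
                    }
                }
            }
        }
        return vectors;
    }

    // Cosine similarity of two sparse vectors: dot(a, b) / (|a| * |b|).
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Euclidean length of a sparse count vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("the", "cat", "drinks", "milk"),
                List.of("the", "dog", "drinks", "water"),
                List.of("the", "cat", "chases", "the", "dog"));
        Map<String, Map<String, Integer>> vectors = build(docs, 2);
        System.out.printf("cat~dog = %.3f%n", cosine(vectors.get("cat"), vectors.get("dog")));
        System.out.printf("cat~milk = %.3f%n", cosine(vectors.get("cat"), vectors.get("milk")));
    }
}
```

Sparse maps are used instead of dense arrays so that memory grows with the number of observed co-occurrences rather than with the square of the vocabulary.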

I don't think this is an XY problem. The vectors I am trying to build are not the same as
the TermVectors option ((term, freq) pairs per document) in Lucene, if that's what you meant.
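This hypothetical sketch makes the distinction concrete. A TermVectors-style record is keyed by document, while a contextual vector is keyed by term. The code does not call Lucene. It takes per-document term-to-frequency maps as plain input and pivots them into per-term co-occurrence vectors. Weighting each pair by the product of the two frequencies is an assumed choice, one of several.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DocLevelCooccurrence {

    // Input: one (term -> freq) map per document, i.e. the same shape as per-document
    // term vectors. Output: one vector per *term*, keyed by the terms it co-occurs with.
    // Weighting by tf(t) * tf(u) is an assumption; it equals the off-diagonal entries
    // of A^T A for a document-by-term frequency matrix A.
    static Map<String, Map<String, Integer>> termVectors(List<Map<String, Integer>> docs) {
        Map<String, Map<String, Integer>> out = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (Map.Entry<String, Integer> t : doc.entrySet()) {
                Map<String, Integer> vec = out.computeIfAbsent(t.getKey(), k -> new HashMap<>());
                for (Map.Entry<String, Integer> u : doc.entrySet()) {
                    if (!u.getKey().equals(t.getKey())) {
                        vec.merge(u.getKey(), t.getValue() * u.getValue(), Integer::sum);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two documents in the per-document view...
        List<Map<String, Integer>> docs = List.of(
                Map.of("cat", 2, "milk", 1),
                Map.of("cat", 1, "dog", 1, "milk", 3));
        // ...become three vectors in the per-term view (cat, dog, milk).
        Map<String, Map<String, Integer>> vectors = termVectors(docs);
        System.out.println(vectors.get("cat"));
    }
}
```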



OK, let's back up a level. WHY are you building these
vectors? Where I'm going with this is I wonder if this
is an XY problem, see:


On Thu, May 27, 2010 at 7:49 PM, kannan chandrasekaran

> Uwe,
> I now see the problem with overlapping terms across segments... Thanks.
> Erik,
> Good point... My use case for this is:
> I am trying to build vectors for individual terms and documents, and I need
> to know the size to handle memory constraints.
> Thanks
> Kannan
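Uwe's point about overlapping terms can be illustrated without Lucene. Each segment keeps its own term dictionary, so a term indexed in several segments is counted once per segment. Summing per-segment counts therefore overestimates the vocabulary. This sketch uses plain Java sets as hypothetical stand-ins for segment term dictionaries, because the actual API for enumerating terms across a whole index differs between Lucene versions. It also gives a rough dense-matrix size estimate, relevant to the memory constraints mentioned in the quoted message.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class UniqueTermCount {

    // Adds up per-segment term counts; overcounts terms present in several segments.
    static int naiveSum(List<Set<String>> segments) {
        int total = 0;
        for (Set<String> seg : segments) {
            total += seg.size();
        }
        return total;
    }

    // The index-wide vocabulary is the union of the segments' term sets.
    static int uniqueTerms(List<Set<String>> segments) {
        Set<String> union = new TreeSet<>();
        for (Set<String> seg : segments) {
            union.addAll(seg);
        }
        return union.size();
    }

    // Rough size of a dense V x V matrix of 4-byte floats, ignoring object overhead.
    static long denseMatrixBytes(long vocabularySize) {
        return vocabularySize * vocabularySize * Float.BYTES;
    }

    public static void main(String[] args) {
        List<Set<String>> segments = List.of(
                Set.of("apache", "lucene", "search"),
                Set.of("lucene", "index", "search"));
        System.out.println("sum of per-segment counts: " + naiveSum(segments)); // 6
        System.out.println("unique terms: " + uniqueTerms(segments)); // 4
        System.out.println("dense 100k x 100k floats: " + denseMatrixBytes(100_000) + " bytes");
    }
}
```

At a 100,000-term vocabulary a dense float matrix already needs about 40 GB. This is why a sparse representation, or a cap on the vocabulary, is usually needed.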
