lucene-java-user mailing list archives

From kannan chandrasekaran <ckanna...@yahoo.com>
Subject Re: How to get the number of unique terms in the inverted index
Date Fri, 28 May 2010 01:17:03 GMT
I am just running a few experiments to calculate similarity between terms based on their
co-occurrences in the dataset. Basically, I am trying to build contextual vectors and calculate
similarity using a similarity measure (say, cosine similarity).
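
A minimal sketch of the idea, in plain Python rather than against the Lucene API. It assumes whitespace tokenization and document-level co-occurrence counts; the function names `cooccurrence_vectors` and `cosine` are made up for illustration, and the actual experiments may use a different context window or weighting.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(docs):
    """Map each term to a sparse vector of the terms it shares a document with.

    Assumption: the context is the whole document, and each pair of
    distinct terms is counted once per document.
    """
    vectors = defaultdict(Counter)
    for doc in docs:
        terms = set(doc.lower().split())  # naive whitespace tokenization
        for term in terms:
            for other in terms:
                if term != other:
                    vectors[term][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction to compare
    return dot / (norm_a * norm_b)
```

Each term vector can have up to one entry per unique term, so the number of unique terms in the index bounds the vector dimension. Storing the vectors sparsely, as above, keeps only the nonzero entries.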

I don't think this is an XY problem. The vectors I am trying to build are not the same as
the TermVectors option ((term, freq) pairs per document) in Lucene (if that's what you meant).


Thanks
Kannan




________________________________

OK, let's back up a level. WHY are you building these
vectors? Where I'm going with this is I wonder if this
is an XY problem, see:
http://people.apache.org/~hossman/#xyproblem

Best
Erick

On Thu, May 27, 2010 at 7:49 PM, kannan chandrasekaran
<ckannanck@yahoo.com> wrote:

> Uwe,
>
> I now see the problem with overlapping terms across segments. Thanks.
>
> Erick,
>
> Good point. My use case for this is:
>
> I am trying to build vectors for individual terms and documents, and I need
> to know the size to handle memory constraints.
>
> Thanks
> Kannan