lucene-java-user mailing list archives

From "Sengly Heng" <>
Subject Re: TF-IDF API
Date Wed, 28 Mar 2007 13:24:43 GMT
Thank you, but after taking a look at the Weka API I still have no clue how
to do that with Weka. Let me reformulate my problem:

I have a collection of term vectors (each vector is the list of tokens
extracted from one file), and I do not have the original files. I would like
to calculate the TF as well as the TF-IDF of each term, and sort the terms by
each of these values. As suggested by Grant Ingersoll, I could index those
term vectors again using Lucene and then use its API to measure TF and
TF-IDF. However, I guess there should be a simpler way, or an API that fits
this case directly.
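
For reference, the from-scratch route is small. Below is a minimal sketch,
assuming each file is already available as a List<String> of tokens. The
class and method names are made up for illustration, and the weighting used
(raw count times ln(N / df)) is only one common TF-IDF variant; it is not
Lucene's scoring formula and not what Weka computes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of Lucene or Weka.
final class TfIdfSketch {

    // Raw term frequency: how often each token occurs in one token list.
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Document frequency: in how many token lists each term occurs at least once.
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // TF-IDF per term of one document, assumed variant: tf * ln(numDocs / df).
    // The document must be one of those used to build df, so every term has an entry.
    static Map<String, Double> tfIdf(List<String> doc, Map<String, Integer> df, int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFrequencies(doc).entrySet()) {
            double idf = Math.log((double) numDocs / df.get(e.getKey()));
            weights.put(e.getKey(), e.getValue() * idf);
        }
        return weights;
    }

    // Entries sorted by value, highest first; ties are broken alphabetically by term.
    static <V extends Comparable<V>> List<Map.Entry<String, V>> sortedByValueDesc(Map<String, V> scores) {
        List<Map.Entry<String, V>> entries = new ArrayList<>(scores.entrySet());
        entries.sort((a, b) -> {
            int byValue = b.getValue().compareTo(a.getValue());
            return byValue != 0 ? byValue : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the real term vectors.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "index", "lucene", "search"),
                Arrays.asList("weka", "filter", "index"));
        Map<String, Integer> df = documentFrequencies(docs);
        for (List<String> doc : docs) {
            System.out.println("TF:     " + sortedByValueDesc(termFrequencies(doc)));
            System.out.println("TF-IDF: " + sortedByValueDesc(tfIdf(doc, df, docs.size())));
        }
    }
}
```

With this variant, a term that occurs in every token list gets a TF-IDF of
zero, so it falls to the bottom of the sorted list.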

Thanks once again everyone.

Best regards,


On 3/28/07, karl wettin <> wrote:
> On 28 Mar 2007, at 10:36, Sengly Heng wrote:
> > Does any of you know a Java API that directly handles this
> > problem?
> > Or do I have to implement it from scratch?
> You can also try
> weka.filters.unsupervised.attribute.StringToWordVector, it has many
> neat features you might be interested in. And if applicable to what
> you are attempting to do, the feature selection algorithms of the same
> project (Weka) do a great job of reducing the data set.
> It is GPL.
> --
> karl
