lucene-java-user mailing list archives

From "Sengly Heng" <sengly.h...@gmail.com>
Subject Re: TF-IDF API
Date Wed, 28 Mar 2007 13:24:43 GMT
Thank you, but after taking a look at its API I still have no clue how to
do that using Weka. Let me reformulate my problem:

I have a collection of term vectors (each vector is the list of tokens
extracted from one file), and I do not have the original files. I would
like to calculate the TF and the TF-IDF of each term, and then sort the
terms by each of these values. As suggested by Grant Ingersoll, I could
index those term vectors again using Lucene and then use its API to measure
TF and TF-IDF. However, I guess there should be a simpler way, or an API
that fits this case directly.
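
For reference, the computation described above can be done directly on the
token lists with plain Java collections, without building an index. The
following is only a minimal sketch under stated assumptions: TF is the raw
count of a term in one list, and IDF is ln(N / df), where N is the number of
lists and df is the number of lists containing the term. Lucene and Weka each
use their own variants of these formulas, so their numbers will differ. The
class and method names (TfIdfSketch, termFrequencies, and so on) are
hypothetical and do not come from any library.

```java
import java.util.*;

public class TfIdfSketch {

    /** Raw term frequency: how often each term occurs in one token list. */
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    /** Document frequency: in how many token lists each term occurs at least once. */
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String t : new HashSet<>(doc)) { // count each term once per list
                df.merge(t, 1, Integer::sum);
            }
        }
        return df;
    }

    /**
     * TF-IDF for one token list, assuming tf = raw count and idf = ln(N / df).
     * The df map must have been built from a collection that includes this list.
     */
    static Map<String, Double> tfIdf(List<String> tokens,
                                     Map<String, Integer> df, int numDocs) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFrequencies(tokens).entrySet()) {
            double idf = Math.log((double) numDocs / df.get(e.getKey()));
            scores.put(e.getKey(), e.getValue() * idf);
        }
        return scores;
    }

    /** Entries sorted by value, highest first; ties are broken by term. */
    static <V extends Comparable<V>> List<Map.Entry<String, V>> sortedByValueDesc(
            Map<String, V> m) {
        List<Map.Entry<String, V>> entries = new ArrayList<>(m.entrySet());
        Comparator<Map.Entry<String, V>> byValueDesc =
                Map.Entry.<String, V>comparingByValue().reversed();
        Comparator<Map.Entry<String, V>> byKey = Map.Entry.comparingByKey();
        entries.sort(byValueDesc.thenComparing(byKey));
        return entries;
    }

    public static void main(String[] args) {
        // Two made-up token lists standing in for the term vectors.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "index", "lucene"),
                Arrays.asList("index", "weka"));
        Map<String, Integer> df = documentFrequencies(docs);
        for (List<String> doc : docs) {
            System.out.println("TF:     " + sortedByValueDesc(termFrequencies(doc)));
            System.out.println("TF-IDF: " + sortedByValueDesc(tfIdf(doc, df, docs.size())));
        }
    }
}
```

With this definition, a term that occurs in every list gets a TF-IDF of zero,
since ln(N / N) = 0. A smoothed IDF could be substituted in tfIdf if that is
not wanted.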

Thanks once again everyone.

Best regards,

Sengly


On 3/28/07, karl wettin <karl.wettin@gmail.com> wrote:
>
>
> On 28 Mar 2007, at 10:36, Sengly Heng wrote:
>
> > Does any of you know a Java API that directly handles this
> > problem?
> > Or do I have to implement it from scratch?
>
> You can also try
> weka.filters.unsupervised.attribute.StringToWordVector; it has many
> neat features you might be interested in. And if applicable to what
> you are attempting to do, the feature selection algorithms of the same
> project (Weka) do a great job of reducing the data set.
>
> http://www.cs.waikato.ac.nz/ml/weka/
>
> It is GPL.
>
> --
> karl
>
