mahout-user mailing list archives

From ivan obeso <sendero.lumin...@gmail.com>
Subject Using TFIDF instead of TF Vectors in LDA
Date Tue, 19 Jun 2012 08:06:34 GMT
Hi all,

I'm using Mahout 0.6, and I have read that the LDA implementation in this
version can work with TFIDF vectors as well as TF vectors. The problem is
that DocumentProcessor.tokenizeDocuments and
DictionaryVectorizer.createTermFrequencyVectors use sequencefiles with
Text as key and Text as value.

Now I want to use TFIDFConverter.calculateDF and
TFIDFConverter.processTfIdf, but these methods expect VectorWritable as the
value in the sequencefile. Am I going about this the right way?
How can I transform the Text sequencefile into a VectorWritable sequencefile?

I get the following exception:
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
org.apache.mahout.math.VectorWritable
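
To make the data flow in question concrete, here is a minimal plain-Java
sketch of the three stages: text to tokens, tokens to TF vectors, TF vectors
to TF-IDF vectors. It uses no Mahout or Hadoop classes; the class name, the
method names and the sparse-map vector representation are all hypothetical
stand-ins. The weighting tf * ln(N / df) is a textbook simplification and is
not claimed to match what Mahout computes. What it shows is that the TF-IDF
stage consumes vectors rather than raw text, which is the kind of type
mismatch the exception above reports.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TfIdfPipelineSketch {

    // Stage 1: raw text -> tokens (hypothetical stand-in for a tokenizer).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 2: tokens -> sparse TF vector (dictionary index -> count).
    // New terms are appended to the shared dictionary as they are seen.
    static Map<Integer, Double> termFrequencies(List<String> tokens,
                                                Map<String, Integer> dictionary) {
        Map<Integer, Double> tf = new TreeMap<>();
        for (String token : tokens) {
            Integer index = dictionary.get(token);
            if (index == null) {
                index = dictionary.size();
                dictionary.put(token, index);
            }
            tf.merge(index, 1.0, Double::sum);
        }
        return tf;
    }

    // Stage 3a: TF vectors -> document frequency per dictionary index.
    static Map<Integer, Integer> documentFrequencies(List<Map<Integer, Double>> tfVectors) {
        Map<Integer, Integer> df = new TreeMap<>();
        for (Map<Integer, Double> vector : tfVectors) {
            for (Integer index : vector.keySet()) {
                df.merge(index, 1, Integer::sum);
            }
        }
        return df;
    }

    // Stage 3b: one TF vector -> TF-IDF vector, using tf * ln(N / df).
    // Simplified weighting, assumed for illustration only.
    static Map<Integer, Double> tfIdf(Map<Integer, Double> tf,
                                      Map<Integer, Integer> df,
                                      int numDocs) {
        Map<Integer, Double> weighted = new TreeMap<>();
        for (Map.Entry<Integer, Double> e : tf.entrySet()) {
            double idf = Math.log((double) numDocs / df.get(e.getKey()));
            weighted.put(e.getKey(), e.getValue() * idf);
        }
        return weighted;
    }

    public static void main(String[] args) {
        String[] docs = {"apple banana apple", "banana cherry"};
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        List<Map<Integer, Double>> tfVectors = new ArrayList<>();
        for (String doc : docs) {
            tfVectors.add(termFrequencies(tokenize(doc), dictionary));
        }
        Map<Integer, Integer> df = documentFrequencies(tfVectors);
        for (Map<Integer, Double> tf : tfVectors) {
            System.out.println(tfIdf(tf, df, docs.length));
        }
    }
}
```

Note that tfIdf takes the output of termFrequencies, never the raw strings;
passing text where a vector is expected is the analogue of the cast failure
shown above.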
