mahout-user mailing list archives

From Isabel Drost <>
Subject Re: LDA tutorial?
Date Thu, 03 Sep 2009 14:31:15 GMT
On Wed, 2 Sep 2009 14:38:54 -0700
Grant Ingersoll <> wrote:


I have followed the tutorial and was able to run LDA on the Reuters
dataset. Some questions that occurred to me:

Looking at the resulting topics it seems like no stemming or
lemmatization has been done prior to generating the vectors. Is that

Do we have documentation on the vector format? I found but that
describes how to generate vectors from Lucene. I would like to run
MAHOUT-123 on a set of vectors generated from German texts. We already
have a document processing pipeline that is capable of tokenisation,
stemming, term selection and the like that I would like to reuse. I
guess I could reuse the org.apache.mahout.utils.vector.*

