mahout-user mailing list archives

From Ben West <>
Subject LDA question
Date Mon, 05 Sep 2011 15:38:05 GMT
Hey all,

I'm trying the Latent Dirichlet Allocation operator. I made my term vectors with these commands:

~/Scripts/Mahout/trunk/bin/mahout seqdirectory --input /home/ben/Scripts/eipi/files --output /home/ben/Scripts/eipi/mahout_out -chunk 1
~/Scripts/Mahout/trunk/bin/mahout seq2sparse -i /home/ben/Scripts/eipi/mahout_out -o /home/ben/Scripts/eipi/termvecs -wt tf -seq

Then I run this, trying to follow these instructions:

~/Scripts/Mahout/trunk/bin/mahout lda -i /home/ben/Scripts/eipi/termvecs -o /home/ben/Scripts/eipi/lda_working -k 2 -v 100

And I get:

MAHOUT-JOB: /home/ben/Scripts/Mahout/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
11/09/04 16:28:59 INFO common.AbstractJob: Command line arguments: {--endPhase=2147483647, --input=/home/ben/Scripts/eipi/termvecs, --maxIter=-1, --numTopics=2, --numWords=100, --output=/home/ben/Scripts/eipi/lda_working, --startPhase=0, --tempDir=temp, --topicSmoothing=-1.0}
11/09/04 16:29:00 INFO lda.LDADriver: LDA Iteration 1
11/09/04 16:29:01 INFO input.FileInputFormat: Total input paths to process : 4
11/09/04 16:29:01 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-ben/mapred/staging/ben692167368/.staging/job_local_0001
Exception in thread "main" File file:/home/ben/Scripts/eipi/termvecs/tokenized-documents/data does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(
    at

Does anyone know what I'm doing wrong?
