mahout-user mailing list archives

From Ben West <bwsithspaw...@yahoo.com>
Subject Re: LDA question
Date Mon, 05 Sep 2011 17:41:42 GMT
Never mind, I guess the -v parameter is not a limit on the number of words you would like to
use, but the number of words that actually exist in the dictionary.
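
To illustrate why -v 10000 fails here: the trace quoted below reports index 10007 against an allowable range of [0,10000), so the value has to exceed the largest term index that appears in the vectors. What follows is a minimal Python sketch of that check, not Mahout code. The helper names and the dict-based vector representation are hypothetical and chosen only for illustration.

```python
def required_vocab_size(term_vectors):
    """Return the smallest vocabulary size that covers every term index.

    term_vectors is assumed to be an iterable of dicts mapping
    term index -> count (a stand-in for sparse tf vectors).
    """
    max_index = -1
    for vector in term_vectors:
        for index in vector:
            if index > max_index:
                max_index = index
    # Indices are zero-based, so the size is the largest index plus one.
    return max_index + 1


def check_num_words(num_words, term_vectors):
    """Raise ValueError if num_words is too small for the given vectors."""
    needed = required_vocab_size(term_vectors)
    if num_words < needed:
        raise ValueError(
            "num_words=%d is too small; vectors need at least %d"
            % (num_words, needed)
        )
    return needed
```

With a vector that contains index 10007, check_num_words(10000, ...) raises, which mirrors the IndexException in the trace; any value of 10008 or more passes.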



----- Original Message -----
From: Ben West <bwsithspawn00@yahoo.com>
To: "user@mahout.apache.org" <user@mahout.apache.org>
Cc: 
Sent: Monday, September 5, 2011 12:34 PM
Subject: Re: LDA question

Thanks Jake! I changed my command to:

$MAHOUT lda -i $BASE_DIR/termvecs/tf-vectors -o $BASE_DIR/lda_working -k 2 -v 10000

And now I get:

11/09/05 12:06:25 WARN mapred.LocalJobRunner: job_local_0001
org.apache.mahout.math.IndexException: Index 10007 is outside allowable range of [0,10000)
    at org.apache.mahout.math.AbstractMatrix.get(AbstractMatrix.java:412)
    at org.apache.mahout.clustering.lda.LDAState.logProbWordGivenTopic(LDAState.java:45)
    at org.apache.mahout.clustering.lda.LDAInference.eStepForWord(LDAInference.java:225)
    at org.apache.mahout.clustering.lda.LDAInference.infer(LDAInference.java:110)
    at org.apache.mahout.clustering.lda.LDAWordTopicMapper.map(LDAWordTopicMapper.java:48)
    at org.apache.mahout.clustering.lda.LDAWordTopicMapper.map(LDAWordTopicMapper.java:36)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
11/09/05 12:06:26 INFO mapred.JobClient:  map 0% reduce 0%
11/09/05 12:06:26 INFO mapred.JobClient: Job complete: job_local_0001
11/09/05 12:06:26 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.lang.InterruptedException: LDA Iteration failed processing /home/ben/Scripts/eipi/lda_working/state-0
    at org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:427)
    at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:226)
    at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:174)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:90)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


I reran the seqdirectory and seq2sparse commands and they seemed to work fine, but I keep
getting this error. Any idea what I'm doing wrong?

Thanks,
-Ben



----- Original Message -----
From: Jake Mannix <jake.mannix@gmail.com>
To: user@mahout.apache.org; Ben West <bwsithspawn00@yahoo.com>
Cc: 
Sent: Monday, September 5, 2011 11:30 AM
Subject: Re: LDA question

Hi Ben,

On Mon, Sep 5, 2011 at 8:38 AM, Ben West <bwsithspawn00@yahoo.com> wrote:
>
>
> ~/Scripts/Mahout/trunk/bin/mahout seqdirectory --input
> /home/ben/Scripts/eipi/files --output /home/ben/Scripts/eipi/mahout_out
> -chunk 1
> ~/Scripts/Mahout/trunk/bin/mahout seq2sparse -i
> /home/ben/Scripts/eipi/mahout_out -o /home/ben/Scripts/eipi/termvecs -wt tf
> -seq
>
>
The "output" directory (/home/ben/Scripts/eipi/termvecs) has a bunch of
subdirectories, only one of which actually contains your Vectors.  In this
case, you've used tf weighting (-wt tf), so they're
in /home/ben/Scripts/eipi/termvecs/tf-vectors.  This is the directory you
want to give to LDA as input.
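
As a rough sketch of the point above, the following Python helper picks the vectors subdirectory out of the seq2sparse output directory before it is passed to LDA. The function name is hypothetical, and tf-vectors is the only subdirectory name taken from this thread.

```python
import os


def lda_input_dir(seq2sparse_output, vectors_subdir="tf-vectors"):
    """Return the subdirectory that holds the vectors, or raise if missing.

    seq2sparse_output is the directory passed to seq2sparse with -o.
    vectors_subdir defaults to the name mentioned in this thread.
    """
    candidate = os.path.join(seq2sparse_output, vectors_subdir)
    if not os.path.isdir(candidate):
        raise FileNotFoundError("no vectors directory at " + candidate)
    return candidate
```

The returned path is what would go after -i in the lda command, rather than the top-level output directory.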

  -jake
