mahout-user mailing list archives

From narasimha Sandu <narsa...@cisco.com>
Subject RE: Error running mahout cvb
Date Wed, 10 Jul 2013 04:28:17 GMT
We faced a similar issue with Mahout 0.7; it is resolved in Mahout 0.8.

-----Original Message-----
From: dilpreet singh [mailto:giggs102@gmail.com] 
Sent: Wednesday, July 10, 2013 4:59 AM
To: user@mahout.apache.org
Subject: Re: Error running mahout cvb

Thanks for the advice, guys. That was helpful. I modified the script to
create a matrix. Now I am hitting this error:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0

I think something might be wrong with vectordump, but I am not sure.
My script now looks like this:

export HADOOP_HOME=/usr/local/hadoop/hadoop-1.1.2
export HADOOP_CONF_DIR=$HADOOP_HOME/conf

export JAVA_OPTS=="-Xmx12240m -Xms1024m -server"

./mahout seqdirectory --input /mahout/lda_input \
  --output /mahout/input_seqfiles -c UTF8

./mahout seq2sparse -i /mahout/input_seqfiles -o /mahout/input_seqparse \
  -wt tf

./mahout rowid -i /mahout/input_seqparse/tf-vectors -o /mahout/matrix

rm -rf /mahout/lda_output/final_output
rm -rf /mahout/lda_output/docTopics

./mahout cvb -i /mahout/matrix/matrix -o /mahout/lda_output/final_output \
  -mt /mahout/lda_output/models -dt /mahout/lda_output/docTopics -k 6 -nt 10 \
  -x 4 -ow

./mahout vectordump -i /mahout/lda_output/final_output \
  -d /mahout/input_seqparse/dictionary.file-0 -dt sequencefile --vectorSize 10 \
  --printKey TRUE

I was expecting to see the top 10 terms from each topic in the terminal. Any
suggestions?



Dilpreet Singh



On Tue, Jul 9, 2013 at 2:21 AM, Corey Hyllested <corey.hyllested@gmail.com> wrote:

> Agreed.
>
>
> After seq2sparse, you need to create a matrix.
>
> http://stackoverflow.com/questions/14757162/run-cvb-in-mahout-0-8
>
> so, something like this.
>
> mahout rowid -i $work_dir/input_seqparse/tf-vectors -o $work_dir/matrix
>
> mahout cvb -i $work_dir/matrix/ -o $work_dir/lda_output -mt
> $work_dir/lda_output/models -dt $work_dir/lda_output/docTopics -k 3
> -nt 10 --maxIter 200
>
>
> Unsolicited advice.
>
> There is no reason to trash your sequence files (rm -rf
> $work_dir/input_seqfiles) each time.
>
> Provide a model location; this allows the computation to pick up where
> it left off if something goes awry.
>
>
> - Corey
>
>
> On Mon, Jul 8, 2013 at 10:43 PM, Gmail <giggs102@gmail.com> wrote:
>
> > Hi
> >
> > I am trying to run mahout cvb on a Hadoop cluster using some text
> > files as input. I am getting the following error:
> >
> > Exception in thread "main" java.lang.IllegalStateException: No part
> > files found in model path 'temp/topicModelState/model-1'
> >
> > My script for running mahout cvb looks like this:
> >
> > export work_dir=/home/mahout
> >
> > rm -rf $work_dir/input_seqfiles
> >
> > ./mahout seqdirectory --input $work_dir/lda_input --output
> > $work_dir/input_seqfiles -c UTF8
> >
> > rm -rf $work_dir/input_seqparse
> >
> > ./mahout seq2sparse -i $work_dir/input_seqfiles -o
> > $work_dir/input_seqparse -wt tf
> >
> > ./mahout cvb -i $work_dir/input_seqparse -o $work_dir/lda_output -k 3
> > -nt 10 --maxIter 200
> >
> >
> > Is there something I am missing? Any help or suggestion is greatly
> > appreciated.
> >
> > Thanks
> >
> >
>
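
Corey's advice earlier in this thread (keep the sequence files between runs, and give cvb a model location) could be sketched roughly as below. This is only an illustration: `run_if_missing` is a hypothetical helper, not part of Mahout, the dry-run default for `MAHOUT` is an assumption made so the sketch can be tried without a cluster, and the paths and flag values are simply copied from the scripts in this thread rather than recommended settings.

```shell
#!/bin/sh
# Hypothetical helper, not part of Mahout: run a command only when its
# output path does not exist yet, so finished stages are reused on reruns.
run_if_missing() {
    out_path=$1
    shift
    if [ -e "$out_path" ]; then
        echo "skip: $out_path already exists"
    else
        "$@"
    fi
}

# Paths and flag values below are copied from the scripts in this thread.
work_dir=${work_dir:-/home/mahout}

# Assumption: dry run by default, which only prints each command.
# Set MAHOUT=./mahout to actually execute the steps.
MAHOUT=${MAHOUT:-echo ./mahout}

# $MAHOUT is left unquoted on purpose so "echo ./mahout" splits into words.
run_if_missing "$work_dir/input_seqfiles" \
    $MAHOUT seqdirectory --input "$work_dir/lda_input" \
    --output "$work_dir/input_seqfiles" -c UTF8

run_if_missing "$work_dir/input_seqparse" \
    $MAHOUT seq2sparse -i "$work_dir/input_seqfiles" \
    -o "$work_dir/input_seqparse" -wt tf

run_if_missing "$work_dir/matrix" \
    $MAHOUT rowid -i "$work_dir/input_seqparse/tf-vectors" \
    -o "$work_dir/matrix"

# cvb always runs; -mt names the model location so that state is kept there.
$MAHOUT cvb -i "$work_dir/matrix/matrix" \
    -o "$work_dir/lda_output/final_output" \
    -mt "$work_dir/lda_output/models" \
    -dt "$work_dir/lda_output/docTopics" \
    -k 6 -nt 10 -x 4
```

Run as-is, the script prints the commands it would issue; exporting `MAHOUT=./mahout` from the Mahout bin directory would run them, skipping any stage whose output directory is already present.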

