mahout-user mailing list archives

From Arni Sumarlidason <Arni.Sumarlida...@mdaus.com>
Subject Re: Mahout: CVB: Error
Date Sun, 04 Nov 2012 22:44:38 GMT
Dan,

Regarding this thread,
http://comments.gmane.org/gmane.comp.apache.mahout.user/13641

Did you publish your modification to the rowid function that enables splitting of the Matrix files?
A single pass over my data takes 9 hours. Does this sound reasonable to you? Please advise.

Best,

Arni

On Nov 3, 2012, at 8:38 PM, DAN HELM <danielhelm@verizon.net> wrote:

Arni,

I believe you are passing the wrong input to the cvb command:
./mahout cvb -i /user/root/sparse-vectors-cvb/docIndex .....

It should be: ./mahout cvb -i /user/root/sparse-vectors-cvb/Matrix .....

docIndex is a file generated by rowid that provides a mapping between the original sparse
vector keys (in Text format) and the Integer keys assigned by rowid.

Dan
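
For reference, the correction above can be sketched as a dry run that assembles the cvb command and prints it instead of executing it. Every flag is copied from the original message quoted below; only the -i path changes. The variable names (ROWID_OUT, CVB_INPUT, CVB_CMD) are illustrative and not part of Mahout. The file name is written in lowercase here, matching the rowid output reported in the original message ("matrix"); this is an assumption about the actual listing, and HDFS paths are case-sensitive, so check it against your own output directory.

```shell
#!/bin/sh
# Sketch only: builds the corrected cvb command and prints it (dry run).
# Nothing here calls Mahout; paste the printed line to run it for real.

# Output directory of the earlier rowid step (path taken from the thread).
ROWID_OUT=/user/root/sparse-vectors-cvb

# rowid leaves two files in that directory:
#   matrix   - the vectors, re-keyed with Integer keys (what cvb reads)
#   docIndex - Integer key -> original Text key (a lookup table, not cvb input)
# Assumption: the file is named "matrix" in lowercase, as in the original listing.
CVB_INPUT="$ROWID_OUT/matrix"

# Same flags as the failing command; only the -i argument differs.
CVB_CMD="./mahout cvb -i $CVB_INPUT -o text_lda -k 100 -x 20 -dict text_vec/dictionary.file-0 -dt text_cvb_document -mt text_states"

echo "$CVB_CMD"
```

The docIndex file is still useful after the run: it is what lets you translate the Integer document keys in the cvb output back to the original document names.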

From: Arni Sumarlidason <Arni.Sumarlidason@mdaus.com>
To: user@mahout.apache.org
Sent: Saturday, November 3, 2012 6:35 PM
Subject: Mahout: CVB: Error

Good evening, and thank you for reading. I am trying to run CVB on Mahout 0.8.

I have successfully executed the following steps:
./mahout seqdirectory --input /user/root/lda --output text_seq -c UTF-8 -ow -chunk 8
Resulting in 20 chunk files.

./mahout seq2sparse -i text_seq -o text_vec -wt tf -a org.apache.lucene.analysis.WhitespaceAnalyzer -ow
Resulting in 109MB vector, "part-r-00000", "dictionary.file-0", and more.

./mahout rowid -i text_vec/tf-vectors -o sparse-vectors-cvb
Resulting in "docIndex" & "matrix"

Now, when I attempt to run the following command,
./mahout cvb -i /user/root/sparse-vectors-cvb/docIndex -o text_lda -k 100 -x 20 -dict text_vec/dictionary.file-0 -dt text_cvb_document -mt text_states
it fails with an error: No part files found in model path 'text_states/model-1'

Can someone please point me in the right direction?

Best regards,

Arni




