mahout-user mailing list archives

From Cosinus WebDev <officeweb...@gmail.com>
Subject Re: Reuters Example LDA Error (no help anywhere)
Date Fri, 07 Mar 2014 00:36:58 GMT
Hi,

Thank you for the answer, now I can rest a second :)

Hope this will be fixed soon. If you file a JIRA please send me the link so
I can watch the result.

Thank you again,

And one or two more questions:
1. Does vectordumping the cvb result (/work/out/cvb) give the terms in each topic?
2. Should the topics directory (/work/out/topics) contain the "best" terms
from all topics?

bin/mahout cvb \
-i /work/matrix \
-o /work/out/cvb -k 100 -ow -x 20 \
-dt /work/out/topics \
....
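
For inspecting dumps like the one quoted below while no dedicated report utility exists, here is a minimal Python sketch. It is not part of Mahout: `parse_vector` and `top_entries` are hypothetical helper names, and the code assumes each vector arrives as a single `{key:weight,...}` string, as in the quoted output. It only ranks the entries of one vector by weight; what the keys stand for depends on how the dump was produced.

```python
def parse_vector(line):
    """Parse one '{key:weight,...}' string into a list of (key, weight) pairs."""
    body = line.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]
    pairs = []
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon so the key is kept verbatim as a string.
        key, _, weight = item.rpartition(":")
        pairs.append((key, float(weight)))
    return pairs


def top_entries(line, n=3):
    """Return the n highest-weight (key, weight) pairs, largest first."""
    return sorted(parse_vector(line), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Shortened copy of one vector from the quoted output.
    sample = "{0.04:0.5277733526037044,0.01:0.46656672162804314,0.07:0.0024295914763474164}"
    for key, weight in top_entries(sample, n=2):
        print(key, weight)
```

Keys are kept as strings on purpose, so an entry such as `0.007050` is not silently turned into a number.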




On Fri, Mar 7, 2014 at 2:07 AM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:

> Typo in previous email, read as:
>
> "Ideally Mahout's missing a clusterdump-like utility that reads in LDA
> topics and the Document - DocumentId mapping, and displays a report of the
> topics and the documents that belong to a topic."
>
>
>
>
> On Thursday, March 6, 2014 7:06 PM, Suneel Marthi <suneel_marthi@yahoo.com>
> wrote:
>
> The script needs to be corrected to not call vectordump for LDA, as the
> vectordump utility (or even clusterdump) is presently not capable of
> displaying topics and relevant documents. I recall this issue was
> previously reported by Peyman Faratin post 0.9 release.
>
> Ideally Mahout's missing a clusterdump utility that reads in LDA
> topics and the Document - DocumentId mapping, and displays a report of the
> topics and the documents that belong to a topic.
>
> Meanwhile in order to see the generated topics and documents please refer
> to this blog:
> http://sujitpal.blogspot.com/2013/10/topic-modeling-with-mahout-on-amazon-emr.html
>
> Let me file a JIRA for this.
>
>
>
>
>
>
> On Thursday, March 6, 2014 6:12 PM, Cosmin Dumbrava <officewebdev@gmail.com> wrote:
>
> I don't know if it is OK to mail this address like this, but... there is
>
> I have executed cluster-reuters.sh from the example directory (version 1.0
> SNAPSHOT) and at the end I only get a list of
> .....
> 21575
> {0.02:0.6314297270431626,0.03:0.12547216143460152,0.007050:0.08061044448337305,0.04:0.07121802301642256,0.025:0.0677648308012434,0.003:0.0221466872297289,0.06:4.4720109631453837E-4,0.01:4.0331445050718065E-4,0.077:1.0509017796402916E-4,0.1:6.868649426131684E-5}
> 21576
> {0.055:0.7123345754234253,0.003:0.10345316403842542,0.025:0.07850931669910466,0.1:0.0688641506163345,0.06:0.010599081492449824,0:0.0081953368778766,0.04:0.00469907695241742,0.03:0.003966985061879055,0.07:0.002197060890631658,0.0625:0.0020741956232281466}
> 21577
> {0.04:0.5277733526037044,0.01:0.46656672162804314,0.07:0.0024295914763474164,0.1:0.002243674469679058,0.077:8.012577174900807E-4,0.007050:3.9184997476998896E-5,0.03:3.2141106779800255E-5,0.0625:2.4665616652494003E-5,0.02:1.949377177063371E-5,0.025:1.3329985998932362E-5}
> ....
>
> $MAHOUT cvb \
>     -i ${WORK_DIR}/reuters-out-matrix/matrix \
>     -o ${WORK_DIR}/reuters-lda -k 20 -ow -x 20 \
>     -dict ${WORK_DIR}/reuters-out-seqdir-sparse-lda/dictionary.file-* \
>     -dt ${WORK_DIR}/reuters-lda-topics \
>     -mt ${WORK_DIR}/reuters-lda-model \
>   && \
>   $MAHOUT vectordump \
>     -i ${WORK_DIR}/reuters-lda-topics/part-m-00000 \
>     -o ${WORK_DIR}/reuters-lda/vectordump \
>     -vs 10 -p true \
>     -d ${WORK_DIR}/reuters-out-seqdir-sparse-lda/dictionary.file-* \
>     -dt sequencefile -sort ${WORK_DIR}/reuters-lda-topics/part-m-00000 \
>     && \
>
> Do I need to do something more with the output from this point on?
>
> The same thing happens when I try to implement it on my own.
>
>
> Thanks in advance
>
