mahout-user mailing list archives

From Andrew Musselman <andrew.mussel...@gmail.com>
Subject Re: Reuters Example LDA Error (no help anywhere)
Date Thu, 20 Mar 2014 22:32:13 GMT
Filed a ticket here:  https://issues.apache.org/jira/browse/MAHOUT-1470


On Thu, Mar 6, 2014 at 4:36 PM, Cosinus WebDev <officewebdev@gmail.com> wrote:

> Hi,
>
> Thank you for the answer, now I can rest a second :)
>
> Hope this will be fixed soon. If you file a JIRA please send me the link so
> I can watch the result.
>
> Thank you again,
>
> And one or two more questions:
> 1. Does vectordumping the cvb result (/work/out/cvb) give the terms in
> each topic?
> 2. Should the topics directory (/work/out/topics) contain the "best" terms
> from all topics?
>
> bin/mahout cvb \
> -i /work/matrix \
> -o /work/out/cvb -k 100 -ow -x 20 \
> -dt /work/out/topics \
> ....
>
>
>
>
> On Fri, Mar 7, 2014 at 2:07 AM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
>
> > Typo in previous email, read as:
> >
> > "Ideally Mahout's missing a clusterdump-like utility that reads in LDA
> > topics and the Document - DocumentId mapping, and displays a report of
> > the topics and the documents that belong to each topic."
> >
> >
> >
> >
> > On Thursday, March 6, 2014 7:06 PM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
> >
> > The script needs to be corrected to not call vectordump for LDA, as the
> > vectordump utility (and clusterdump as well) is presently not capable of
> > displaying topics and relevant documents. I recall this issue was
> > previously reported by Peyman Faratin after the 0.9 release.
> >
> > Ideally Mahout's missing a clusterdump utility for that reads in LDA
> > topics, Document - DocumentId mapping and displays a report of the topics
> > and the documents that belong to a topic.
> >
> > Meanwhile, to see the generated topics and documents, please refer to
> > this blog:
> >
> > http://sujitpal.blogspot.com/2013/10/topic-modeling-with-mahout-on-amazon-emr.html
> >
> > Let me file a JIRA for this.
> >
> >
> >
> >
> >
> >
> > On Thursday, March 6, 2014 6:12 PM, Cosmin Dumbrava <officewebdev@gmail.com> wrote:
> >
> > I don't know if it is OK to mail this address like this, but... there is
> >
> > I have executed cluster-reuters.sh from the example directory (version
> > 1.0-SNAPSHOT) and at the end I only get a list of
> > .....
> > 21575
> > {0.02:0.6314297270431626,0.03:0.12547216143460152,0.007050:0.08061044448337305,0.04:0.07121802301642256,0.025:0.0677648308012434,0.003:0.0221466872297289,0.06:4.4720109631453837E-4,0.01:4.0331445050718065E-4,0.077:1.0509017796402916E-4,0.1:6.868649426131684E-5}
> > 21576
> > {0.055:0.7123345754234253,0.003:0.10345316403842542,0.025:0.07850931669910466,0.1:0.0688641506163345,0.06:0.010599081492449824,0:0.0081953368778766,0.04:0.00469907695241742,0.03:0.003966985061879055,0.07:0.002197060890631658,0.0625:0.0020741956232281466}
> > 21577
> > {0.04:0.5277733526037044,0.01:0.46656672162804314,0.07:0.0024295914763474164,0.1:0.002243674469679058,0.077:8.012577174900807E-4,0.007050:3.9184997476998896E-5,0.03:3.2141106779800255E-5,0.0625:2.4665616652494003E-5,0.02:1.949377177063371E-5,0.025:1.3329985998932362E-5}
> > ....
> >
> > $MAHOUT cvb \
> >     -i ${WORK_DIR}/reuters-out-matrix/matrix \
> >     -o ${WORK_DIR}/reuters-lda -k 20 -ow -x 20 \
> >     -dict ${WORK_DIR}/reuters-out-seqdir-sparse-lda/dictionary.file-* \
> >     -dt ${WORK_DIR}/reuters-lda-topics \
> >     -mt ${WORK_DIR}/reuters-lda-model \
> >   && \
> >   $MAHOUT vectordump \
> >     -i ${WORK_DIR}/reuters-lda-topics/part-m-00000 \
> >     -o ${WORK_DIR}/reuters-lda/vectordump \
> >     -vs 10 -p true \
> >     -d ${WORK_DIR}/reuters-out-seqdir-sparse-lda/dictionary.file-* \
> >     -dt sequencefile -sort ${WORK_DIR}/reuters-lda-topics/part-m-00000 \
> >     && \
> >
> > Do I need to do something more with the output from this point on?
> >
> > The same thing happens when I try to implement it on my own.
> >
> >
> > Thanks in advance
> >
>
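
Note for readers of this thread: the dump quoted above alternates a numeric id line with a {key:weight,...} map, and mail wrapping has split some maps across lines. The following is a minimal sketch, not part of Mahout, that reads text in that shape and reports the heaviest entry per id. The names parse_dump, top_entry and SAMPLE are hypothetical, SAMPLE is an abbreviated copy of two records from the thread, and the keys are kept as opaque strings because the thread does not establish what they denote.

```python
import re

# One record: a numeric id immediately followed by a {key:weight,...} map.
RECORD = re.compile(r"(\d+)\{([^}]*)\}")

def parse_dump(text):
    """Return {id: [(key, weight), ...]} from dump-style text (assumed layout)."""
    # Strip mail quote markers and whitespace, then join the lines so that
    # maps split by line wrapping are rejoined.
    cleaned = "".join(
        re.sub(r"^[>\s]+", "", line).strip() for line in text.splitlines()
    )
    docs = {}
    for doc_id, body in RECORD.findall(cleaned):
        pairs = []
        for item in body.split(","):
            if not item:
                continue  # tolerate an empty map
            key, _, weight = item.rpartition(":")
            pairs.append((key, float(weight)))
        docs[int(doc_id)] = pairs
    return docs

def top_entry(pairs):
    """Return the (key, weight) pair with the largest weight."""
    return max(pairs, key=lambda kw: kw[1])

# Abbreviated sample taken from the thread (most pairs omitted for brevity).
SAMPLE = """\
> > 21575
> > {0.02:0.6314297270431626,0.03:
> 0.12547216143460152,0.1:6.868649426131684E-5}
> > 21577
> {0.04:0.5277733526037044,0.01:0.46656672162804314}
"""

if __name__ == "__main__":
    for doc_id, pairs in sorted(parse_dump(SAMPLE).items()):
        key, weight = top_entry(pairs)
        print(doc_id, key, weight)
```

This only tidies the text that vectordump already printed; it does not supply the topic and document report that the thread says is missing.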
