mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: LDA clustering documentation (mahout-07-snapshot)
Date Thu, 12 Apr 2012 16:09:56 GMT
Hi Antonio,

  Are you using the new LDA (invoked via "$MAHOUT_HOME/bin/mahout cvb <args>",
or by invoking the class org.apache.mahout.clustering.lda.cvb.CVB0Driver
manually)?

  If so, then your first command should work fine:

mahout vectordump -i DB-LDA-clusters/docTopics/part-m-00000 \
  -o output/cluster_lda_topics.txt

  What error do you get?
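
  Once the dump succeeds, mapping documents to topics is a small
post-processing step. Below is a minimal Python sketch, not anything that
ships with Mahout. It assumes (this is not confirmed anywhere in this thread)
that each line of the dumped text file has the form
"<doc id><TAB>{topic:weight,topic:weight,...}", i.e. one document per line
with its per-topic weights. The function names dominant_topic and
docs_by_topic are made up for illustration; adjust the parsing if your dump
looks different.

```python
import re

# ASSUMPTION: each dumped line looks like "<doc_id>\t{0:0.1,1:0.7,2:0.2}".
# This format is a guess for illustration; check your actual dump file.
PAIR = re.compile(r"(\d+)\s*:\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")


def dominant_topic(line):
    """Return (doc_id, topic, weight) for one dumped line, or None if unparseable."""
    key, _, vector = line.strip().partition("\t")
    pairs = [(int(t), float(w)) for t, w in PAIR.findall(vector)]
    if not pairs:
        return None
    # Pick the topic index with the largest weight for this document.
    topic, weight = max(pairs, key=lambda p: p[1])
    return key, topic, weight


def docs_by_topic(lines):
    """Group document ids under their highest-weight topic."""
    groups = {}
    for line in lines:
        parsed = dominant_topic(line)
        if parsed is None:
            continue  # skip lines that do not match the assumed format
        doc_id, topic, _ = parsed
        groups.setdefault(topic, []).append(doc_id)
    return groups
```

  To use it, pass the open dump file (for example
output/cluster_lda_topics.txt) to docs_by_topic. Taking only the single
highest-weight topic per document is a simplification; since each document
carries a weight for every topic, you may prefer to keep every topic above
some threshold.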

On Thu, Apr 12, 2012 at 6:21 AM, antonio d'agata <antoniodagata@gmail.com> wrote:

> Dear users,
>
> I'm trying to use the LDA clustering algorithm from the command line (using
> mahout-07-snapshot). I was able to get the topics (as a text file
> containing the top words), but I also need to get the document IDs
> associated with the computed topics.
>
> I tried these commands:
> mahout vectordump -i DB-LDA-clusters/docTopics/part-m-00000 -o
> output/cluster_lda_topics.txt
> mahout vectordump -i DB-LDA-clusters/docTopics/part-m-00000 -o
> output/cluster_lda_topics.txt -dt text (or sequencefile)
> but without success.
>
> Is there a way to do this?
>
> Thanks
>
> Antonio Michelangelo D'Agata
>



-- 

  -jake
