mahout-user mailing list archives

From Thilina Gunarathne <cset...@gmail.com>
Subject Interpreting the results of LDA CVB
Date Mon, 07 Jan 2013 15:19:56 GMT
Dear All,
I'm trying to run Mahout LDA (the CVB version) on a subset of the 20news
data set, as a sample for a Hadoop publication we are working on.  I need
some help understanding the Mahout output in order to figure out the topics.

I ran the following commands on the TF vectors generated using the
seq2sparse command.
>bin/mahout rowid -i 20news-tf/tf-vectors -o 20news-tf-int
>bin/mahout cvb -i 20news-tf-int/matrix -o lda-out -k 10  -x 20  -dict
20news-tf/dictionary.file-0 -dt lda-topics -mt lda-topic-model

After that, I dumped the results using vectordump as follows.

>bin/mahout vectordump -i lda-topics/part-m-00000 --dictionary
20news-tf/dictionary.file-0 --vectorSize 10  -dt sequencefile
......

{"Fluxgate:0.12492744375758073,&:0.03875953927132082,(140.220.1.1):0.1228639250669511,(Babak:0.15074522974495433,(Bill:0.10512715697420276,(Gerrit:0.10130565323653766,(Michael:0.061169131590630275,(Scott:0.14501579630233746,(Usenet:0.07872957132697946,(continued):0.07135655272850545}
{"Fluxgate:0.13130952097888746,&:0.05207587369196414,(140.220.1.1):0.12533225607394424,(Babak:0.08607740024552457,(Bill:0.20218284543514245,(Gerrit:0.07318295757631627,(Michael:0.08766888242201039,(Scott:0.08858421220476514,(Usenet:0.09201906604666685,(continued):0.06156698532477829}
.......
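
To make rows like the two above easier to compare, here is a minimal Python sketch that parses one dumped line, ranks its entries by weight, and sums the weights. The helper name `parse_dump_line` is hypothetical (it is not part of Mahout), and the parsing assumes the comma-separated `label:weight` layout shown above, with no commas or trailing colons inside the labels.

```python
def parse_dump_line(line):
    """Parse one vectordump text line into a list of (label, weight) pairs.

    Assumption: entries are comma-separated 'label:weight' items wrapped in
    braces, as in the output above. The stray leading quote is dropped.
    """
    body = line.strip().strip("{}").lstrip('"')
    pairs = []
    for item in body.split(","):
        # Split on the last colon so that any colon inside a label is kept.
        label, _, weight = item.rpartition(":")
        pairs.append((label, float(weight)))
    return pairs


# First row of the vectordump output, copied verbatim.
line = (
    '{"Fluxgate:0.12492744375758073,&:0.03875953927132082,'
    '(140.220.1.1):0.1228639250669511,(Babak:0.15074522974495433,'
    '(Bill:0.10512715697420276,(Gerrit:0.10130565323653766,'
    '(Michael:0.061169131590630275,(Scott:0.14501579630233746,'
    '(Usenet:0.07872957132697946,(continued):0.07135655272850545}'
)

pairs = parse_dump_line(line)
ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
total = sum(weight for _, weight in pairs)

for label, weight in ranked:
    print(f"{label}\t{weight:.4f}")
print(f"sum = {total:.6f}")
```

For the first row, the ten weights add up to approximately 1, which is consistent with each row being a single probability distribution over ten entries.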

It would be great if someone could help me interpret the above results.
The probability values seem to be more or less similar in all cases.
Is this due to the small size of the dataset?

thanks,
Thilina

-- 
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org
