mahout-user mailing list archives

From Hector Yee <hector....@gmail.com>
Subject Re: questions on the results of running lda and ldatopics, thanks
Date Thu, 30 Jun 2011 18:48:33 GMT
The clustering is unsupervised, so it doesn't tell you what a topic stands
for. It's up to you to decide how to label each topic, based on its
highest-scoring words.
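
As a rough illustration of reading off the highest-scoring words, here is a
minimal Python sketch. It assumes each line of a topic_x file looks like the
two sample lines quoted below ("word [p(word|topic_N) = probability"); the
function names parse_topic_lines and top_words are hypothetical helpers, not
part of Mahout.

```python
import re

# Assumed line format, copied from the sample output quoted in this thread:
#   model [p(model|topic_0) = 0.010358664102351409
LINE_RE = re.compile(r"^\s*(\S+)\s+\[p\(.*\|topic_(\d+)\)\s*=\s*([0-9.eE+-]+)")


def parse_topic_lines(lines):
    """Return (word, probability) pairs for every line matching the assumed format."""
    pairs = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            pairs.append((match.group(1), float(match.group(3))))
    return pairs


def top_words(lines, n=10):
    """Return the n words with the highest p(word|topic), best first."""
    pairs = parse_topic_lines(lines)
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in pairs[:n]]


# The two sample lines from the original question.
sample = [
    "model [p(model|topic_0) = 0.010358664102351409",
    "tissues [p(tissues|topic_0) = 0.008870984984037485",
]
print(top_words(sample))
```

To use it on a real file, pass the file's lines, e.g.
top_words(open("topic_0").read().splitlines()), then pick a label that
summarizes the words it returns. The labeling step itself stays manual.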

On Thu, Jun 30, 2011 at 11:08 AM, wine lover <winecoding@gmail.com> wrote:

> Hello Everyone,
>
> I have two questions on the LDA analysis.
>
> After running the lda command, the generated directory "testdata-lda"
> contains several folders: docTopics  state-0   state-1
> ....
>
> It seems to me that the "state-x" folders are converted into a readable
> format by running "ldatopics". But what does the "docTopics" folder
> stand for? How can I view it?
>
> Running the ldatopics command generates 20 files in total (topic_0,
> topic_1, etc). For instance, the file topic_0 contains lines such as the
> following:
> model [p(model|topic_0) = 0.010358664102351409
> tissues [p(tissues|topic_0) = 0.008870984984037485
>
> How can I tell what topic_0 stands for? Where can I find this kind of
> information? Moreover, are there any other procedures for generating
> the clustering result from these topic_x files?
>
>
> Thank you very much for the help.
>
> Wenyia
>



-- 
Yee Yang Li Hector
http://hectorgon.blogspot.com/ (tech + travel)
http://hectorgon.com (book reviews)
