From wine lover <winecod...@gmail.com>
Subject Re: questions on the results of running lda and ldatopics, thanks
Date Thu, 30 Jun 2011 19:02:09 GMT
Thanks, Hector, you are right: the exact meaning of topic_i is not needed
for unsupervised clustering.

However, to cluster a set of documents I still need the probabilistic
relationship between each topic and each document, and I am not clear how
to get this information from the generated results.

For instance, take this line:

model [p(model|topic_0) = 0.010358664102351409

Here "model" is a word, but the result tells me nothing about the
relationship between this word and a given document. Thanks.
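A minimal sketch of the usual way to go from LDA output to document clusters, assuming you already have each document's topic proportions p(topic|doc) in memory. This thread does not confirm what format the docTopics folder uses, so the dict below is hypothetical toy data, not something read from Mahout's files. Each document is hard-assigned to its most probable topic, and the link between a word and a document is the mixture p(word|doc) = sum over t of p(word|t) * p(t|doc).

```python
from typing import Dict, List


def assign_clusters(doc_topics: Dict[str, List[float]]) -> Dict[str, int]:
    """Hard-assign each document to the topic index with the highest p(topic|doc)."""
    return {doc: max(range(len(p)), key=p.__getitem__) for doc, p in doc_topics.items()}


def word_given_doc(word: str, doc_dist: List[float],
                   word_topics: List[Dict[str, float]]) -> float:
    """p(word|doc) = sum over topics t of p(word|t) * p(t|doc); unseen words count as 0."""
    return sum(p_t * word_topics[t].get(word, 0.0) for t, p_t in enumerate(doc_dist))


# Hypothetical toy data: 2 topics, 3 documents (illustrative numbers only).
doc_topics = {"doc1": [0.9, 0.1], "doc2": [0.2, 0.8], "doc3": [0.6, 0.4]}
# p(word|topic) per topic; the topic_0 "model" value is rounded from the line above,
# the topic_1 value is invented.
word_topics = [{"model": 0.0104, "tissues": 0.0089}, {"model": 0.002}]

print(assign_clusters(doc_topics))  # {'doc1': 0, 'doc2': 1, 'doc3': 0}
print(word_given_doc("model", doc_topics["doc1"], word_topics))
```

Taking the argmax gives a hard clustering. If soft clusters are acceptable, the p(topic|doc) vectors can be used directly as document features.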


On Thu, Jun 30, 2011 at 2:08 PM, wine lover <winecoding@gmail.com> wrote:

> Hello Everyone,
>
> I have two questions on the LDA analysis.
>
> After running the lda command, the generated directory "testdata-lda"
> contains several folders: docTopics, state-0, state-1,
> ....
>
> It seems to me that the "state-x" folders are converted into a readable
> format when "ldatopics" is run. But what does the "docTopics" folder
> contain? How can I view it?
>
> Running the ldatopics command generates 20 files in total (topic_0,
> topic_1, etc.). For instance, the file topic_0 contains lines such as
> the following:
> model [p(model|topic_0) = 0.010358664102351409
> tissues [p(tissues|topic_0) = 0.008870984984037485
>
> How can I tell what topic_0 stands for? Where can I find this kind of
> information? Moreover, is there an existing procedure for generating a
> clustering result from these topic_x files?
>
>
> Thank you very much for the help.
>
> Wenyia
>
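LDA itself does not name its topics. A topic is usually interpreted by reading its highest-probability words. The sketch below parses lines in the exact format quoted above (word [p(word|topic_N) = value) and returns the top-k words. It assumes every line follows that pattern and silently skips any line that does not. The function name top_words is hypothetical and is not part of Mahout.

```python
import re
from typing import List, Tuple

# Assumed line format, copied from the ldatopics output quoted above:
#   model [p(model|topic_0) = 0.010358664102351409
LINE_RE = re.compile(r"^(\S+)\s+\[p\((.+?)\|(\w+)\)\s*=\s*([0-9.eE+-]+)")


def top_words(lines: List[str], k: int = 10) -> List[Tuple[str, float]]:
    """Return the k words with the highest p(word|topic), skipping unparsable lines."""
    pairs = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            pairs.append((m.group(1), float(m.group(4))))
    pairs.sort(key=lambda wp: wp[1], reverse=True)
    return pairs[:k]


sample = [
    "tissues [p(tissues|topic_0) = 0.008870984984037485",
    "model [p(model|topic_0) = 0.010358664102351409",
    "not a topic line",
]
print(top_words(sample, k=2))
```

The human-readable label for topic_0 then comes from inspecting this word list, not from a field in the output.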
