mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: questions on the results of running lda and ldatopics, thanks
Date Fri, 01 Jul 2011 05:04:46 GMT
On Thu, Jun 30, 2011 at 12:02 PM, wine lover <winecoding@gmail.com> wrote:

> Thanks, Hector, you are right, the exact meaning of topic_i is not
> necessary for unsupervised clustering.
>
> However, in order to cluster a set of documents, I still need to know the
> probabilistic relationship between topic and each document. I am not very
> clear how to get this kind of information from the generated result.
>
> For instance:
>   model [p(model|topic_0) = 0.010358664102351409
> Here, model is a word, but the result does not tell me anything about the
> relationship between this word and a given document. Thanks.
>

The current release of Mahout does produce the p(topic | document)
probabilities. They are emitted after the final iteration and written to a
sequence file in the same directory as the model outputs.  I think it's
called "docTopics" or something like that.
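
Once the p(topic | document) rows are out of that sequence file (Mahout's
seqdumper utility should be able to print them, though I have not checked
the exact invocation), one simple way to turn them into clusters is to
assign each document to its most probable topic. A minimal sketch in Python,
assuming the rows have already been parsed into plain lists; the document
ids and probabilities below are made up for illustration and are not Mahout
output:

```python
# Hypothetical p(topic | document) rows, one list of probabilities per document.
# Ids and values are invented for illustration; they are not real Mahout output.
doc_topics = {
    "doc_a": [0.70, 0.20, 0.10],
    "doc_b": [0.05, 0.15, 0.80],
    "doc_c": [0.60, 0.30, 0.10],
}


def dominant_topic(probs):
    """Return the index of the most probable topic for one document."""
    return max(range(len(probs)), key=lambda k: probs[k])


def cluster_by_topic(rows):
    """Group document ids by their most probable topic (hard assignment)."""
    clusters = {}
    for doc_id, probs in rows.items():
        clusters.setdefault(dominant_topic(probs), []).append(doc_id)
    return clusters


clusters = cluster_by_topic(doc_topics)
for topic, docs in sorted(clusters.items()):
    print(f"topic_{topic}: {docs}")
```

This is a hard assignment; the full probability vector can also be kept
per document if a soft clustering is more useful.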

  -jake


>
> On Thu, Jun 30, 2011 at 2:08 PM, wine lover <winecoding@gmail.com> wrote:
>
> > Hello Everyone,
> >
> > I have two questions on the LDA analysis.
> >
> > After running the lda command, the generated "testdata-lda" directory
> > contains several folders: docTopics  state-0   state-1
> > ....
> >
> > It seems to me that those folders of "state-x" will be transferred into
> > readable format after running "ldatopics". But what does the folder of
> > "docTopics" stand for? How can I view it?
> >
> > Running the command of ldatopics generates 20 files, (topic_0, topic_1,
> > etc), in total. For instance, in the file of topic_0, I get information
> > such as follows:
> > model [p(model|topic_0) = 0.010358664102351409
> > tissues [p(tissues|topic_0) = 0.008870984984037485
> >
> > How can I tell what topic_0 stands for? Where can I find this kind of
> > information?  Moreover, are there any existing procedures to generate
> > a clustering result based on these topic_x files?
> >
> >
> > Thank you very much for the help.
> >
> > Wenyia
> >
>
