mahout-user mailing list archives

From wine lover <winecod...@gmail.com>
Subject Re: questions on the results of running lda and ldatopics, thanks
Date Fri, 01 Jul 2011 13:42:10 GMT
Yes, Jake, you are right. I also noticed "docTopics", which is a folder, but
I do not know how to view it or convert the files it contains into a readable
format. It seems that the ldatopics command does not process "docTopics" at
all. Any suggestion will be highly appreciated.
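Jake's reply below says docTopics holds the p(topic | document) probabilities. Once those vectors are read out of the sequence file, one simple way to turn them into document clusters is to put each document in the cluster of its most probable topic. The sketch below assumes the vectors have already been exported into a plain Python dict of document id to probability list. That export step, the document ids, and the numbers are hypothetical, not actual Mahout output.

```python
from collections import defaultdict

def cluster_by_dominant_topic(doc_topics):
    """Group documents by their most probable topic.

    doc_topics maps a document id to a sequence of p(topic | document)
    values, where index i holds the probability of topic_i.
    (Hypothetical input layout; not Mahout's on-disk format.)
    """
    clusters = defaultdict(list)
    for doc_id, probs in doc_topics.items():
        # argmax over topics: index of the largest probability
        # (ties go to the lowest topic index)
        best_topic = max(range(len(probs)), key=lambda i: probs[i])
        clusters[best_topic].append(doc_id)
    return dict(clusters)

# Hypothetical p(topic | document) vectors for three topics.
example = {
    "doc-1": [0.70, 0.20, 0.10],
    "doc-2": [0.05, 0.15, 0.80],
    "doc-3": [0.60, 0.30, 0.10],
}
print(cluster_by_dominant_topic(example))
# {0: ['doc-1', 'doc-3'], 2: ['doc-2']}
```

This hard assignment throws away the rest of each document's topic mixture. If documents that mix several topics matter, the full probability vectors could instead serve as features for a separate clustering algorithm such as k-means.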

On Fri, Jul 1, 2011 at 1:04 AM, Jake Mannix <jake.mannix@gmail.com> wrote:

> On Thu, Jun 30, 2011 at 12:02 PM, wine lover <winecoding@gmail.com> wrote:
>
> > Thanks, Hector, you are right: knowing the exact meaning of topic_i is not
> > necessary for unsupervised clustering.
> >
> > However, in order to cluster a set of documents, I still need to know the
> > probabilistic relationship between each topic and each document. I am not
> > clear on how to get this kind of information from the generated results.
> >
> > For instance, in "model [p(model|topic_0) = 0.010358664102351409", "model"
> > is a word, but the result does not tell me anything about the relationship
> > between this word and a given document. Thanks.
> >
>
> The current release of Mahout does produce the p(topic | document)
> probabilities. They are emitted after the final iteration, into a sequence
> file in the same directory as the model outputs. I think it's called
> "docTopics" or something like that?
>
>  -jake
>
>
> >
> > On Thu, Jun 30, 2011 at 2:08 PM, wine lover <winecoding@gmail.com> wrote:
> >
> > > Hello Everyone,
> > >
> > > I have two questions on the LDA analysis.
> > >
> > > After running the lda command, the generated directory "testdata-lda"
> > > contains several folders: docTopics, state-0, state-1, ...
> > >
> > > It seems that running "ldatopics" converts the "state-x" folders into a
> > > readable format. But what does the "docTopics" folder contain? How can I
> > > view it?
> > >
> > > Running the ldatopics command generates 20 files in total (topic_0,
> > > topic_1, etc.). For instance, the topic_0 file contains lines such as
> > > the following:
> > > model [p(model|topic_0) = 0.010358664102351409
> > > tissues [p(tissues|topic_0) = 0.008870984984037485
> > >
> > > How can I tell what topic_0 stands for? Where can I find this kind of
> > > information? Moreover, is there an existing procedure to generate
> > > clustering results based on these topic_x files?
> > >
> > >
> > > Thank you very much for the help.
> > >
> > > Wenyia
> > >
> >
>
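On the question of what topic_0 "stands for": LDA topics come without labels. A common practice is to read each topic's highest-probability words and name the topic by hand. The sketch below parses lines in the format quoted above (`word [p(word|topic_0) = value`) and returns the top-k words. The regular expression is an assumption inferred only from the two sample lines in this thread. The third sample line ("cells") is made up for illustration.

```python
import re

# Matches lines like: model [p(model|topic_0) = 0.010358664102351409
# (format inferred from the sample ldatopics output in this thread; an assumption)
LINE_RE = re.compile(r"^(\S+)\s+\[p\([^|]*\|topic_\d+\)\s*=\s*([0-9.eE+-]+)")

def top_words(lines, k=10):
    """Return the k (word, probability) pairs with the highest p(word|topic)."""
    pairs = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # silently skip lines that do not fit the assumed format
            pairs.append((m.group(1), float(m.group(2))))
    pairs.sort(key=lambda p: p[1], reverse=True)
    return pairs[:k]

lines = [
    "model [p(model|topic_0) = 0.010358664102351409",
    "tissues [p(tissues|topic_0) = 0.008870984984037485",
    "cells [p(cells|topic_0) = 0.0095",  # hypothetical line
]
print([word for word, _ in top_words(lines, k=2)])
# ['model', 'cells']
```

Reading the top 10 to 20 words of each topic_x file this way is usually enough to give the topic a human-readable name. The label still comes from a person, not from the model.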
