mahout-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: questions on the results of running lda and ldatopics, thanks
Date Fri, 01 Jul 2011 03:28:44 GMT
I think this requires a separate program which does not exist.
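
For illustration only, such a program might look roughly like the Python sketch below. It is not part of Mahout and it is not real LDA inference. It assumes every line of a topic_x file has the shape quoted further down in this thread (word [p(word|topic_N) = value). It also assumes a uniform prior over topics and a small floor probability for words missing from a topic's list. The function names and the floor value are made up for this sketch.

```python
import math
import re

# Matches one line of the ldatopics output quoted in this thread, e.g.
#   model [p(model|topic_0) = 0.010358664102351409
LINE_RE = re.compile(r"^(\S+) \[p\(.*\|topic_(\d+)\) = ([0-9.eE+-]+)")


def parse_topic_lines(lines):
    """Return {topic_id: {word: p(word|topic)}} from ldatopics-style lines."""
    topics = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            word, topic_id, prob = m.group(1), int(m.group(2)), float(m.group(3))
            topics.setdefault(topic_id, {})[word] = prob
    return topics


def topic_posterior(tokens, topics, floor=1e-6):
    """Rough p(topic|doc): naive Bayes scoring with a uniform topic prior.

    Words absent from a topic's list get the hypothetical `floor` probability.
    """
    log_scores = {
        t: sum(math.log(words.get(tok, floor)) for tok in tokens)
        for t, words in topics.items()
    }
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_scores.values())
    weights = {t: math.exp(s - peak) for t, s in log_scores.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}
```

Feeding the tokens of one document to topic_posterior gives a normalized score per topic, which could then serve as a feature vector for clustering. Treat the numbers as a heuristic rather than the model's actual document-topic distribution.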

On Thu, Jun 30, 2011 at 12:02 PM, wine lover <winecoding@gmail.com> wrote:
> Thanks, Hector. You are right: the exact meaning of topic_i is not necessary
> for unsupervised clustering.
>
> However, in order to cluster a set of documents, I still need to know the
> probabilistic relationship between each topic and each document. It is not
> clear to me how to get this kind of information from the generated result.
>
> For instance, take this line:
> model [p(model|topic_0) = 0.010358664102351409
> Here, "model" is a word, but the result does not tell me anything about the
> relationship between this word and a given document. Thanks.
>
>
> On Thu, Jun 30, 2011 at 2:08 PM, wine lover <winecoding@gmail.com> wrote:
>
>> Hello Everyone,
>>
>> I have two questions on the LDA analysis.
>>
>> After running the lda command, the generated directory "testdata-lda"
>> contains several folders: docTopics, state-0, state-1, ....
>>
>> It seems to me that the "state-x" folders are converted into a readable
>> format by running "ldatopics". But what does the "docTopics" folder
>> represent? How can I view it?
>>
>> Running the ldatopics command generates 20 files in total (topic_0, topic_1,
>> etc.). For instance, the file topic_0 contains lines such as the
>> following:
>> model [p(model|topic_0) = 0.010358664102351409
>> tissues [p(tissues|topic_0) = 0.008870984984037485
>>
>> How can I tell what topic_0 stands for? Where can I find this kind of
>> information? Moreover, is there an existing procedure to generate a
>> clustering result based on these topic_x files?
>>
>>
>> Thank you very much for the help.
>>
>> Wenyia
>>
>



-- 
Lance Norskog
goksron@gmail.com
