mahout-dev mailing list archives

From "Ted Dunning (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-458) The LDA output does not include the topic-probability distribution per document (p(z|d)). It outputs only the topics and corresponding words.
Date Fri, 06 Aug 2010 20:05:15 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896144#action_12896144 ]

Ted Dunning commented on MAHOUT-458:
------------------------------------


Can you provide a rough patch to show what you would like to have happen?


> The LDA output does not include the topic-probability distribution per document (p(z|d)). It outputs only the topics and corresponding words.
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-458
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-458
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.4
>            Reporter: Himanshu Gahlot
>             Fix For: 0.4
>
>
> The current LDA implementation outputs only topics and their words. Many applications
> need the p(z|d) values of a document so that this vector can serve as a reduced
> representation of the document (dimensionality reduction). We need to introduce a new key
> that keeps track of the gamma values for each document (as obtained from the
> document.infer() method) and writes them to the output stream; PrintLDATopics should then
> output these values per document id. Outputting the probabilities of words in a topic
> would also make the output more meaningful.
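
For illustration, here is a minimal sketch of how per-document gamma values could be turned into a p(z|d) vector. It assumes gamma holds the non-negative variational Dirichlet parameters for one document, in which case the expected topic proportions are gamma divided by its sum. The class TopicProportions and the method normalize are hypothetical names and are not part of Mahout; how gamma is actually read from the inference result is not shown.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Mahout: converts gamma values to p(z|d). */
public final class TopicProportions {

    private TopicProportions() {
    }

    /**
     * Normalizes one document's gamma vector so that its entries sum to 1.
     * Assumes gamma[k] is the non-negative Dirichlet parameter for topic k.
     */
    public static double[] normalize(double[] gamma) {
        double sum = 0.0;
        for (double g : gamma) {
            if (Double.isNaN(g) || g < 0.0) {
                throw new IllegalArgumentException("gamma values must be non-negative");
            }
            sum += g;
        }
        if (sum <= 0.0) {
            throw new IllegalArgumentException("gamma must have a positive sum");
        }
        double[] proportions = new double[gamma.length];
        for (int k = 0; k < gamma.length; k++) {
            proportions[k] = gamma[k] / sum;
        }
        return proportions;
    }

    public static void main(String[] args) {
        // Made-up gamma values for a single document with three topics.
        double[] gamma = {1.0, 3.0, 6.0};
        System.out.println(Arrays.toString(normalize(gamma)));
    }
}
```

The resulting vector has one entry per topic and could be written out keyed by document id, which is the reduced representation the description asks for.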

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

