mahout-dev mailing list archives

From "Suneel Marthi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1629) Mahout cvb on AWS EMR: p(topic|docId) doesn't make sense when using s3 folder as --input
Date Thu, 17 Mar 2016 16:03:33 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199781#comment-15199781 ]

Suneel Marthi commented on MAHOUT-1629:
---------------------------------------

Have not heard back from the original poster in a while, and moreover this is an issue with
the legacy MapReduce CVB code. Resolving this as 'Cannot Reproduce'; please feel free to open
another JIRA if the issue recurs.

> Mahout cvb on AWS EMR: p(topic|docId) doesn't make sense when using s3 folder as --input
> ----------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1629
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1629
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.9
>         Environment: AWS EMR with AMI 3.2.3
>            Reporter: Markus Paaso
>            Assignee: Andrew Musselman
>              Labels: legacy
>             Fix For: 0.12.0
>
>
> When running the 'mahout cvb' command on AWS EMR with an --input value like s3://mybucket/input/ or s3://mybucket/input/* (7 input files in my case), the content of the doc-topic output is nonsense. It seems that the docIds in the doc-topic output are shuffled, but the topic model output (p(term|topic) for each topic) still looks fine.
> The workaround is to first copy the input files from S3 to the cluster's HDFS with the command:
>  {code:none}hadoop fs -cp s3://mybucket/input /input{code}
> and then run mahout cvb with the option --input /input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
