mahout-dev mailing list archives

From "Shannon Quinn (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (MAHOUT-537) Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
Date Thu, 30 Jun 2011 17:40:28 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057945#comment-13057945 ]

Shannon Quinn edited comment on MAHOUT-537 at 6/30/11 5:39 PM:
---------------------------------------------------------------

Ok, this is absolutely a total hack job, but I wanted to see if it would work: taking the
0.21 mapreduce.lib.join* package, tweaking it slightly to make it 0.20-compatible, and installing
it directly in Mahout to make DistributedRowMatrix 0.20-compliant.

The package and its associated tests compile, but the tests fail, apparently because files
are not being written to DistributedCache, HDFS, etc. I tried writing to DistributedCache and
immediately reading the file back, which worked fine, but that doesn't explain why the file
can't be read within the Mapper. So I'm stuck and could use some help.

If this isn't an avenue worth pursuing, that's also fine. I had the idea and wanted to give
it a shot before throwing in the towel and waiting for 0.22.

      was (Author: magsol):
    Ok, this is absolutely a total hack job, but I wanted to see if it would work: taking
the 0.21 mapreduce.lib.join* package, tweaking it slightly to make it 0.20-compatible, and
installing it directly in Mahout to make DistributedRowMatrix 0.20-compliant.

It and the associated tests compile, but I've run into a problem of failing tests, the cause
of which seems to be that it won't write files to DistributedCache, HDFS, etc. I tried writing
to DistributedCache and immediately reading it back, which worked fine, but otherwise I'm
stuck and could use some help.

If this isn't an avenue worth pursuing, that's also fine. I had the idea and wanted to give
it a shot before throwing in the towel and waiting for 0.22.
  
> Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
> -------------------------------------------------------------
>
>                 Key: MAHOUT-537
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-537
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.4, 0.5
>            Reporter: Shannon Quinn
>            Assignee: Shannon Quinn
>             Fix For: 0.6
>
>         Attachments: MAHOUT-537.patch, MAHOUT-537.patch, MAHOUT-537.patch, MAHOUT-537.patch, MAHOUT-537_hack.patch
>
>
> Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2 API; in particular,
> eliminate the dependence on the deprecated JobConf and use the separate Job and Configuration
> objects instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
