mahout-user mailing list archives

From Stephen Green
Subject Re: Mahout on Elastic MapReduce
Date Tue, 14 Apr 2009 22:54:29 GMT

On Apr 14, 2009, at 5:17 PM, Grant Ingersoll wrote:

> I would be concerned about the fact that EMR is using 0.18 and  
> Mahout is on 0.19 (which of course raises another concern expressed  
> by Owen O'Malley to me at ApacheCon: No one uses 0.19)

Well, I did run Mahout locally on a 0.18.3 install, but that was
writing to and reading from HDFS.  I can build a custom
mahout-examples that has the 0.18.3 Hadoop jars (or perhaps no Hadoop
jar at all...).  I'm guessing that if EMR is on 0.18.3 and it gets
popular, then you're going to have to deal with that problem.

> I'd say you should try reproducing the problem on the same version  
> that Mahout uses.

That'll be a bit tricky in the EMR case as that's Amazon's business  
(ask me about trying to get a 64bit Solaris AMI on Amazon's version of  

> FWIW, any committer on the Mahout project can likely get credits to  
> use AWS.

I'm happy to share my limited experience.


>> ----- Original Message ----
>>> From: Sean Owen <>
>>> To:
>>> Sent: Tuesday, April 14, 2009 4:19:51 PM
>>> Subject: Re: Mahout on Elastic MapReduce
>>> This is a fairly uninformed observation, but: the error seems to be
>>> from Hadoop. It seems to say that it understands hdfs:, but not  
>>> s3n:,
>>> and that makes sense to me. Do we expect Hadoop understands how to
>>> read from S3? I would expect not. (Though, you point to examples  
>>> that
>>> seem to overcome this just fine?)

As Otis pointed out, Hadoop can handle S3 a couple of ways, and the
example that I've been working with seems to be able to read the input
data from an s3n URI without a problem.
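
To make the hdfs: versus s3n: distinction concrete, here is a rough
sketch (in Python, not Hadoop code) of choosing a file system handler
from a URI's scheme.  The HANDLERS table and the resolve() function are
hypothetical names made up for illustration; Hadoop's real mapping
comes from its configuration, and this only mimics the general idea
that an unregistered scheme is rejected.

```python
from urllib.parse import urlsplit

# Hypothetical registry mapping a URI scheme to a handler description.
# This is an illustration only; it is not how Hadoop stores its mapping.
HANDLERS = {
    "hdfs": "distributed file system",
    "file": "local file system",
}


def resolve(uri, handlers=HANDLERS):
    """Return the handler registered for the scheme of `uri`.

    Raises ValueError when no handler is registered for that scheme,
    which is roughly the situation the quoted error describes.
    """
    scheme = urlsplit(uri).scheme
    if scheme not in handlers:
        raise ValueError(f"no file system registered for scheme: {scheme!r}")
    return handlers[scheme]
```

With only hdfs and file registered, resolve("s3n://bucket/input") raises
ValueError; passing a table that also has an "s3n" entry makes the same
call succeed.  (The bucket name here is made up.)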

>>> When I have integrated code with stuff stored on S3, I have always  
>>> had
>>> to write extra glue code to copy from S3 to a local file system, do
>>> work, then copy back.

I think you do need to copy from S3 to HDFS, but I think that happens  
automagically (?  My Hadoop ignorance is starting to show!)

Stephen Green                      //
Principal Investigator             \\
Aura Project                       //   Voice: +1 781-442-0926
Sun Microsystems Labs              \\   Fax:   +1 781-442-1692
