From: Stephen Green <Stephen.Green@sun.com>
To: mahout-user@lucene.apache.org
Date: Tue, 14 Apr 2009 18:54:29 -0400
Subject: Re: Mahout on Elastic MapReduce

On Apr 14, 2009, at 5:17 PM, Grant Ingersoll wrote:

> I would be concerned about the fact that EMR is using 0.18 and
> Mahout is on 0.19 (which of course raises another concern expressed
> by Owen O'Malley to me at ApacheCon: No one uses 0.19)

Well, I did run Mahout locally on a 0.18.3 install, but that was
writing to and reading from HDFS. I can build a custom mahout-examples
that has the 0.18.3 Hadoop jars (or perhaps no Hadoop jar at all...).
I'm guessing that if EMR is on 0.18.3 and it gets popular, then you're
going to have to deal with that problem.

> I'd say you should try reproducing the problem on the same version
> that Mahout uses.

That'll be a bit tricky in the EMR case, as that's Amazon's business
(ask me about trying to get a 64-bit Solaris AMI on Amazon's version
of Xen...)

> FWIW, any committer on the Mahout project can likely get credits to
> use AWS.

I'm happy to share my limited experience. Also:

>> ----- Original Message ----
>>> From: Sean Owen
>>> To: mahout-user@lucene.apache.org
>>> Sent: Tuesday, April 14, 2009 4:19:51 PM
>>> Subject: Re: Mahout on Elastic MapReduce
>>>
>>> This is a fairly uninformed observation, but: the error seems to be
>>> from Hadoop. It seems to say that it understands hdfs:, but not s3n:,
>>> and that makes sense to me. Do we expect Hadoop to understand how to
>>> read from S3? I would expect not.
>>> (Though, you point to examples that seem to overcome this just fine?)

As Otis pointed out, Hadoop can handle S3 a couple of ways, and the
example that I've been working on seems to be able to read the input
data from an s3n URI with no problem.

>>> When I have integrated code with stuff stored on S3, I have always
>>> had to write extra glue code to copy from S3 to a local file system,
>>> do work, then copy back.

I think you do need to copy from S3 to HDFS, but I think that happens
automagically (? My Hadoop ignorance is starting to show!)

Steve

--
Stephen Green          // Stephen.Green@sun.com
Principal Investigator \\ http://blogs.sun.com/searchguy
Aura Project           // Voice: +1 781-442-0926
Sun Microsystems Labs  \\ Fax: +1 781-442-1692
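For readers of the archive: the "couple of ways" Hadoop handles S3 are selected by URI scheme (the s3: block store versus the s3n: native store that reads plain S3 objects), and in the Hadoop 0.18.x era discussed here the AWS credentials went into the Hadoop configuration file. A minimal sketch, where the property names are real but the values are placeholders:

```xml
<!-- hadoop-site.xml on 0.18.x (core-site.xml in later releases).
     The key values below are placeholders, not working credentials. -->
<configuration>
  <!-- Credentials for the s3n: native filesystem -->
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
```

With credentials in place, an explicit S3-to-HDFS copy of the kind Sean describes can be done with distcp, e.g. `hadoop distcp s3n://my-bucket/input hdfs:///user/hadoop/input` (bucket and paths hypothetical); whether EMR performs that staging step for you automatically is exactly the open question in the thread.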