From: Michael Stack
Date: Tue, 02 Jan 2007 21:46:20 -0800
To: hadoop-user@lucene.apache.org
Subject: s3

I'm trying to use the s3 filesystem that was recently added to Hadoop TRUNK. If I set fs.default.name to s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/ so that mapreduce jobs read and write directly against S3, I get the following complaint:

    java.io.IOException: Cannot create file
    /mapred/system/submit_86vwi0/job.jar since parent directory
    /mapred/system/submit_86vwi0 does not exist.

'/mapred/system/' exists, but the temporary job directory 'submit_86vwi0' is never created. This looks like a bug.

How are others making use of the S3 filesystem currently? Are you writing maps/reduces that explicitly get an S3 filesystem for the putting and getting of S3 inputs/outputs?

What I really want is a mapreduce tool that does bulk copies of HDFS outputs to S3 and back again. I've made a start on modifying the CopyFiles tool (distcp), adding an S3 mapper to the mapper factory to complement the existing HDFS and HTTP implementations, but before I go any further: has this been done already?

(Below my sig I've pasted a sketch of the configuration I'm describing, the explicit-filesystem workaround I'm asking about, and the distcp invocation I have in mind.)

Thanks for any feedback,
St.Ack
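
P.S. For reference, here is roughly what my hadoop-site.xml looks like. The fs.s3.awsAccessKeyId/fs.s3.awsSecretAccessKey property names are what I believe the new S3 filesystem reads as an alternative to embedding credentials in the URI -- correct me if the names differ on current TRUNK -- and all values are placeholders:

    <property>
      <name>fs.default.name</name>
      <value>s3://MY_BUCKET</value>
    </property>
    <property>
      <!-- Assumed property name; alternative to putting the
           identifier in the s3:// URI itself. -->
      <name>fs.s3.awsAccessKeyId</name>
      <value>AWS_IDENTIFIER</value>
    </property>
    <property>
      <!-- Assumed property name; alternative to putting the
           secret in the s3:// URI itself. -->
      <name>fs.s3.awsSecretAccessKey</name>
      <value>AWS_SECRET</value>
    </property>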
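
And here is the sort of explicit-filesystem workaround I was asking about: a standalone driver that gets an S3 filesystem by URI and copies job output up by hand. A minimal sketch only -- I'm assuming FileSystem.get(URI, Configuration) and FileUtil.copy are available on your revision of TRUNK, and the hostname, bucket, and paths are made up:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class S3BulkCopy {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Get each filesystem explicitly by URI rather than relying on
        // fs.default.name. S3 credentials come from the config properties
        // (or could be embedded in the s3:// URI as in my mail above).
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);
        FileSystem s3 = FileSystem.get(URI.create("s3://MY_BUCKET/"), conf);
        Path src = new Path("/user/stack/output");
        Path dst = new Path("/backups/output");
        // Straight byte copy between the two filesystems; 'false' means
        // the source is left in place after the copy.
        FileUtil.copy(hdfs, src, s3, dst, false, conf);
      }
    }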
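
Finally, the kind of invocation I'm after once CopyFiles/distcp grows an S3 mapper. This is hypothetical -- s3 as a distcp source/destination is exactly the piece I'm proposing to add -- but it would look something like:

    # bulk-copy a completed job's output from HDFS up to S3
    bin/hadoop distcp hdfs://namenode:9000/user/stack/output \
        s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/output

    # ...and back again
    bin/hadoop distcp s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/output \
        hdfs://namenode:9000/user/stack/restored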