From: "Tom White"
To: hadoop-user@lucene.apache.org
Date: Thu, 6 Sep 2007 09:32:49 +0100
Subject: Re: Accessing S3 with Hadoop?

> Yeah, I actually read all of the wiki and your article about using
> Hadoop on EC2/S3 and I can't really find a reference to the S3 support
> not being for "regular" S3 keys. Did I miss something or should I
> update the wiki to make it more clear (or both)?

I don't think this is explained clearly enough, so please do update the
wiki. Thanks.
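For the wiki page, something along these lines might be worth adding as
an illustration (the property names are the ones I remember from the
0.14 S3 filesystem, and the bucket and credentials are placeholders, so
please check them against the docs before publishing):

  <!-- hadoop-site.xml: point the default filesystem at an S3 bucket -->
  <property>
    <name>fs.default.name</name>
    <value>s3://YOUR-BUCKET</value>
  </property>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>YOUR_AWS_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
  </property>

  # then, for example:
  bin/hadoop fs -ls s3://YOUR-BUCKET/

The caveat to spell out is that this filesystem stores data as blocks,
so files written through it won't appear as ordinary, readable keys if
you browse the same bucket with another S3 tool.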
> Also, the instructions on the EC2 page on the wiki no longer work, in
> that due to the kind of NAT Amazon is using, the slaves can't connect
> to the master using an externally-resolved IP address via a DNS name.
> What I mean is, if you set DNS to the external IP of your master
> instance, your slaves can resolve that address but cannot then connect
> to it. So, I had to alter the launch-hadoop-cluster and start-hadoop
> scripts and merge them to just pick the master and use its EC2-given
> name as the $MASTER_HOST to make it work.

This sounds like the problem fixed by
https://issues.apache.org/jira/browse/HADOOP-1638 in 0.14.0, which is
the version you're using, isn't it?

Are you able to run

  bin/hadoop-ec2 launch-cluster

then (on your workstation)

  . bin/hadoop-ec2-env.sh
  ssh $SSH_OPTS "root@$MASTER_HOST" \
    "sed -i -e \"s/$MASTER_HOST/\$(hostname)/g\" /usr/local/hadoop-$HADOOP_VERSION/conf/hadoop-site.xml"

and then check whether the master host has been set correctly (to the
internal IP) in the master host's hadoop-site.xml? Also, what version
of the EC2 tools are you using?

> I also updated the scripts
> to only look for a given AMI ID and only start/manage/terminate
> instances of that AMI ID (since I have others I'd rather not
> terminate just on the basis of their AMI launch index ;-)).

Instances have been terminated on the basis of their AMI ID since
0.14.0. See https://issues.apache.org/jira/browse/HADOOP-1504.

Tom
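P.S. In case it's useful to anyone hitting the same thing on an older
release, a quick way to see which running instances were launched from
a particular image before terminating anything is something like this
(ami-1234abcd is just a placeholder for your Hadoop AMI ID):

  # print the instance IDs of instances launched from one AMI
  ec2-describe-instances | grep ^INSTANCE | grep ami-1234abcd | awk '{print $2}'

  # terminate only those instances
  ec2-terminate-instances `ec2-describe-instances | grep ^INSTANCE | grep ami-1234abcd | awk '{print $2}'`

That's roughly what the 0.14.0 scripts do for you now.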