From: Peyman Mohajerian
To: user@hadoop.apache.org
Date: Sat, 20 Sep 2014 11:06:09 -0400
Subject: Re: Unable to use transfer data using distcp between EC2-classic cluster and VPC cluster

It may be easier to copy the data to S3 and then from S3 to the new cluster.
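
As a rough sketch of that two-hop route (not a tested recipe), the small Python helper below only builds the two `hadoop distcp` command lines; it does not run them. The name node addresses, port, bucket name, and path are all hypothetical placeholders, and it assumes the `s3n://` filesystem is usable on both clusters with S3 credentials (`fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey`) already configured.

```python
import shlex


def staged_distcp_commands(old_nn, new_nn, bucket, path):
    """Build the two `hadoop distcp` invocations for a copy staged through S3.

    All arguments are placeholders supplied by the caller; nothing is executed.
    """
    staging = f"s3n://{bucket}{path}"
    # Hop 1: run on the old (EC2-Classic) cluster, HDFS -> S3.
    export_cmd = ["hadoop", "distcp", f"hdfs://{old_nn}{path}", staging]
    # Hop 2: run on the new (VPC) cluster, S3 -> HDFS.
    import_cmd = ["hadoop", "distcp", staging, f"hdfs://{new_nn}{path}"]
    return export_cmd, import_cmd


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    commands = staged_distcp_commands(
        "old-namenode:9000", "new-namenode:9000", "my-staging-bucket", "/data"
    )
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
```

The point of staging through S3 is that each distcp job reads or writes only its own cluster's HDFS plus S3, so neither cluster should need to reach the other's data nodes by public hostname.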

On Fri, Sep 19, 2014 at 8:45 PM, Jameel Al-Aziz <jameel@6sense.com> wrote:
Hi all,

We’re in the process of migrating from EC2-Classic to VPC and needed to transfer our HDFS data. We set up a new cluster inside the VPC and assigned the name node and data node temporary public IPs. Initially, we had a lot of trouble getting the name node to redirect to the public hostname instead of private IPs. After some fiddling around, we finally got webhdfs and dfs -cp to work using public hostnames. However, distcp simply refuses to use the public hostnames when connecting to the data nodes.

We’re running distcp on the old cluster, copying data into the new cluster.

The old hadoop cluster is running 1.0.4 and the new one is running 1.2.1.

So far, on the new cluster, we’ve tried:
- Using public DNS hostnames in the master and slaves files (on both the name node and data nodes)
- Setting the hostname of all the boxes to their public DNS name
- Setting “fs.default.name” to the public DNS name of the new name node.

And on both clusters:
- Setting “dfs.datanode.use.datanode.hostname” and “dfs.client.use.datanode.hostname” to “true” on both the old and new clusters.

Even though webhdfs is finally redirecting to data nodes using the public hostname, we keep seeing errors when running distcp. The errors are all similar to: http://pastebin.com/ZYR07Fvm

What do we need to do to get distcp to use the public hostname of the new machines? I haven’t tried running distcp in the other direction (I’m about to), but I suspect I’ll run into the same problem.

Thanks!
Jameel
