From: Jameel Al-Aziz <jameel@6sense.com>
To: user@hadoop.apache.org
Subject: Re: Unable to transfer data using distcp between EC2-Classic cluster and VPC cluster
Date: Sat, 20 Sep 2014 20:11:34 +0000
Message-ID: <481e80d5-e45f-464a-b991-a9d5ec955cf9@6sense.com>
Hi Ankit, 

We originally tried to copy to S3 and back. In fact, it is still our fallback plan. We were having issues with the copy to S3 not maintaining the directory layout, so we decided to try a direct copy.

I'll give it another shot though!

Jameel Al-Aziz

From: Ankit Singhal <ankitsinghal59@gmail.com>
Sent: Sep 20, 2014 8:25 AM
To: user@hadoop.apache.org
Subject: Re: Unable to transfer data using distcp between EC2-Classic cluster and VPC cluster

Hi Jameel,

As Peyman said, the best approach is to distcp from your old cluster to S3 and have the MR jobs on the new cluster read directly from S3.
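A two-hop copy along those lines might look like this. This is a sketch only: the bucket name, namenode hosts, and paths are placeholders, and s3n:// was the usual S3 filesystem scheme on Hadoop 1.x. The commands are built as strings here so their shape is visible without a live cluster:

```shell
# Hypothetical sketch of the two distcp hops via S3. Bucket, hosts,
# and paths are placeholders, not taken from the thread.
BUCKET="s3n://my-bucket/hdfs-backup"      # s3n:// was the common scheme on Hadoop 1.x
SRC="hdfs://old-namenode:8020/data"       # old (EC2-Classic) cluster
DST="hdfs://new-namenode:8020/data"       # new (VPC) cluster

STAGE_UP="hadoop distcp $SRC $BUCKET"     # run this on the old cluster
STAGE_DOWN="hadoop distcp $BUCKET $DST"   # then this on the new cluster

echo "$STAGE_UP"
echo "$STAGE_DOWN"
```

If the new cluster's MR jobs read their input straight from s3n://, the second hop can be skipped entirely.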

But if you still need to distcp from HDFS to HDFS, then update /etc/hosts (or DNS) on all the nodes of your old cluster with "publicIp internalAWSDNSName" entries for all nodes of the new cluster.
For example:
/etc/hosts on every node of the old cluster should have an entry for each node of the new cluster, in the format below:
54.xxx.xxx.xx1   ip-10-xxx-xxx-xx1.ec2.internal
54.xxx.xxx.xx2   ip-10-xxx-xxx-xx2.ec2.internal
54.xxx.xxx.xx3   ip-10-xxx-xxx-xx3.ec2.internal
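A small helper can turn a node list into those lines. This is purely illustrative, with placeholder IPs and DNS names; fanning the output out to each old-cluster node's /etc/hosts (via ssh, or whatever config tool is in use) is left out:

```shell
# Hypothetical sketch: turn "publicIp internalDNSName" pairs into /etc/hosts
# lines for the old cluster's nodes. IPs and DNS names are placeholders.
make_hosts_entries() {
  # Skip blank lines and comments; emit "ip<TAB>name" for everything else.
  while read -r public_ip internal_dns; do
    case "$public_ip" in ""|\#*) continue ;; esac
    printf '%s\t%s\n' "$public_ip" "$internal_dns"
  done
}

make_hosts_entries <<'EOF'
# new VPC cluster nodes
54.0.0.1 ip-10-0-0-1.ec2.internal
54.0.0.2 ip-10-0-0-2.ec2.internal
EOF
```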

Regards,
Ankit Singhal

On Sat, Sep 20, 2014 at 8:36 PM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
It may be easier to copy the data to S3 and then from S3 to the new cluster.

On Fri, Sep 19, 2014 at 8:45 PM, Jameel Al-Aziz <jameel@6sense.com> wrote:
Hi all,

We're in the process of migrating from EC2-Classic to VPC and needed to transfer our HDFS data. We set up a new cluster inside the VPC, and assigned the name node and data nodes temporary public IPs. Initially, we had a lot of trouble getting the name node to redirect to the public hostnames instead of private IPs. After some fiddling around, we finally got webhdfs and dfs -cp to work using public hostnames. However, distcp simply refuses to use the public hostnames when connecting to the data nodes.

We're running distcp on the old cluster, copying data into the new cluster.

The old Hadoop cluster is running 1.0.4 and the new one is running 1.2.1.

So far, on the new cluster, we've tried:
- Using public DNS hostnames in the masters and slaves files (on both the name node and data nodes)
- Setting the hostname of all the boxes to their public DNS name
- Setting "fs.default.name" to the public DNS name of the new name node.

And on both clusters:
- Setting "dfs.datanode.use.datanode.hostname" and "dfs.client.use.datanode.hostname" to "true".
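In hdfs-site.xml terms, that last pair of settings would look like this on both clusters. This is a sketch of just these two properties, not a complete configuration:

```xml
<!-- Sketch: add to hdfs-site.xml on BOTH clusters. The first property makes
     datanodes register and report by hostname; the second makes clients
     connect to datanodes by hostname instead of the (private) IPs the
     namenode would otherwise hand out. -->
<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```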

Even though webhdfs is finally redirecting to data nodes using the public hostnames, we keep seeing errors when running distcp. The errors are all similar to: http://pastebin.com/ZYR07Fvm

What do we need to do to get distcp to use the public hostnames of the new machines? I haven't tried running distcp in the other direction (I'm about to), but I suspect I'll run into the same problem.

Thanks!
Jameel

