From: Adam Faris
Subject: Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Date: Mon, 7 May 2012 14:37:00 +0000
Reply-To: common-user@hadoop.apache.org

Hi Austin,

I don't know about using CDH3, but we use distcp for moving data between different versions of Apache grids, and several things come to mind.

1) You should use the -i flag to ignore checksum differences on the blocks. I'm not 100% sure, but I believe hftp doesn't support checksums on the blocks as they go across the wire.

2) You should read from hftp but write to hdfs. Also make sure to check your port numbers. For example, I can read from hftp on port 50070 and write to hdfs on port 9000. You'll find the hftp port in hdfs-site.xml and the hdfs port in core-site.xml on Apache releases.

3) Do you have security (Kerberos) enabled on 0.20.205? Does CDH3 support security? If security is enabled on 0.20.205 and CDH3 does not support security, you will need to disable security on 0.20.205, because you cannot write from a secure grid to an unsecured one.

4) Use the -m flag to limit your mappers so you don't DDoS your network backbone.

5) Why isn't your vendor helping you with the data migration? :)

Otherwise something like this should get you going:

hadoop distcp -i -ppgu -log /tmp/mylog -m 20 hftp://mynamenode.grid.one:50070/path/to/my/src/data hdfs://mynamenode.grid.two:9000/path/to/my/dst

-- Adam
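For point 2, the ports can be confirmed straight from the configuration files. A minimal sketch, assuming a conventional conf directory; adjust the paths to wherever your Hadoop configuration actually lives:

  # hftp reads go through the namenode's HTTP port (dfs.http.address) on the source cluster
  grep -A 1 'dfs.http.address' /etc/hadoop/conf/hdfs-site.xml
  # hdfs:// writes go to the namenode's RPC port (fs.default.name) on the destination cluster
  grep -A 1 'fs.default.name' /etc/hadoop/conf/core-site.xml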
On May 7, 2012, at 4:29 AM, Nitin Pawar wrote:

> things to check
>
> 1) when you launch distcp jobs, all the datanodes of the older hdfs are live and connected
> 2) when you launch distcp, no data is being written/moved/deleted in hdfs
> 3) you can use the -log option to log errors into a directory and -i to ignore errors
>
> also you can try using distcp with the hdfs protocol instead of hftp ... for more you can refer to
> https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/d0d99ad9f1554edd
>
> if it failed there should be some error
>
> On Mon, May 7, 2012 at 4:44 PM, Austin Chungath wrote:
>
>> ok, that was a lame mistake.
>> $ hadoop distcp hftp://localhost:50070/tmp hftp://localhost:60070/tmp_copy
>> I had spelled hdfs instead of "hftp"
>>
>> $ hadoop distcp hftp://localhost:50070/docs/index.html hftp://localhost:60070/user/hadoop
>> 12/05/07 16:38:09 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/docs/index.html]
>> 12/05/07 16:38:09 INFO tools.DistCp: destPath=hftp://localhost:60070/user/hadoop
>> With failures, global counters are inaccurate; consider running with -i
>> Copy failed: java.io.IOException: Not supported
>> at org.apache.hadoop.hdfs.HftpFileSystem.delete(HftpFileSystem.java:457)
>> at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
>> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
>> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>>
>> Any idea why this error is coming?
>> I am copying one file from 0.20.205 (/docs/index.html) to cdh3u3 (/user/hadoop).
>>
>> Thanks & Regards,
>> Austin
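The "Not supported" failure above is what you get when the destination is an hftp:// URI, since hftp is read-only; per Adam's point 2, the destination should be the new cluster's hdfs RPC address, and it is generally safest to run the job from the destination (CDH3u3) side. A minimal sketch, with illustrative hostnames, ports and paths:

  # read over hftp from the old namenode's HTTP port, write to the new namenode's RPC port
  hadoop distcp -i hftp://old-nn.example.com:50070/docs/index.html hdfs://new-nn.example.com:8020/user/hadoop/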
>> On Mon, May 7, 2012 at 3:57 PM, Austin Chungath wrote:
>>
>>> Thanks,
>>>
>>> So I decided to try and move using distcp.
>>>
>>> $ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy
>>> 12/05/07 14:57:38 INFO tools.DistCp: srcPaths=[hdfs://localhost:54310/tmp]
>>> 12/05/07 14:57:38 INFO tools.DistCp: destPath=hdfs://localhost:8021/tmp_copy
>>> With failures, global counters are inaccurate; consider running with -i
>>> Copy failed: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 63, server = 61)
>>>
>>> I found that we can do distcp like above only if both are of the same hadoop version,
>>> so I tried:
>>>
>>> $ hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:60070/tmp_copy
>>> 12/05/07 15:02:44 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/tmp]
>>> 12/05/07 15:02:44 INFO tools.DistCp: destPath=hdfs://localhost:60070/tmp_copy
>>>
>>> But this process seemed to hang at this stage. What might I be doing wrong?
>>>
>>> hftp://localhost:50070 is dfs.http.address of 0.20.205
>>> hdfs://localhost:60070 is dfs.http.address of cdh3u3
>>>
>>> Thanks and regards,
>>> Austin
>>>
>>> On Fri, May 4, 2012 at 4:30 AM, Michel Segel wrote:
>>>
>>>> Ok... So riddle me this...
>>>> I currently have a replication factor of 3.
>>>> I reset it to two.
>>>>
>>>> What do you have to do to get the replication factor of 3 down to 2?
>>>> Do I just try to rebalance the nodes?
>>>>
>>>> The point is that you are looking at a very small cluster.
>>>> You may want to start the new cluster with a replication factor of 2 and then, when the data is moved over, increase it to a factor of 3. Or maybe not.
>>>>
>>>> I do a distcp to copy the data, and after each distcp I do an fsck for a sanity check and then remove the files I copied. As I gain more room, I can then slowly drop nodes, do an fsck, rebalance and then repeat.
>>>>
>>>> Even though this is a dev cluster, the OP wants to retain the data.
>>>>
>>>> There are other options depending on the amount and size of new hardware. I mean, make one machine a RAID 5 machine and copy data to it, clearing off the cluster.
>>>>
>>>> If 8 TB was the amount of disk used, that would be about 2.67 TB of actual data at a replication factor of 3. Let's say 3 TB. Going RAID 5, how much disk is that? So you could fit it on one machine, depending on hardware, or maybe 2 machines... Now you can rebuild the initial cluster and then move the data back. Then rebuild those machines. Lots of options... ;-)
>>>>
>>>> Sent from a remote device. Please excuse any typos...
>>>>
>>>> Mike Segel
>>>>
>>>> On May 3, 2012, at 11:26 AM, Suresh Srinivas wrote:
>>>>
>>>>> This is probably a more relevant question for the CDH mailing lists. That said, what Edward is suggesting seems reasonable. Reduce the replication factor, decommission some of the nodes, create a new cluster with those nodes and do distcp.
>>>>>
>>>>> Could you share with us the reasons you want to migrate from Apache 205?
>>>>>
>>>>> Regards,
>>>>> Suresh
>>>>>
>>>>> On Thu, May 3, 2012 at 8:25 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>>>
>>>>>> Honestly that is a hassle; going from 205 to cdh3u3 is probably more of a cross-grade than an upgrade or downgrade. I would just stick it out. But yes, like Michael said, two clusters on the same gear and distcp. If you are using RF=3 you could also lower your replication to RF=2 ('hadoop dfs -setrep 2') to clear headroom as you are moving stuff.
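Lowering the replication factor as Edward suggests is a one-liner, and the namenode then removes the excess replicas in the background. A minimal sketch, assuming you want to apply it recursively from the root; the path is illustrative, and you may prefer to target only the large data directories:

  # drop the replication factor to 2 for everything under /
  hadoop dfs -setrep -R 2 /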
>>>>>>
>>>>>> On Thu, May 3, 2012 at 7:25 AM, Michel Segel <michael_segel@hotmail.com> wrote:
>>>>>>
>>>>>>> Ok... When you get your new hardware...
>>>>>>>
>>>>>>> Set up one server as your new NN, JT, SN.
>>>>>>> Set up the others as DNs.
>>>>>>> (Cloudera CDH3u3)
>>>>>>>
>>>>>>> On your existing cluster...
>>>>>>> Remove your old log files, temp files on HDFS, anything you don't need.
>>>>>>> This should give you some more space.
>>>>>>> Start copying some of the directories/files to the new cluster.
>>>>>>> As you gain space, decommission a node, rebalance, add the node to the new cluster...
>>>>>>>
>>>>>>> It's a slow process.
>>>>>>>
>>>>>>> Should I remind you to make sure you up your bandwidth setting, and to clean up the hdfs directories when you repurpose the nodes?
>>>>>>>
>>>>>>> Does this make sense?
>>>>>>>
>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>
>>>>>>> Mike Segel
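The decommission/rebalance loop Mike describes maps onto a handful of commands. A minimal sketch, assuming the 0.20-era property names; the exclude-file path and hostname are illustrative, and the bandwidth cap (dfs.balance.bandwidthPerSec, in bytes per second) is presumably the "bandwidth setting" referred to above:

  # 1) list the node to retire in the file pointed to by dfs.hosts.exclude in hdfs-site.xml
  echo datanode07.example.com >> /etc/hadoop/conf/dfs.exclude
  # 2) tell the namenode to re-read the include/exclude lists and start decommissioning
  hadoop dfsadmin -refreshNodes
  # 3) once the node is decommissioned and repurposed, rebalance the remaining datanodes
  hadoop balancer -threshold 10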
>>>>>>>
>>>>>>> On May 3, 2012, at 5:46 AM, Austin Chungath wrote:
>>>>>>>
>>>>>>>> Yeah I know :-)
>>>>>>>> and this is not a production cluster ;-) and yes there is more hardware coming :-)
>>>>>>>>
>>>>>>>> On Thu, May 3, 2012 at 4:10 PM, Michel Segel <michael_segel@hotmail.com> wrote:
>>>>>>>>
>>>>>>>>> Well, you've kind of painted yourself into a corner...
>>>>>>>>> Not sure why you didn't get a response from the Cloudera lists, but it's a generic question...
>>>>>>>>>
>>>>>>>>> 8 out of 10 TB. Are you talking effective storage or actual disks?
>>>>>>>>> And please tell me you've already ordered more hardware... Right?
>>>>>>>>>
>>>>>>>>> And please tell me this isn't your production cluster...
>>>>>>>>>
>>>>>>>>> (Strong hint to Strata and Cloudera... You really want to accept my upcoming proposal talk... ;-)
>>>>>>>>>
>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>
>>>>>>>>> Mike Segel
>>>>>>>>>
>>>>>>>>> On May 3, 2012, at 5:25 AM, Austin Chungath wrote:
>>>>>>>>>
>>>>>>>>>> Yes. This was first posted on the Cloudera mailing list. There were no responses.
>>>>>>>>>>
>>>>>>>>>> But this is not related to Cloudera as such.
>>>>>>>>>>
>>>>>>>>>> cdh3 is based on apache hadoop 0.20. My data is in apache hadoop 0.20.205.
>>>>>>>>>>
>>>>>>>>>> There is an upgrade namenode option when we are migrating to a higher version, say from 0.20 to 0.20.205,
>>>>>>>>>> but here I am downgrading from 0.20.205 to 0.20 (cdh3).
>>>>>>>>>> Is this possible?
>>>>>>>>>>
>>>>>>>>>> On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi <prash1784@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Seems like a matter of upgrade. I am not a Cloudera user so would not know much, but you might find some help moving this to the Cloudera mailing list.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 3, 2012 at 2:51 AM, Austin Chungath <austincv@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> There is only one cluster. I am not copying between clusters.
>>>>>>>>>>>>
>>>>>>>>>>>> Say I have a cluster running apache 0.20.205 with 10 TB storage capacity and about 8 TB of data.
>>>>>>>>>>>> Now how can I migrate the same cluster to use cdh3 and use that same 8 TB of data?
>>>>>>>>>>>>
>>>>>>>>>>>> I can't copy 8 TB of data using distcp because I have only 2 TB of free space.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> you can actually look at distcp
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> but this means that you have two different sets of clusters available to do the migration
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, May 3, 2012 at 12:51 PM, Austin Chungath <austincv@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the suggestions.
>>>>>>>>>>>>>> My concern is that I can't actually copyToLocal from the dfs because the data is huge.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Say if my hadoop was 0.20 and I am upgrading to 0.20.205, I can do a namenode upgrade. I don't have to copy data out of dfs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But here I have Apache hadoop 0.20.205 and I want to use CDH3 now, which is based on 0.20.
>>>>>>>>>>>>>> Now it is actually a downgrade, as 0.20.205's namenode info has to be used by 0.20's namenode.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any idea how I can achieve what I am trying to do?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I can think of the following options:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) write simple get and put code which gets the data out of the old DFS and loads it into the new one
>>>>>>>>>>>>>>> 2) see if distcp between both versions is compatible
>>>>>>>>>>>>>>> 3) this is what I had done (and my data was hardly a few hundred GB): a dfs -copyToLocal, and then in the new grid a copyFromLocal
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, May 3, 2012 at 11:41 AM, Austin Chungath <austincv@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> I am migrating from Apache hadoop 0.20.205 to CDH3u3.
>>>>>>>>>>>>>>>> I don't want to lose the data that is in the HDFS of Apache hadoop 0.20.205.
>>>>>>>>>>>>>>>> How do I migrate to CDH3u3 but keep the data that I have on 0.20.205?
>>>>>>>>>>>>>>>> What is the best practice/technique to do this?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks & Regards,
>>>>>>>>>>>>>>>> Austin
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Nitin Pawar
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Nitin Pawar
>
> --
> Nitin Pawar
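For completeness, option 3 in Nitin's list (copy out and back in) looks roughly like the sketch below. It is only practical when the data fits on local or staging disk, which is exactly the constraint Austin runs into here; the paths are illustrative:

  # on the 0.20.205 cluster: stage the data onto local disk
  hadoop fs -copyToLocal /user/hadoop/mydata /staging/mydata
  # on the CDH3u3 cluster: load it back into HDFS
  hadoop fs -copyFromLocal /staging/mydata /user/hadoop/mydata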