hadoop-common-user mailing list archives

From Suresh Srinivas <sur...@hortonworks.com>
Subject Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Date Thu, 03 May 2012 16:26:23 GMT
This is probably a more relevant question for the CDH mailing lists. That said,
what Edward is suggesting seems reasonable: reduce the replication factor,
decommission some of the nodes, create a new cluster with those nodes,
and run distcp (sketched below).
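
A rough sketch of that plan's final step, assuming default ports and
placeholder hostnames (distcp between different Hadoop versions usually
reads the source over the read-only hftp interface and is run on the
destination cluster):

  # run on the new (CDH3) cluster; hftp reads via the namenode's HTTP port
  hadoop distcp hftp://old-namenode:50070/user/data \
      hdfs://new-namenode:8020/user/data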

Could you share with us the reasons you want to migrate from Apache 205?

Regards,
Suresh

On Thu, May 3, 2012 at 8:25 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> Honestly that is a hassle; going from 205 to cdh3u3 is probably more
> of a cross-grade than an upgrade or downgrade. I would just stick it
> out. But yes, like Michael said, two clusters on the same gear and
> distcp. If you are using RF=3 you could also lower your replication to
> rf=2 ('hadoop dfs -setrep 2') to clear headroom as you are moving
> stuff.
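>
> A minimal sketch of that replication change (/user/data is a
> placeholder path; -w waits for the change to complete and -R recurses):
>
>   hadoop dfs -setrep -w 2 -R /user/data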
>
>
> On Thu, May 3, 2012 at 7:25 AM, Michel Segel <michael_segel@hotmail.com>
> wrote:
> > Ok... When you get your new hardware...
> >
> > Set up one server as your new NN, JT, SN.
> > Set up the others as DNs.
> > (Cloudera CDH3u3)
> >
> > On your existing cluster...
> > Remove your old log files, temp files on HDFS, anything you don't need.
> > This should give you some more space.
> > Start copying some of the directories/files to the new cluster.
> > As you gain space, decommission a node, rebalance, and add the node to the
> > new cluster (see the sketch below)...
> >
> > It's a slow process.
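> >
> > One decommission round looks roughly like this, assuming
> > dfs.hosts.exclude already points at an exclude file on the namenode
> > (hostname and path are placeholders):
> >
> >   echo "datanode5.example.com" >> /etc/hadoop/conf/excludes
> >   hadoop dfsadmin -refreshNodes   # namenode starts draining the node
> >   hadoop balancer                 # spread blocks across the rest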
> >
> > Should I remind you to make sure you up your bandwidth setting, and to
> > clean up the HDFS directories when you repurpose the nodes?
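> >
> > The bandwidth knob of that era is dfs.balance.bandwidthPerSec in
> > hdfs-site.xml, in bytes per second (10 MB/s below is just an assumed
> > example value, not a recommendation):
> >
> >   <property>
> >     <name>dfs.balance.bandwidthPerSec</name>
> >     <value>10485760</value>
> >   </property>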
> >
> > Does this make sense?
> >
> > Sent from a remote device. Please excuse any typos...
> >
> > Mike Segel
> >
> > On May 3, 2012, at 5:46 AM, Austin Chungath <austincv@gmail.com> wrote:
> >
> >> Yeah I know :-)
> >> and this is not a production cluster ;-) and yes there is more hardware
> >> coming :-)
> >>
> >> On Thu, May 3, 2012 at 4:10 PM, Michel Segel <michael_segel@hotmail.com>
> >> wrote:
> >>
> >>> Well, you've kind of painted yourself into a corner...
> >>> Not sure why you didn't get a response from the Cloudera lists, but it's a
> >>> generic question...
> >>>
> >>> 8 out of 10 TB. Are you talking effective storage or actual disks?
> >>> And please tell me you've already ordered more hardware... Right?
> >>>
> >>> And please tell me this isn't your production cluster...
> >>>
> >>> (Strong hint to Strata and Cloudera... You really want to accept my
> >>> upcoming proposal talk... ;-)
> >>>
> >>>
> >>> Sent from a remote device. Please excuse any typos...
> >>>
> >>> Mike Segel
> >>>
> >>> On May 3, 2012, at 5:25 AM, Austin Chungath <austincv@gmail.com> wrote:
> >>>
> >>>> Yes. This was first posted on the cloudera mailing list. There were no
> >>>> responses.
> >>>>
> >>>> But this is not related to cloudera as such.
> >>>>
> >>>> cdh3 uses apache hadoop 0.20 as its base. My data is in apache hadoop
> >>>> 0.20.205.
> >>>>
> >>>> There is an upgrade namenode option when we are migrating to a higher
> >>>> version, say from 0.20 to 0.20.205,
> >>>> but here I am downgrading from 0.20.205 to 0.20 (cdh3).
> >>>> Is this possible?
> >>>>
> >>>>
> >>>> On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi <prash1784@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Seems like a matter of upgrade. I am not a Cloudera user so would not
> >>>>> know much, but you might find some help moving this to the Cloudera
> >>>>> mailing list.
> >>>>>
> >>>>> On Thu, May 3, 2012 at 2:51 AM, Austin Chungath <austincv@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> There is only one cluster. I am not copying between clusters.
> >>>>>>
> >>>>>> Say I have a cluster running apache 0.20.205 with 10 TB storage
> >>>>>> capacity and about 8 TB of data.
> >>>>>> Now how can I migrate the same cluster to use cdh3 and keep that same
> >>>>>> 8 TB of data?
> >>>>>>
> >>>>>> I can't copy 8 TB of data using distcp because I have only 2 TB of
> >>>>>> free space.
> >>>>>>
> >>>>>>
> >>>>>> On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar <nitinpawar432@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> you can actually look at distcp:
> >>>>>>>
> >>>>>>> http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
> >>>>>>>
> >>>>>>> but this means that you have two different sets of clusters
> >>>>>>> available to do the migration
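> >>>>>>>
> >>>>>>> The basic two-cluster form (namenode hostnames, ports, and paths
> >>>>>>> are placeholders):
> >>>>>>>
> >>>>>>>   hadoop distcp hdfs://nn1:8020/user/data hdfs://nn2:8020/user/data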
> >>>>>>>
> >>>>>>> On Thu, May 3, 2012 at 12:51 PM, Austin Chungath <austincv@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Thanks for the suggestions,
> >>>>>>>> My concerns are that I can't actually copyToLocal from the dfs
> >>>>>>>> because the data is huge.
> >>>>>>>>
> >>>>>>>> Say if my hadoop was 0.20 and I am upgrading to 0.20.205 I can do a
> >>>>>>>> namenode upgrade. I don't have to copy data out of dfs.
> >>>>>>>>
> >>>>>>>> But here I am having Apache hadoop 0.20.205 and I want to use CDH3
> >>>>>>>> now, which is based on 0.20.
> >>>>>>>> Now it is actually a downgrade, as 0.20.205's namenode info has to
> >>>>>>>> be used by 0.20's namenode.
> >>>>>>>>
> >>>>>>>> Any idea how I can achieve what I am trying to do?
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>> On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar <nitinpawar432@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> I can think of the following options:
> >>>>>>>>>
> >>>>>>>>> 1) write a simple get-and-put code which gets the data from the
> >>>>>>>>> old DFS and loads it into the new DFS
> >>>>>>>>> 2) see if distcp between both versions is compatible
> >>>>>>>>> 3) this is what I had done (and my data was hardly a few hundred
> >>>>>>>>> GB): did a dfs -copyToLocal and then in the new grid did a
> >>>>>>>>> copyFromLocal (see the sketch below)
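> >>>>>>>>>
> >>>>>>>>> For option 3, roughly (assuming enough local disk; the staging
> >>>>>>>>> path is a placeholder):
> >>>>>>>>>
> >>>>>>>>>   hadoop dfs -copyToLocal /user/data /mnt/staging        # old grid
> >>>>>>>>>   hadoop dfs -copyFromLocal /mnt/staging/data /user/data # new grid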
> >>>>>>>>>
> >>>>>>>>> On Thu, May 3, 2012 at 11:41 AM, Austin Chungath <austincv@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>> I am migrating from Apache hadoop 0.20.205 to CDH3u3.
> >>>>>>>>>> I don't want to lose the data that is in the HDFS of Apache hadoop
> >>>>>>>>>> 0.20.205.
> >>>>>>>>>> How do I migrate to CDH3u3 but keep the data that I have on
> >>>>>>>>>> 0.20.205?
> >>>>>>>>>> What are the best practices/techniques to do this?
> >>>>>>>>>>
> >>>>>>>>>> Thanks & Regards,
> >>>>>>>>>> Austin
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Nitin Pawar
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Nitin Pawar
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
>
