Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Delivered-To: mailing list user@hadoop.apache.org
MIME-Version: 1.0
From: Marco Reis
Date: Thu, 24 Mar 2016 13:03:23 +0000
Subject: unsubscribe
Cc: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=001a1143e542e2f1a6052ecb1241

--001a1143e542e2f1a6052ecb1241
Content-Type: text/plain; charset=UTF-8

On Thu, Mar 24, 2016 at 5:16 AM Chathuri Wimalasena wrote:

> Hi Ravi,
>
> Thank you for all the information. Our application indexes Twitter
> data into HBase and then does some data analytics on top of that. That's
> why HDFS data is very important to us. We cannot tolerate any data loss
> with the update. Do you remember how long it took you to upgrade from
> 2.4.1 to 2.7.1?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash wrote:
>
>> Hi Chathuri!
>>
>> Technically there is a rollback option during upgrade. I don't know how
>> well it has been tested, but the idea is that old metadata is not deleted
>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>> fairly confident that the HDFS upgrade will work smoothly.
>> We have upgraded quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1
>> successfully (never having to roll back). It's your applications that
>> work on top of HDFS and YARN that I'd be concerned about.
>>
>> HTH
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <kamalasini@gmail.com> wrote:
>>
>>> Thanks for the information, Ravi. Is there a way that I can back up data
>>> before the update? I was thinking about this approach:
>>>
>>> Copy the current hadoop directories to a new set of directories.
>>> Point hadoop to this new set
>>> Start the migration with the backup set
>>>
>>> Please let me know if people have done this upgrade successfully. I
>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>> in the cluster is very important.
>>> Thanks,
>>> Chathuri
>>>
>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash wrote:
>>>
>>>> Hi Chathuri!
>>>>
>>>> - When we upgrade, does it change the namenode data structures and
>>>> data nodes? I assume it only changes the name node...
>>>>
>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>> upgrade will take a long time on Datanodes as well because of
>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>
>>>> - What are the risks with this upgrade ?
>>>>
>>>> What Hadoop applications do you run on top of your cluster? The hope is
>>>> that everything continues working smoothly for the most part, but
>>>> inevitably some backward incompatible changes creep in.
>>>>
>>>> - Is there a place where I can review the changes made to file
>>>> system from 2.5.1 to 2.7.2?
>>>>
>>>> The release notes: http://hadoop.apache.org/releases.html . You'd have
>>>> to accumulate all the changes across the versions.
>>>>
>>>> Practically, I'd try to run my application on your upgraded test
>>>> cluster.
>>>>
>>>> HTH
>>>>
>>>> Ravi
>>>>
>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <kamalasini@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a Hadoop production deployment with 1 name node and 10 data
>>>>> nodes, which holds more than 20TB of data in HDFS. We are currently
>>>>> using Hadoop 2.5.1 and we want to update it to the latest Hadoop
>>>>> version, 2.7.2.
>>>>>
>>>>> I followed this guide (
>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>> and updated a single-node system running in pseudo-distributed mode,
>>>>> and it went without any issues. But this system did not have as much
>>>>> data as the production system.
>>>>>
>>>>> Since this is a production system, I'm reluctant to do this update. I
>>>>> would like to see what other people have done in these cases and their
>>>>> experiences. Here are a few questions I have:
>>>>>
>>>>> - When we upgrade, does it change the namenode data structures and
>>>>> data nodes? I assume it only changes the name node...
>>>>> - What are the risks with this upgrade?
>>>>> - Is there a place where I can review the changes made to the file
>>>>> system from 2.5.1 to 2.7.2?
>>>>>
>>>>> I would really appreciate it if you could share your experiences.
>>>>>
>>>>> Thanks in advance,
>>>>> Chathuri
>>>>>
>>>>
>>>>
>>>
>>
>

--001a1143e542e2f1a6052ecb1241
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable


--001a1143e542e2f1a6052ecb1241--