From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Date: Wed, 22 Aug 2007 01:01:36 -0000
Message-ID: <20070822010136.25878.38365@eos.apache.org>
Subject: [Lucene-hadoop Wiki] Trivial Update of "Hadoop 0.14 Upgrade" by RaghuAngadi

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by RaghuAngadi:
http://wiki.apache.org/lucene-hadoop/Hadoop_0%2e14_Upgrade

------------------------------------------------------------------------------
  = Upgrade Guide for Hadoop-0.14 =
- This page describes upgrade information that is specific to Hadoop-0.14. The usual upgrade described in [:Hadoop_Upgrade: Hadoop Upgrade page] still applies for Hadoop-0.14.
+ This page describes upgrade information that is specific to Hadoop-0.14. The normal upgrade described in [:Hadoop_Upgrade: Hadoop Upgrade page] still applies for Hadoop-0.14.
  == Upgrade Path ==

@@ -36, +36 @@
  == Monitoring the Upgrade ==
- The cluster stays in ''safeMode'' until the upgrade is complete. HDFS webui is a good place to check if safeMode is on or off. As always log files from ''namenode'' and ''datanode'' are useful when nothing else helps.
+ The cluster stays in ''safeMode'' until the upgrade is complete. HDFS webui is a good place to check if safeMode is on or off. As always, log files from ''namenode'' and ''datanode'' are useful when nothing else helps.
  Once the cluster is started with {{{-upgrade}}} option, the simplest way to monitor the upgrade is with '{{{dfsadmin -upgradeProgress status}}}' command.

@@ -70, +70 @@
  }}}
  * {{{Status = 78%}}} : This is a rough approximation of how much of upgrade is completed.
- * {{{Block Level Stats}}} : Once the upgrade is started, Namenode iterates through all the block to check how many of the blocks are upgrade. This information is useful on large clusters where some datanodes may never complete upgrade of their blocks (discussed in later sections).
+ * {{{Block Level Stats}}} : Once the upgrade starts, Namenode iterates through all the block to check how many of the blocks are upgraded. This information is useful on large clusters where some datanodes may never complete upgrade of their blocks (discussed in later sections).
- * {{{Fully Upgraded}}} : Percentage of blocks, where the expected number of replicas are upgraded. E.g. if a block has replication of 3, it is considered ''fully upgraded'' if at least three datanodes that contain this blocks have completed their updating checksums.
+ * {{{Fully Upgraded}}} : Percentage of blocks, where the expected number of replicas are upgraded. E.g. if a block has replication of 3, it is considered ''fully upgraded'' if at least three datanodes that contain this blocks have finished upgrade of their blocks.
  * {{{Minimally Upgraded}}} : Similar to above, number of upgraded replicas is at least {{{dfs.min.replication}}} (default 1) and is less than expected number of replicas.
  * {{{Under Upgraded}}} : number of upgraded replicas is less than {{{dfs.min.replication}}}.
  * {{{Un-upgraded}}} : blocks with zero upgraded replicas.
  * {{{Brief Datanode Status}}} : Each datanode reports its progress to the namenode during the upgrade. This shows average of percent completion on all the datanodes. This also shows how many datanodes have completed their upgrade. For the upgrade to proceed to next stage, all the datanodes should report completion of their local upgrade.
- Note that in some cases, a few blocks might be ''over-replicated'' in such cases, upgrade might proceed to next stage even if some of the datanodes do not complete their upgrade. If {{{Fully Upgraded}}} is calculated to be 100%, namenode will proceed to next stage.
+ Note that in some cases, a few blocks might be ''over-replicated''. In such a case upgrade might proceed to next stage even if some of the datanodes do not complete their upgrade. If {{{Fully Upgraded}}} is calculated to be 100%, namenode will proceed to next stage even if not all the datanodes have completed their upgrade.
  ==== Potential Problems during Second Stage ====
- * ''The upgrade might seem to be stuck'' : Each datanode reports its progress once every minute. If the percent completion does not change change even afeter a few minutes, some datanodes might have some unexpected problems. Use {{{details}}} option with {{{-upgradeProgress}}} command to check which datanodes seem stagnant. {{{
+ * ''The upgrade might seem to be stuck'' : Each datanode reports its progress once every minute. If the percent completion does not change even afeter a few minutes, some datanodes might have some unexpected problems. Use {{{details}}} option with {{{-upgradeProgress}}} command to check which datanodes seem stagnant. {{{
  $ bin/hadoop dfsadmin -upgradeProgress details
  Distributed upgrade for version -6 is in progress. Status = 72%

@@ -101, +101 @@
  192.168.0.24:50010 : 50 % 2044 u 1999 r 0 e
  192.168.0.214:50010 : 100 % 4678 u 0 r 0 e
  ...
- }}} You can run this command through '{{{grep -v "100 %"}}}' to find the nodes that have not completed their upgrade. If the problem nodes can not be corrected, as a last resort you can check ''Block Level Stats'' to see if the upgrade can be ''forced'' to next stage. E.g. if 98% are fully-upgraded and 2% minimally-upgraded, then you can reasonably sure that at least one copy of a block is upgraded. You can force next stage with {{{force}}} option : {{{
+ }}} You can run this command through '{{{grep -v "100 %"}}}' to find the nodes that have not completed their upgrade. If the problem nodes can not be corrected, as a last resort you can check ''Block Level Stats'' to see if the upgrade can be ''forced'' to next stage. E.g. if 98% are fully-upgraded and 2% are minimally-upgraded, then you can reasonably be sure that at least one copy of a block is upgraded. You can force next stage with {{{force}}} option : {{{
  $ bin/hadoop dfsadmin -upgradeProgress force
  Distributed upgrade for version -6 is in progress. Status = 90%

@@ -119, +119 @@
  can take longer than status implies.
  }}} Note {{{Force Proceed is ON}}} in the status message.
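
As a rough illustration of the '{{{grep -v "100 %"}}}' suggestion in the hunk above, the filter below keeps only the per-datanode lines of the {{{details}}} report and drops the nodes that already report 100 %. It is a sketch, not part of the wiki page: the function name {{{unfinished_datanodes}}} is hypothetical, and the column layout (address, ":", percentage, "%") is assumed from the sample output quoted above.

```shell
# Hypothetical helper (not part of Hadoop): reads the output of
# `dfsadmin -upgradeProgress details` on stdin and prints only the
# datanode lines whose reported completion is below 100 %.
# Assumed line layout, taken from the sample output in this page:
#   <host:port> : <percent> % <n> u <n> r <n> e
unfinished_datanodes() {
  # Field 2 must be ":" and field 4 must be "%", which skips the summary
  # lines; field 3 is the percent completion, compared numerically.
  awk '$2 == ":" && $4 == "%" && $3 + 0 < 100'
}
```

It could be used as {{{bin/hadoop dfsadmin -upgradeProgress details | unfinished_datanodes}}}. Unlike a plain {{{grep -v}}}, it also drops the header and status lines, so any output at all means at least one datanode is still upgrading.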
- === Third Stage : Deleting {{{.crc}}} files ===
+ === Third Stage : Deleting .crc files ===
- Once the second stage is complete, Namenode reports 90% completiong. It does not have a very good way of estimating time required for deleting the files. The ''status'' reports 90% completion all through this stage. Later tests with larger number of files indicates that it takes one hour to delete 2 million files on a rack server. The upgrade status report looks like the following. {{{
+ Once the second stage is complete, Namenode reports 90% completion. It does not have a very good way of estimating time required for deleting the files. The ''status'' reports 90% completion all through this stage. Later tests with larger number of files indicates that it takes one hour to delete 2 million files on a rack server. The upgrade status report looks like the following. {{{
  $ bin/hadoop dfsadmin -upgradeProgress status
  Distributed upgrade for version -6 is in progress. Status = 90%

@@ -144, +144 @@
  === Memory requirements ===
- HDFS nodes do not require more memory during the upgrade than for normal operation before the upgrade. We observed that Namenode might use 5-10% more memory (or more GC in JVM) during the upgrade. If the namenode was operating at the edge of its memory limits during the upgrade, it could potentially have some problems. At any time, cluster can be restarted and the HDFS resumes the upgrade.
+ HDFS nodes do not require more memory during the upgrade than for normal operation before the upgrade. We observed that Namenode might use 5-10% more memory (or more GC in JVM) during the upgrade. If the namenode was operating at the edge of its memory limits before the upgrade, it could potentially have some problems. At any time, cluster can be restarted and the HDFS resumes the upgrade.
  === Restarting a cluster ===