From: Nitin Pawar
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Date: Wed, 30 Jan 2013 22:09:49 +0530
Subject: Re: what will happen when HDFS restarts but with some dead nodes

The following are the configs it looks at. Unless an admin forces it to come out of safemode, the NameNode respects the values below:

dfs.namenode.safemode.threshold-pct
    value: 0.999f
    Specifies the percentage of blocks that should satisfy the minimal
    replication requirement defined by dfs.namenode.replication.min.
    Values less than or equal to 0 mean not to wait for any particular
    percentage of blocks before exiting safemode. Values greater than 1
    will make safe mode permanent.

dfs.namenode.safemode.min.datanodes
    value: 0
    Specifies the number of datanodes that must be considered alive
    before the name node exits safemode. Values less than or equal to 0
    mean not to take the number of live datanodes into account when
    deciding whether to remain in safe mode during startup. Values
    greater than the number of datanodes in the cluster will make safe
    mode permanent.

dfs.namenode.safemode.extension
    value: 30000
    Determines extension of safe mode in milliseconds after the
    threshold level is reached.

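As a rough illustration of how these three settings interact, here is a minimal Python sketch. It is a simplified model built only from the descriptions quoted in this message, not the NameNode's actual code. The names SafemodeConfig, thresholds_met and can_leave_safemode, and the block and datanode counts, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SafemodeConfig:
    # Values as quoted in this message.
    threshold_pct: float = 0.999  # dfs.namenode.safemode.threshold-pct
    min_datanodes: int = 0        # dfs.namenode.safemode.min.datanodes
    extension_ms: int = 30000     # dfs.namenode.safemode.extension


def thresholds_met(cfg, total_blocks, safe_blocks, live_datanodes):
    """True once the block and datanode conditions are both satisfied.

    safe_blocks is the number of blocks that meet the minimal
    replication requirement (assumed to be reported by the caller).
    """
    if cfg.threshold_pct > 1:
        # Described as "will make safe mode permanent".
        blocks_ok = False
    elif cfg.threshold_pct <= 0:
        # Do not wait for any particular percentage of blocks.
        blocks_ok = True
    else:
        blocks_ok = safe_blocks >= cfg.threshold_pct * total_blocks

    # A value <= 0 means the live datanode count is ignored.
    nodes_ok = cfg.min_datanodes <= 0 or live_datanodes >= cfg.min_datanodes
    return blocks_ok and nodes_ok


def can_leave_safemode(cfg, total_blocks, safe_blocks, live_datanodes,
                       ms_since_threshold_reached):
    """True once the thresholds hold and the extension period has elapsed."""
    return (thresholds_met(cfg, total_blocks, safe_blocks, live_datanodes)
            and ms_since_threshold_reached >= cfg.extension_ms)


cfg = SafemodeConfig()
# 2 of 1000 blocks missing: 998 is below 0.999 * 1000, so safemode persists.
print(thresholds_met(cfg, total_blocks=1000, safe_blocks=998, live_datanodes=3))    # False
# 2 of 10000 blocks missing: 9998 is above 0.999 * 10000, so the threshold is met.
print(thresholds_met(cfg, total_blocks=10000, safe_blocks=9998, live_datanodes=3))  # True
```

Under this model, a fixed number of missing blocks matters more on a small namespace: with 2 blocks missing, the block condition fails whenever the total is below 2000 blocks. The real NameNode may round or count differently, so treat the numbers as illustrative only.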

On Wed, Jan 30, 2013 at 10:06 PM, Chen He <airbots@gmail.com> wrote:

> Hi Harsh
>
> I have a question. How namenode gets out of safemode in condition of data
> blocks lost, only administrator? According to my experiences, the NN (0.21)
> stayed in safemode about several days before I manually turn safemode off.
> There were 2 blocks lost.
>
> Chen
>
>
> On Wed, Jan 30, 2013 at 10:27 AM, Harsh J <harsh@cloudera.com> wrote:
>
>> NN does recalculate new replication work to do due to unavailable
>> replicas ("under-replication") when it starts and receives all block
>> reports, but executes this only after out of safemode. When in
>> safemode, across the HDFS services, no mutations are allowed.
>>
>> On Wed, Jan 30, 2013 at 8:34 AM, Nan Zhu <zhunansjtu@gmail.com> wrote:
>> > Hi, all
>> >
>> > I'm wondering if HDFS is stopped, and some of the machines of the cluster
>> > are moved, some of the block replication are definitely lost for moving
>> > machines
>> >
>> > when I restart the system, will the namenode recalculate the data
>> > distribution?
>> >
>> > Best,
>> >
>> > --
>> > Nan Zhu
>> > School of Computer Science,
>> > McGill University
>>
>> --
>> Harsh J
>

--
Nitin Pawar