From: Tanping Wang
To: hdfs-user@hadoop.apache.org
Date: Fri, 25 Feb 2011 11:18:02 -0800
Subject: RE: datanode down alert

Maybe grep for

 

2011-02-25 18:47:05,564 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Decommission complete for node 102.1.1.1:50010

 

in the NameNode log to see whether the decommission has completed?
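That check can be scripted; a minimal sketch, assuming a log path and node address that are placeholders here (substitute your NameNode's actual log file and the datanode's host:port):

```shell
# Hypothetical locations, for illustration; adjust to your install.
LOG=/tmp/namenode-demo.log
NODE="102.1.1.1:50010"

# Write a sample line in the format quoted above, so the sketch is runnable:
cat > "$LOG" <<'EOF'
2011-02-25 18:47:05,564 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Decommission complete for node 102.1.1.1:50010
EOF

# Grep for the completion message for this specific node.
if grep -q "Decommission complete for node $NODE" "$LOG"; then
  echo "decommissioned"
else
  echo "still in progress (or not started)"
fi
```

In real use you would point `LOG` at the live NameNode log instead of writing the sample line.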

 

I remember a similar problem was reported just a few days ago (in the attachment) by James Litton. According to James, no blocks were missing after the node was removed; however, it was unclear when, or if, the decommission process finished.

From: Rita [mailto:rmorgan466@gmail.com]
Sent: Thursday, February 24, 2011 5:59 AM
To: hdfs-user@hadoop.apache.org
Cc: Harsh J
Subject: Re: datanode down alert

 

Thanks for the response.

I am asking because of the following issue: https://issues.apache.org/jira/browse/HDFS-694

When I decommission a datanode it shows up in the "Dead" list on the web GUI; at the same time it also shows up in the "Live" nodes.


I want to make sure this node is fully decommissioned before I remove it from the cluster.


On Tue, Feb 15, 2011 at 9:13 AM, Harsh J <qwertymaniac@gmail.com> wrote:

I know of a way but I do not know for sure if that is what you're looking for:

DFSClient.datanodeReport(DataNodeReportType.DEAD) should give you a
list of all DEAD data nodes as per the NameNode.

Although I believe the reports cost a lot, so do not request them often (each call RPCs the NameNode).
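From the command line, `hadoop dfsadmin -report` prints a per-datanode stanza with similar information. A sketch that pulls out nodes whose stanza reports a completed decommission; the stanza layout shown ("Name:" and "Decommission Status :" lines) is an assumption about that era's report format, so verify against your version's actual output:

```shell
# List datanodes whose stanza reports "Decommission Status : Decommissioned"
# in dfsadmin-report-style text read from stdin.
decommissioned_nodes() {
  awk '/^Name:/ { name = $2 }
       /^Decommission Status/ && $NF == "Decommissioned" { print name }'
}

# Canned sample in the assumed format, so the sketch is runnable as-is:
decommissioned_nodes <<'EOF'
Name: 102.1.1.1:50010
Decommission Status : Decommissioned
Name: 102.1.1.2:50010
Decommission Status : Normal
EOF
# prints: 102.1.1.1:50010

# Real use would be:
#   hadoop dfsadmin -report | decommissioned_nodes
```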


On Tue, Feb 15, 2011 at 6:51 PM, Rita <rmorgan466@gmail.com> wrote:
> Is there a programmatic way to determine if a datanode is down?
>
>
>
> --
> --- Get your facts first, then you can distort them as you please.-- >


--
Harsh J
www.harshj.com




--
--- Get your facts first, then you can distort them as you please.--

-------- Attached message --------
From: James Litton
To: hdfs-user@hadoop.apache.org
Date: Thu, 17 Feb 2011 10:41:57 -0800
Subject: Re: Decommissioning Nodes
Tanping,

Thank you for the reply. The nodes were marked as “decommissioning in progress.” My concern was that they never reached a decommissioned state. I have since begun taking the nodes down and have not had any data blocks missing, so I suspect the process worked. It was just unclear when the process was complete.

James


On 2/17/11 12:59 PM, "Tanping Wang" <tanping@yahoo-inc.com> wrote:

James,
After issuing a command to decommission a node, you should at least be able to see the following log messages in the namenode logs:
 
Setting the excludes files to some_file_contains_decommissioing_hostname
Refreshing hosts (include/exclude) list
 
If you do not see these log messages, you may want to check:
1) Whether you have set

<property>

 <name>dfs.hosts.exclude</name>
 <value>some_file_contains_decommissioing_hostname</value>

</property>

in hdfs-site.xml
2) Whether this decommissioning hostname file is in place.
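Putting the two checks together, a minimal decommissioning sketch; the paths and hostname below are hypothetical, and the NameNode must already reference the excludes file via `dfs.hosts.exclude` in hdfs-site.xml before the refresh does anything:

```shell
# Hypothetical file and hostname; match them to your own cluster.
EXCLUDES=/tmp/dfs-excludes-demo        # the file named by dfs.hosts.exclude
echo "dn6.example.com" > "$EXCLUDES"   # hostname of the node to retire

# Tell the NameNode to re-read its include/exclude lists (commented out
# here so the sketch runs without a cluster):
# hadoop dfsadmin -refreshNodes

cat "$EXCLUDES"
```

After the refresh, the two log messages quoted above should appear in the NameNode log, and the node should move to "Decommission in Progress" while its blocks are re-replicated.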

Regards,
Tanping

From: James Litton [mailto:james.litton@chacha.com]
Sent: Friday, February 11, 2011 1:10 PM
To: hdfs-user@hadoop.apache.= org
Subject: Decommissioning Nodes

While decommissioning nodes I am seeing the following in my namenode logs:
2011-02-11 21:05:16,290 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 5

I haven’t seen any progress of decommissioning nodes in several days. I have 12 total nodes with 6 being decommissioned and a replication factor of 3. How long should I expect this to take? Is there a way to force this to move forward?

Thank you.

