Subject: Datanodes going out of reach in hadoop
From: Yogini Gulkotwar
To: user@hadoop.apache.org
Date: Fri, 21 Feb 2014 15:17:32 +0530

Hello,
I am working with a 5-node Hadoop cluster. HDFS is stored on a shared 98 TB NFS directory.
When we view the NameNode UI, the following is displayed:

Node       Last         Admin       Configured     Used  Non DFS    Remaining  Used  Remaining  Blocks  Block Pool  Block Pool  Failed
           Contact (s)  State       Capacity (TB)  (TB)  Used (TB)  (TB)       (%)   (%)                Used (TB)   Used (%)    Volumes
Datanode1  0            In Service  97.39          1.83  38.04      57.52      1.88  59.06      80653   1.83        1.88        0
Datanode2  1            In Service  97.39          1.18  38.69      57.52      1.21  59.06      54536   1.18        1.21        0
Datanode3  0            In Service  97.39          1.61  38.26      57.52      1.65  59.06      66902   1.61        1.65        0
Datanode4  2            In Service  97.39          0.65  39.22      57.52      0.67  59.06      32821   0.65        0.67        0
Datanode5  2            In Service  97.39          0.58  39.29      57.52      0.6   59.06      29278   0.58        0.6         0

As can be seen, each datanode thinks it has the entire 98 TB to itself, and three of the datanodes (1, 2 and 3) hold comparatively more data.
Because of this, running the balancer doesn't help.
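The table's numbers are consistent with every datanode reporting the same shared NFS volume. On each row, Used + Non DFS Used + Remaining adds up to the full 97.39 TB, and Used + Non DFS Used is 39.87 TB on every node. This suggests that each node counts the other nodes' blocks as non-DFS data. The sketch below works through that arithmetic. It also assumes the HDFS balancer's default threshold of 10 percentage points around the cluster-average DFS utilization. The threshold and the simplified "outside the band" rule are illustrative assumptions, not the Balancer's actual code.

```python
# Sketch of the capacity arithmetic behind the NameNode UI table above.
# Figures are copied from that table (TB); the balancer rule and its 10%
# threshold are simplified assumptions, not Hadoop's actual Balancer code.

# (node, configured capacity, DFS used, non-DFS used, remaining)
NODES = [
    ("Datanode1", 97.39, 1.83, 38.04, 57.52),
    ("Datanode2", 97.39, 1.18, 38.69, 57.52),
    ("Datanode3", 97.39, 1.61, 38.26, 57.52),
    ("Datanode4", 97.39, 0.65, 39.22, 57.52),
    ("Datanode5", 97.39, 0.58, 39.29, 57.52),
]

ASSUMED_THRESHOLD = 10.0  # percentage points around the cluster average


def shared_volume_used(nodes):
    """Used + Non DFS Used per node; a single distinct value hints at one shared volume."""
    return {round(used + non_dfs, 2) for _, _, used, non_dfs, _ in nodes}


def balancer_view(nodes, threshold=ASSUMED_THRESHOLD):
    """Return (reported capacity, DFS used, average utilization %, nodes outside the band)."""
    total_capacity = sum(cap for _, cap, _, _, _ in nodes)
    total_used = sum(used for _, _, used, _, _ in nodes)
    avg_util = 100.0 * total_used / total_capacity
    outside = [
        name
        for name, cap, used, _, _ in nodes
        if abs(100.0 * used / cap - avg_util) > threshold
    ]
    return total_capacity, total_used, avg_util, outside


if __name__ == "__main__":
    capacity, used, avg, outside = balancer_view(NODES)
    print(f"reported capacity: {capacity:.2f} TB (one shared volume of 97.39 TB)")
    print(f"shared volume used per node: {shared_volume_used(NODES)} TB")
    print(f"average utilization: {avg:.2f}%, nodes outside threshold: {outside}")
```

With these figures, the cluster reports roughly 487 TB of capacity for a single 97.39 TB share. No node's utilization (0.6% to 1.88%) differs from the roughly 1.2% average by anywhere near 10 points. That is consistent with the balancer finding nothing to move.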

Recently, I have come across a strange issue: the three datanodes with more data become unreachable from the NameNode (at different times).
That is, the DataNode service is still running, but the "Last Contact" column in the table above reports a high value, and after a while the NameNode reports the node as DEAD.
Within 10 minutes or so, the datanode goes LIVE again.
I went through the logs but couldn't find any errors.
I also tried increasing the ulimit on these datanodes, to no avail.
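The DEAD/LIVE cycle described above matches how HDFS decides that a datanode is dead. The NameNode only declares a node dead once its last heartbeat is older than 2 × `dfs.namenode.heartbeat.recheck-interval` + 10 × `dfs.heartbeat.interval`. The sketch below assumes the stock defaults of 300000 ms and 3 s, which give 630 s, or about 10.5 minutes. It only illustrates that formula. The helper names are hypothetical, and the actual values should be checked in the cluster's hdfs-site.xml. Under these assumptions, a node shown as DEAD has gone at least that long without a heartbeat reaching the NameNode.

```python
# Sketch of the NameNode's dead-node timeout, assuming Hadoop 2.x defaults:
#   dfs.namenode.heartbeat.recheck-interval = 300000 ms
#   dfs.heartbeat.interval                  = 3 s
# Helper names are hypothetical; only the formula mirrors HDFS behavior.

DEFAULT_RECHECK_INTERVAL_MS = 300_000
DEFAULT_HEARTBEAT_INTERVAL_S = 3


def heartbeat_expire_ms(recheck_interval_ms=DEFAULT_RECHECK_INTERVAL_MS,
                        heartbeat_interval_s=DEFAULT_HEARTBEAT_INTERVAL_S):
    """Milliseconds without a heartbeat before a datanode is marked DEAD."""
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


def is_dead(last_contact_s, **intervals):
    """True if the 'Last Contact' age (seconds) is past the expiry window."""
    return last_contact_s * 1000 > heartbeat_expire_ms(**intervals)


if __name__ == "__main__":
    window = heartbeat_expire_ms()
    print(f"DEAD after {window // 1000} s (~{window / 60_000:.1f} min) without a heartbeat")
    for last_contact in (2, 300, 700):
        print(f"Last Contact {last_contact:>3} s -> {'DEAD' if is_dead(last_contact) else 'LIVE'}")
```

For example, under these assumed defaults, a Last Contact of 700 s is already past the window, while 300 s is not.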

Is there something that needs to be done to overcome this issue?
Are any configuration changes needed? Any help would be appreciated.

Thanks & Regards,

Yogini Gulkotwar │ Data Scientist
Flutura Business Solutions Private Limited
BANGALORE