From: Brahma Reddy Battula <brahmareddy.battula@huawei.com>
Date: Mon, 3 Feb 2014 06:27:47 +0000
To: user@hadoop.apache.org
Subject: RE: java.io.FileNotFoundException: http://HOSTNAME:50070/getimage?getimage=1

Hi,

From your mail, you are facing the following problems (mainly, NameNode HTTP requests are not getting through):

i) Checkpointing is not happening.

With 1.0.4, only one checkpoint process is executed at a time. When the namenode gets an overlapping checkpoint request, it checks for edits.new in its storage directories. If the namenode finds this file, it concludes that the previous checkpoint process is not finished yet and prints the warning message you've seen. If you can confirm that the edits.new file was left over from before the failure, the residual file can be deleted once you have verified that this is indeed the problem.

In your case, the power failure probably occurred while a checkpoint was in progress, so the edits.new file was never renamed. Hopefully the checkpoint before the power failure had completed successfully.
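A quick way to check is to look at the timestamps in the name directory (a rough sketch; the path is the dfs.name.dir from your hdfs-site.xml, so adjust if yours differs). An edits.new older than the power failure confirms the interrupted checkpoint:

    # list the NameNode storage directory (dfs.name.dir) and check the mtime of edits.new
    ls -l /usr/lib/hadoop/storage/dfs/nn/current/

Once the HTTP problem below is fixed, you could also force a fresh checkpoint from the SNN host rather than waiting for the next cycle (if I remember correctly, the 1.x SecondaryNameNode accepts a -checkpoint force option):

    # trigger an immediate checkpoint on the SecondaryNameNode host
    bin/hadoop secondarynamenode -checkpoint force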
ii) Not able to browse the NameNode UI.

I think the host mapping is most likely misconfigured, or the hostname has changed. (Can you cross-check whether the hostname you configured in /etc/hosts and the actual hostname of the machine match?)

Why is the hostname in the configuration different from the one in /etc/hosts?

From /etc/hosts:

IP1 Hostname1    # Namenode- vm01 - itself

From the configuration:

<property>
  <name>dfs.http.address</name>
  <value>HOSTNAME:50070</value>
</property>
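A quick way to compare the two (standard Linux commands; substitute the actual value from your dfs.http.address for HOSTNAME):

    hostname -f                  # fully qualified hostname the machine reports
    grep Hostname1 /etc/hosts    # the mapping you configured
    getent hosts HOSTNAME        # what the resolver actually returns for the configured name

If the configured HOSTNAME does not resolve to the namenode's own address, its HTTP endpoints will not be reachable under that name.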
In this case, ./hadoop fsck / should not work either. Can you please try it and send us the result?
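You can also request the NameNode HTTP endpoints directly and compare what the browser and the SNN see (curl here is just an example client; the URLs are the ones from your logs):

    # the web UI page that currently returns 404
    curl -v http://HOSTNAME:50070/dfshealth.jsp
    # the image-transfer URL the SecondaryNameNode fails on
    curl -v "http://HOSTNAME:50070/getimage?getimage=1"

If both return 404 even though Jetty reports it is bound to 50070, the problem is on the NameNode web server side rather than in the SNN.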
Thanks & Regards,
Brahma Reddy Battula

________________________________
From: Stuti Awasthi [stutiawasthi@hcl.com]
Sent: Monday, February 03, 2014 10:11 AM
To: user
Subject: RE: java.io.FileNotFoundException: http://HOSTNAME:50070/getimage?getimage=1

Hi All,

Are there any other tips that could resolve this issue? Please suggest.

Thanks
Stuti Awasthi

From: Jitendra Yadav [mailto:jeetuyadav200890@gmail.com]
Sent: Friday, January 31, 2014 8:26 PM
To: user
Subject: Re: java.io.FileNotFoundException: http://HOSTNAME:50070/getimage?getimage=1

Oh, I didn't realize that you are still using the 1.0.4 release; yes, the property was only deprecated in newer releases.

Thanks
Jitendra

On Fri, Jan 31, 2014 at 7:39 PM, Stuti Awasthi <stutiawasthi@hcl.com> wrote:

Hadoop version is 1.0.4.

In hdfs-default.html for version 1.0.4 we have the following properties:

dfs.http.address
dfs.secondary.http.address

dfs.namenode.http-address: I suppose this property is not valid for Hadoop 1.x.

Please suggest.

Thanks

From: Jitendra Yadav [mailto:jeetuyadav200890@gmail.com]
Sent: Friday, January 31, 2014 7:26 PM
To: user
Subject: Re: java.io.FileNotFoundException: http://HOSTNAME:50070/getimage?getimage=1

Can you please change the property below and restart your cluster?

FROM:
  <name>dfs.http.address</name>
TO:
  <name>dfs.namenode.http-address</name>
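That is, the resulting block in hdfs-site.xml would look something like this (the HOSTNAME:50070 value is simply carried over from the existing dfs.http.address entry):

<property>
  <name>dfs.namenode.http-address</name>
  <value>HOSTNAME:50070</value>
</property>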
Thanks
Jitendra

On Fri, Jan 31, 2014 at 7:07 PM, Stuti Awasthi <stutiawasthi@hcl.com> wrote:

Hi Jitendra,

I realized that some days back my cluster went down due to a power failure, after which the nn/current directory contains both edits and edits.new files, and now the SNN is not rolling these edits due to the HTTP error.

Also, my NN and SNN are currently operating on the same machine.

DFSadmin report:

Configured Capacity: 659494076416 (614.2 GB)
Present Capacity: 535599210496 (498.82 GB)
DFS Remaining: 497454006272 (463.29 GB)
DFS Used: 38145204224 (35.53 GB)
DFS Used%: 7.12%
Under replicated blocks: 283
Blocks with corrupt replicas: 3
Missing blocks: 3

-------------------------------------------------
Datanodes available: 8 (8 total, 0 dead)

Name: 10.139.9.238:50010
Decommission Status : Normal
Configured Capacity: 82436759552 (76.78 GB)
DFS Used: 4302274560 (4.01 GB)
Non DFS Used: 8391843840 (7.82 GB)
DFS Remaining: 69742641152 (64.95 GB)
DFS Used%: 5.22%
DFS Remaining%: 84.6%
Last contact: Fri Jan 31 18:55:18 IST 2014

Name: 10.139.9.233:50010
Decommission Status : Normal
Configured Capacity: 82436759552 (76.78 GB)
DFS Used: 5774745600 (5.38 GB)
Non DFS Used: 13409488896 (12.49 GB)
DFS Remaining: 63252525056 (58.91 GB)
DFS Used%: 7.01%
DFS Remaining%: 76.73%
Last contact: Fri Jan 31 18:55:19 IST 2014

Name: 10.139.9.232:50010
Decommission Status : Normal
Configured Capacity: 82436759552 (76.78 GB)
DFS Used: 8524451840 (7.94 GB)
Non DFS Used: 24847884288 (23.14 GB)
DFS Remaining: 49064423424 (45.69 GB)
DFS Used%: 10.34%
DFS Remaining%: 59.52%
Last contact: Fri Jan 31 18:55:21 IST 2014

Name: 10.139.9.236:50010
Decommission Status : Normal
Configured Capacity: 82436759552 (76.78 GB)
DFS Used: 4543819776 (4.23 GB)
Non DFS Used: 8669548544 (8.07 GB)
DFS Remaining: 69223391232 (64.47 GB)
DFS Used%: 5.51%
DFS Remaining%: 83.97%
Last contact: Fri Jan 31 18:55:19 IST 2014

Name: 10.139.9.235:50010
Decommission Status : Normal
Configured Capacity: 82436759552 (76.78 GB)
DFS Used: 5092986880 (4.74 GB)
Non DFS Used: 8669454336 (8.07 GB)
DFS Remaining: 68674318336 (63.96 GB)
DFS Used%: 6.18%
DFS Remaining%: 83.31%
Last contact: Fri Jan 31 18:55:19 IST 2014

Name: 10.139.9.237:50010
Decommission Status : Normal
Configured Capacity: 82436759552 (76.78 GB)
DFS Used: 4604301312 (4.29 GB)
Non DFS Used: 11005788160 (10.25 GB)
DFS Remaining: 66826670080 (62.24 GB)
DFS Used%: 5.59%
DFS Remaining%: 81.06%
Last contact: Fri Jan 31 18:55:18 IST 2014

Name: 10.139.9.234:50010
Decommission Status : Normal
Configured Capacity: 82436759552 (76.78 GB)
DFS Used: 4277760000 (3.98 GB)
Non DFS Used: 12124221440 (11.29 GB)
DFS Remaining: 66034778112 (61.5 GB)
DFS Used%: 5.19%
DFS Remaining%: 80.1%
Last contact: Fri Jan 31 18:55:18 IST 2014

Name: 10.139.9.231:50010
Decommission Status : Normal
Configured Capacity: 82436759552 (76.78 GB)
DFS Used: 1024864256 (977.39 MB)
Non DFS Used: 36776636416 (34.25 GB)
DFS Remaining: 44635258880 (41.57 GB)
DFS Used%: 1.24%
DFS Remaining%: 54.14%
Last contact: Fri Jan 31 18:55:20 IST 2014

From: Jitendra Yadav [mailto:jeetuyadav200890@gmail.com]
Sent: Friday, January 31, 2014 6:58 PM
To: user
Subject: Re: java.io.FileNotFoundException: http://HOSTNAME:50070/getimage?getimage=1

Hi,

Please post the output of the dfsadmin report command; it could help us understand the cluster health.

# hadoop dfsadmin -report

Thanks
Jitendra

On Fri, Jan 31, 2014 at 6:44 PM, Stuti Awasthi <stutiawasthi@hcl.com> wrote:

Hi All,

I have suddenly started facing an issue on my Hadoop cluster. It seems that HTTP requests to port 50070 on DFS are not working properly.

I have a Hadoop cluster which has been operating for several days. Recently we have also not been able to see the dfshealth.jsp page from the web console.

Problems:

1. http://<Hostname>:50070/dfshealth.jsp shows the following error:

HTTP ERROR: 404
Problem accessing /. Reason:
NOT_FOUND

2. SNN is not able to roll edits:

ERROR in SecondaryNameNode log:

java.io.FileNotFoundException: http://HOSTNAME:50070/getimage?getimage=1
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1401)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:160)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:347)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:336)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:336)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:411)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:312)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:275)

ERROR in NameNode log:

2014-01-31 18:15:12,046 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.139.9.231
2014-01-31 18:15:12,046 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log, edits.new files already exists in all healthy directories:
  /usr/lib/hadoop/storage/dfs/nn/current/edits.new

NameNode logs which suggest that the web server started on 50070 successfully:

2014-01-31 14:42:35,208 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
2014-01-31 14:42:35,209 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
2014-01-31 14:42:35,209 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070
2014-01-31 14:42:35,378 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at: HOSTNAME:50070

hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

    <property>
        <name>dfs.name.dir</name>
        <value>/usr/lib/hadoop/storage/dfs/nn</value>
    </property>

    <property>
        <name>dfs.data.dir</name>
        <value>/usr/lib/hadoop/storage/dfs/dn</value>
    </property>

    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.http.address</name>
        <value>HOSTNAME:50070</value>
    </property>

    <property>
        <name>dfs.secondary.http.address</name>
        <value>HOSTNAME:50090</value>
    </property>

    <property>
        <name>fs.checkpoint.dir</name>
        <value>/usr/lib/hadoop/storage/dfs/snn</value>
    </property>
</configuration>

/etc/hosts (note: I have also tried commenting out the 127.0.0.1 entry in the hosts file, but the issue was not resolved):

127.0.0.1       localhost

IP1    Hostname1        # Namenode- vm01 - itself
IP2    Hostname2        # DataNode- vm02
……

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Note: All Hadoop daemons are running fine and jobs are executing properly.

How can I resolve this issue? I have tried many options suggested on different forums but am still facing the same problem.

I believe this can cause a major problem later, as my edits are not getting rolled into the fsimage. That could mean data loss in case of a failure.

Please suggest.

Thanks
Stuti