Subject: Re: The minimum memory requirements to datanode and namenode?
From: Rishi Yadav <rishi@infoobjects.com>
To: user@hadoop.apache.org
Date: Sun, 12 May 2013 21:50:29 -0700

Can you share the specs of node3? In my experience, even on a test/demo
cluster, anything below 4 GB of RAM makes a node almost inaccessible.
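For a quick check (assuming Linux nodes and a Hadoop 1.x layout; the grep
pattern below is only an illustration), compare the free memory on each node
with the heap each daemon was actually started with:

    # Free and used memory in MB; the "-/+ buffers/cache" row is what
    # the Hadoop daemons can really use.
    free -m

    # List the running Hadoop daemons (NameNode, DataNode, TaskTracker, ...).
    jps

    # Show the -Xmx heap setting each daemon's JVM was launched with.
    ps -ef | grep -E 'NameNode|DataNode|TaskTracker' | grep -oE -- '-Xmx[0-9]+[kKmMgG]?'

If the -Xmx values add up to more than the node's physical RAM, the box will
swap and the daemons will start missing their heartbeats.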
On Sun, May 12, 2013 at 8:25 PM, sam liu <samliuhadoop@gmail.com> wrote:

> Got some exceptions on node3:
>
> 1. datanode log:
> 2013-04-17 11:13:44,719 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> writeBlock blk_2478755809192724446_1477 received exception
> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
> channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371
> remote=/9.50.102.79:50010]
> 2013-04-17 11:13:44,721 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(9.50.102.80:50010,
> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
> ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
> channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371
> remote=/9.50.102.79:50010]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>         at java.lang.Thread.run(Thread.java:738)
> 2013-04-17 11:13:44,818 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> Receiving block blk_8413378381769505032_1477 src: /9.50.102.81:35279
> dest: /9.50.102.80:50010
>
> 2. tasktracker log:
> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
> Deleting user log path job_201304152248_0011
> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker: Caught
> exception: java.io.IOException: Call to node1/9.50.102.81:9001 failed on
> local exception: java.io.IOException: Connection reset by peer
>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
> Caused by: java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>         at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>         at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>
> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
> Resending 'status' to 'node1' with reponseId '-12904
> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
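The 63000 ms in the writeBlock timeout is just the default DFS read timeout
(dfs.socket.timeout, 60 s, plus a small per-pipeline-node extension), meaning
the peer never answered in time. On nodes with only ~75-175 MB free, that is
far more likely to be the JVM swapping or stuck in GC than a real network
fault. If you want more headroom while you investigate, the timeouts can be
raised in hdfs-site.xml (Hadoop 1.x property names; the values below are only
an example, not a recommendation):

    <!-- hdfs-site.xml: allow slow or swapping datanodes more time before
         a pipeline read/write is abandoned. -->
    <property>
      <name>dfs.socket.timeout</name>
      <value>120000</value>  <!-- read timeout in ms; default 60000 -->
    </property>
    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <value>960000</value>  <!-- write timeout in ms; default 480000 -->
    </property>

Raising timeouts only hides the symptom, though; the underlying memory
pressure still needs fixing.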
>
> 2013/5/13 Rishi Yadav <rishi@infoobjects.com>
>
>> Do you get any errors when trying to connect to the cluster, something
>> like 'tried n times' or 'replicated 0 times'?
>>
>> On Sun, May 12, 2013 at 7:28 PM, sam liu <samliuhadoop@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I set up a cluster with 3 nodes and did not submit any job to it after
>>> that. But after a few days I found the cluster was unhealthy:
>>> - No result returned for a long while after issuing 'hadoop dfs -ls /'
>>> or 'hadoop dfsadmin -report'
>>> - The page at 'http://namenode:50070' could not be opened as expected...
>>> - ...
>>>
>>> I did not find any useful info in the logs, but found that the available
>>> memory on the cluster nodes was very low at that time:
>>> - node1 (NN, JT, DN, TT): 158 MB of memory available
>>> - node2 (DN, TT): 75 MB of memory available
>>> - node3 (DN, TT): 174 MB of memory available
>>>
>>> I guess the issue with my cluster is caused by a lack of memory, and my
>>> questions are:
>>> - Without running jobs, what are the minimum memory requirements for the
>>> datanode and namenode?
>>> - How do I set the minimum memory for the datanode and namenode?
>>>
>>> Thanks!
>>>
>>> Sam Liu
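On those two questions: an idle DataNode or TaskTracker still runs a full JVM,
and Hadoop 1.x starts every daemon with a 1000 MB max heap by default
(HADOOP_HEAPSIZE in conf/hadoop-env.sh), so with NN+JT+DN+TT on node1 the
defaults alone oversubscribe a small box. The heaps are set per daemon in
conf/hadoop-env.sh; here is a sketch for a small test cluster (the -Xmx
values are illustrative, not recommendations):

    # conf/hadoop-env.sh -- default max heap, in MB, for every daemon.
    export HADOOP_HEAPSIZE=1000

    # Per-daemon overrides; these are appended to the command line after
    # the default heap flag, and the last -Xmx wins, so they take
    # precedence over HADOOP_HEAPSIZE.
    export HADOOP_NAMENODE_OPTS="-Xmx512m $HADOOP_NAMENODE_OPTS"
    export HADOOP_DATANODE_OPTS="-Xmx256m $HADOOP_DATANODE_OPTS"
    export HADOOP_TASKTRACKER_OPTS="-Xmx256m $HADOOP_TASKTRACKER_OPTS"
    export HADOOP_JOBTRACKER_OPTS="-Xmx512m $HADOOP_JOBTRACKER_OPTS"

Whatever values you pick, the sum of the daemon heaps on a node should stay
comfortably below its physical RAM, leaving room for the OS and task JVMs.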