hadoop-mapreduce-user mailing list archives

From shashwat shriparv <dwivedishash...@gmail.com>
Subject Re: The minimum memory requirements to datanode and namenode?
Date Mon, 13 May 2013 08:28:45 GMT
Because so little memory is available on the nodes, they cannot send
responses in time, which leads to the socket timeout and connection
exceptions. There may be a network issue too.

Please check which program is using the memory; some other co-hosted
application is probably eating it up.

ps -e -orss=,args= | sort -b -k1,1n | pr -TW$COLUMNS

or

run the top command, then press Shift+M (sort by memory usage)
and then c (show full command lines),
and check which application is eating up the memory.
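
A minimal sketch of the same check as a reusable shell function. The name
top_rss and the default of 5 entries are hypothetical; the function only
assumes input in the "RSS command-line" layout that the ps command above
prints, with RSS in KB as on Linux.

```shell
# top_rss: read "RSS_KB COMMAND..." lines on stdin (the layout printed by
# `ps -e -o rss=,args=`) and print the N largest consumers, RSS shown in MB.
# N defaults to 5 (an arbitrary choice for this sketch).
top_rss() {
    n="${1:-5}"
    sort -k1,1nr | head -n "$n" | awk '{
        rss_kb = $1
        $1 = ""              # drop the RSS field, keep the command line
        sub(/^ +/, "")       # remove the separator left behind
        printf "%d MB\t%s\n", rss_kb / 1024, $0
    }'
}
```

Possible usage: ps -e -o rss=,args= | top_rss 10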

There must be ample memory available on the nodes beyond what is reserved
for the JVMs.
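
When judging how much memory is really available, it may help to count
buffers and page cache as well, since the kernel can reclaim them. The sketch
below is only an illustration: the function name avail_from_free is
hypothetical, and it assumes the older `free -m` layout quoted further down
in this thread (total, used, free, shared, buffers, cached). Newer versions
of free print an "available" column instead, so the field positions would
differ there.

```shell
# avail_from_free: read old-style `free -m` output on stdin and print
# free + buffers + cached from the "Mem:" row, in MB.
# Assumed column order: Mem: total used free shared buffers cached
avail_from_free() {
    awk '$1 == "Mem:" { print $4 + $6 + $7 }'
}
```

Possible usage: free -m | avail_from_free
For the node3 figures quoted below (free 167, buffers 187, cached 1136) this
gives 1490 MB, in line with the 1491 MB on the "-/+ buffers/cache" row (the
small difference is rounding), so it is worth checking this figure as well
as the "free" column.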

*Thanks & Regards*

∞
Shashwat Shriparv



On Mon, May 13, 2013 at 12:23 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:

> 4 GB of memory on the NN? It will run out of memory in a few days.
>
> You will need to make sure your NN has more than double the RAM of
> your DNs if you have a miniature cluster.
>
>
> On Mon, May 13, 2013 at 11:52 AM, sam liu <samliuhadoop@gmail.com> wrote:
>
>> I can issue the command 'hadoop dfsadmin -report', but it does not return
>> any result for a long time. I can also open the NN UI
>> (http://namenode:50070), but it stays in the connecting state and
>> never returns any cluster statistics.
>>
>> The mem of NN:
>>                   total       used       free
>> Mem:          3834       3686        148
>>
>> After running top, I can see that the following processes are taking up
>> the memory: namenode, jobtracker, tasktracker, hbase, ...
>>
>> I can restart the cluster, and then it will be healthy. But this
>> issue will probably occur again a few days later. I think it's caused by
>> a lack of free/available memory, but I do not know how much extra
>> free/available memory a node requires beyond what is needed for
>> running the datanode/tasktracker processes.
>>
>>
>>
>>
>> 2013/5/13 Nitin Pawar <nitinpawar432@gmail.com>
>>
>>> Just one node being short of memory does not mean your cluster is down.
>>>
>>> Can you see your HDFS health on the NN UI?
>>>
>>> How much memory do you have on the NN? If there are no jobs running on the
>>> cluster, you can safely restart the datanode and tasktracker.
>>>
>>> Also run top and figure out which processes are taking up the
>>> memory, and for what purpose.
>>>
>>>
>>> On Mon, May 13, 2013 at 11:28 AM, sam liu <samliuhadoop@gmail.com> wrote:
>>>
>>>> Nitin,
>>>>
>>>> In my cluster, the tasktracker and datanode have already been launched
>>>> and are still running. But the free/available memory of node3 is now just
>>>> 167 MB. Do you think that is why my Hadoop is unhealthy now (it
>>>> does not return a result for the command 'hadoop dfs -ls /')?
>>>>
>>>>
>>>> 2013/5/13 Nitin Pawar <nitinpawar432@gmail.com>
>>>>
>>>>> Sam,
>>>>>
>>>>> There is no formula for determining how much memory one should give to
>>>>> the datanode and tasktracker. There is a formula for how many slots you
>>>>> want to have on a machine.
>>>>>
>>>>> In my prior experience, we gave 512 MB of memory each to the datanode
>>>>> and tasktracker.
>>>>>
>>>>>
>>>>> On Mon, May 13, 2013 at 11:18 AM, sam liu <samliuhadoop@gmail.com> wrote:
>>>>>
>>>>>> For node3, the memory is:
>>>>>>              total       used       free     shared    buffers     cached
>>>>>> Mem:          3834       3666        167          0        187       1136
>>>>>> -/+ buffers/cache:       2342       1491
>>>>>> Swap:         8196          0       8196
>>>>>>
>>>>>> For a 3-node cluster like mine, what is the minimum free/available
>>>>>> memory required for the datanode and tasktracker processes,
>>>>>> without running any map/reduce task?
>>>>>> Is there a formula to determine it?
>>>>>>
>>>>>>
>>>>>> 2013/5/13 Rishi Yadav <rishi@infoobjects.com>
>>>>>>
>>>>>>> Can you share the specs of node3? Even on a test/demo cluster, anything
>>>>>>> below 4 GB of RAM makes the node almost inaccessible, in my experience.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 12, 2013 at 8:25 PM, sam liu <samliuhadoop@gmail.com> wrote:
>>>>>>>
>>>>>>>> Got some exceptions on node3:
>>>>>>>> 1. datanode log:
>>>>>>>> 2013-04-17 11:13:44,719 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_2478755809192724446_1477 received exception java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/9.50.102.79:50010]
>>>>>>>> 2013-04-17 11:13:44,721 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(9.50.102.80:50010, storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075, ipcPort=50020):DataXceiver
>>>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/9.50.102.79:50010]
>>>>>>>>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>>>>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>>>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>>>>>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>>>>>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>>>>>>         at java.lang.Thread.run(Thread.java:738)
>>>>>>>> 2013-04-17 11:13:44,818 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /9.50.102.80:50010
>>>>>>>>
>>>>>>>>
>>>>>>>> 2. tasktracker log:
>>>>>>>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner: Deleting user log path job_201304152248_0011
>>>>>>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001 failed on local exception: java.io.IOException: Connection reset by peer
>>>>>>>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>>>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>>>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>>>>>>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>>>>>>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>>>>>>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>>>>>>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>>>>>>> Caused by: java.io.IOException: Connection reset by peer
>>>>>>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>>>>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>>>>>>         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>>>>>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>>>>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>>>>>>         at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>>>>>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>>>>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>>>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>>>>>>         at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>>>>>>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>>>>>>
>>>>>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'node1' with reponseId '-12904
>>>>>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/5/13 Rishi Yadav <rishi@infoobjects.com>
>>>>>>>>
>>>>>>>>> Do you get any error when trying to connect to the cluster,
>>>>>>>>> something like 'tried n times' or 'replicated 0 times'?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <samliuhadoop@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I set up a cluster with 3 nodes and did not submit any job to it
>>>>>>>>>> afterwards. But after a few days, I found the cluster unhealthy:
>>>>>>>>>> - No result is returned for a long time after issuing the command
>>>>>>>>>> 'hadoop dfs -ls /' or 'hadoop dfsadmin -report'
>>>>>>>>>> - The page 'http://namenode:50070' could not be opened as
>>>>>>>>>> expected...
>>>>>>>>>> - ...
>>>>>>>>>>
>>>>>>>>>> I did not find any useful info in the logs, but found that the
>>>>>>>>>> available memory on the cluster nodes was very low at that time:
>>>>>>>>>> - node1(NN,JT,DN,TT): 158 MB of memory is available
>>>>>>>>>> - node2(DN,TT): 75 MB of memory is available
>>>>>>>>>> - node3(DN,TT): 174 MB of memory is available
>>>>>>>>>>
>>>>>>>>>> I guess the issue with my cluster is caused by a lack of memory,
>>>>>>>>>> and my questions are:
>>>>>>>>>> - Without running jobs, what are the minimum memory requirements for
>>>>>>>>>> the datanode and namenode?
>>>>>>>>>> - How do I define the minimum memory for the datanode and namenode?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> Sam Liu
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Nitin Pawar
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>
>
> --
> Nitin Pawar
>
