hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1182) DFS Scalability issue with filecache in large clusters
Date Fri, 30 Mar 2007 17:32:25 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485576 ]

dhruba borthakur commented on HADOOP-1182:
------------------------------------------

Several open issues deal with reducing CPU usage on the namenode: HADOOP-1155, HADOOP-1149,
HADOOP-1079 and HADOOP-1073. Some of these patches should improve your situation.

In the short term, you could try increasing the number of namenode server threads. This, in turn,
increases the call queue depth (100 calls per additional server thread). The default
number of server threads is 10. To make this change, add the following to hadoop-site.xml:

<property>
  <name>dfs.namenode.handler.count</name>
  <value>40</value>
  <description>The number of server threads for the namenode.</description>
</property>
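
As a rough illustration of the arithmetic above, here is a minimal sketch that estimates the call queue depth from the handler count. It only assumes the figure quoted in this comment (100 queued calls per server thread); the class and method names are hypothetical and are not part of Hadoop.

```java
// Hypothetical helper, not part of Hadoop: estimates the namenode RPC call
// queue depth from the value of dfs.namenode.handler.count.
final class CallQueueEstimate {

    // Assumption taken from the comment above: 100 queued calls per server thread.
    static final int CALLS_PER_HANDLER = 100;

    // Returns the estimated call queue depth for the given number of server threads.
    static int queueDepth(int handlerCount) {
        if (handlerCount < 0) {
            throw new IllegalArgumentException("handlerCount must be non-negative");
        }
        return handlerCount * CALLS_PER_HANDLER;
    }

    public static void main(String[] args) {
        // 40 is the value used in the hadoop-site.xml snippet above.
        System.out.println("handlers=40 -> queue depth " + queueDepth(40));
        // 80 is an arbitrary larger value chosen only for comparison.
        System.out.println("handlers=80 -> queue depth " + queueDepth(80));
    }
}
```

With the value from the snippet (40 threads), this gives an estimated queue depth of 4000 calls. Whether that is enough for a 1000 node cluster depends on how many clients call the namenode at the same time.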


> DFS Scalability issue with filecache in large clusters
> ------------------------------------------------------
>
>                 Key: HADOOP-1182
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1182
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.12.1
>            Reporter: Christian Kunz
>
> When using filecache to distribute supporting files for map/reduce applications in a
> 1000 node cluster, many map tasks fail because of timeouts. There was no such problem using
> a 200 node cluster for the same applications with comparable input data. Either the whole
> job fails because of too many map failures, or even worse, some map tasks hang indefinitely.
> java.net.SocketTimeoutException: timed out waiting for rpc response
> 	at org.apache.hadoop.ipc.Client.call(Client.java:473)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
> 	at org.apache.hadoop.dfs.$Proxy1.exists(Unknown Source)
> 	at org.apache.hadoop.dfs.DFSClient.exists(DFSClient.java:320)
> 	at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.exists(DistributedFileSystem.java:170)
> 	at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.open(DistributedFileSystem.java:125)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.<init>(ChecksumFileSystem.java:110)
> 	at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:330)
> 	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:245)
> 	at org.apache.hadoop.filecache.DistributedCache.createMD5(DistributedCache.java:327)
> 	at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:253)
> 	at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:169)
> 	at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:86)
> 	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:117)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

