hadoop-common-dev mailing list archives

From "Christian Kunz (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1182) scalability issue with filecache in large clusters
Date Thu, 29 Mar 2007 04:31:25 GMT
scalability issue with filecache in large clusters
--------------------------------------------------

                 Key: HADOOP-1182
                 URL: https://issues.apache.org/jira/browse/HADOOP-1182
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.12.1
            Reporter: Christian Kunz


When the filecache is used to distribute supporting files for map/reduce applications in a
1000-node cluster, many map tasks fail because of timeouts. The same applications, run with
comparable input data on a 200-node cluster, showed no such problem. Either the whole job
fails because of too many map failures or, even worse, some map tasks hang indefinitely.


java.net.SocketTimeoutException: timed out waiting for rpc response
	at org.apache.hadoop.ipc.Client.call(Client.java:473)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
	at org.apache.hadoop.dfs.$Proxy1.exists(Unknown Source)
	at org.apache.hadoop.dfs.DFSClient.exists(DFSClient.java:320)
	at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.exists(DistributedFileSystem.java:170)
	at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.open(DistributedFileSystem.java:125)
	at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.<init>(ChecksumFileSystem.java:110)
	at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:330)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:245)
	at org.apache.hadoop.filecache.DistributedCache.createMD5(DistributedCache.java:327)
	at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:253)
	at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:169)
	at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:86)
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:117)
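
The trace shows the timeout occurring in an exists() RPC that DistributedCache issues
while checking whether the local cache copy is fresh, so each task start appears to cost
at least one call to the DFS. One possible reading, which this report does not confirm, is
that the number of such calls grows with cluster size until some callers wait longer than
the RPC timeout. The sketch below is only a back-of-envelope model of that reading: it
assumes a single server that handles requests strictly one at a time, and the class name,
the 50 ms service time and the 10 s timeout are hypothetical values chosen for
illustration, not measurements or Hadoop settings.

```java
// Hypothetical model, not Hadoop code: a single server handles simultaneous
// requests one at a time, and each caller gives up after a fixed timeout.
final class RpcQueueSketch {

    /**
     * Counts the callers whose request would complete only after the timeout,
     * assuming all callers arrive at once and are served in order.
     */
    static int timedOutCallers(int callers, long serviceMillis, long timeoutMillis) {
        int timedOut = 0;
        for (int i = 1; i <= callers; i++) {
            // The i-th caller in line finishes after i service times.
            long completionMillis = i * serviceMillis;
            if (completionMillis > timeoutMillis) {
                timedOut++;
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        // Assumed values: 50 ms per request, 10 s timeout.
        System.out.println(timedOutCallers(200, 50L, 10_000L));
        System.out.println(timedOutCallers(1000, 50L, 10_000L));
    }
}
```

Under these assumed numbers the model reports no timed-out callers for 200 simultaneous
requests and 800 for 1000, which is the kind of threshold behaviour described above. The
real server is not strictly serial, so the figures themselves carry no weight.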

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

