hadoop-common-user mailing list archives

From Jason Venner <ja...@attributor.com>
Subject question on Hadoop configuration for non cpu intensive jobs - 0.15.1
Date Tue, 25 Dec 2007 21:52:35 GMT
We have two flavors of jobs we run through Hadoop. The first flavor is a
simple merge sort, where very little happens in the mapper or
the reducer.
The second flavor is very compute intensive.

In the first type, each map task consumes its (default-sized) 64 MB
input split in a few seconds, so quite a bit of the
elapsed time is spent in job setup and shutdown.

We have tried reducing the number of splits by increasing the block
size to 5x and 10x 64 MB, but then we constantly get out-of-memory
errors and timeouts. At this point each JVM is getting 768 MB and I can't
readily allocate more without dipping into swap.
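
To make the trade-off concrete, here is a rough sketch of the arithmetic in plain Java (no Hadoop classes). The 100 GB input size, the 3 seconds of fixed per-task setup/shutdown, and the 4 seconds of work per 64 MB are made-up numbers for illustration, not measurements from our cluster, and the class and method names are hypothetical. It also assumes work scales linearly with split size, and it says nothing about the memory problem.

```java
public class SplitOverhead {
    static final long MB = 1024L * 1024L;

    /** Number of map tasks for a given input and split size (ceiling division). */
    static long splitCount(long inputBytes, long splitBytes) {
        return (inputBytes + splitBytes - 1) / splitBytes;
    }

    /**
     * Fraction of one task's elapsed time that goes to fixed setup/shutdown,
     * assuming the useful work grows linearly with the split-size multiplier.
     */
    static double overheadFraction(double fixedSeconds,
                                   double workSecondsPer64Mb,
                                   int multiplier) {
        double work = workSecondsPer64Mb * multiplier;
        return fixedSeconds / (fixedSeconds + work);
    }

    public static void main(String[] args) {
        long inputBytes = 100L * 1024L * MB; // hypothetical 100 GB input
        double fixedSeconds = 3.0;           // hypothetical setup + shutdown per task
        double workSeconds = 4.0;            // hypothetical work per 64 MB split

        // Compare the default split with the 5x and 10x sizes we tried.
        for (int m : new int[] {1, 5, 10}) {
            long tasks = splitCount(inputBytes, 64L * m * MB);
            double pct = 100.0 * overheadFraction(fixedSeconds, workSeconds, m);
            System.out.printf("split=%d MB  tasks=%d  overhead=%.1f%%%n",
                              64 * m, tasks, pct);
        }
    }
}
```

Under these assumed numbers, larger splits cut both the task count and the share of time lost to setup, which is why we went down that road in the first place.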

What suggestions do people have for this case?

07/12/25 11:49:59 INFO mapred.JobClient: Task Id : task_200712251146_0001_m_000002_0, Status : FAILED
java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1763)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1663)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1709)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
        at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:174)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

07/12/25 11:51:35 INFO mapred.JobClient: Task Id : task_200712251146_0001_r_000038_0, Status : FAILED
java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:484)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
        at org.apache.hadoop.dfs.$Proxy1.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:269)
        at org.apache.hadoop.dfs.DFSClient.createNamenode(DFSClient.java:147)
        at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:161)
        at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:65)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:159)
        at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:118)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:90)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1759)

