crunch-dev mailing list archives

From "Steffen Grohsschmiedt (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-589) DistCache should have a configurable replication factor
Date Thu, 21 Jan 2016 12:18:39 GMT
Steffen Grohsschmiedt created CRUNCH-589:
--------------------------------------------

             Summary: DistCache should have a configurable replication factor
                 Key: CRUNCH-589
                 URL: https://issues.apache.org/jira/browse/CRUNCH-589
             Project: Crunch
          Issue Type: Improvement
          Components: Core
            Reporter: Steffen Grohsschmiedt
            Assignee: Josh Wills


We ran into issues with very large jobs where files distributed via the Crunch DistCache
overloaded the DataNodes serving them. Those DataNodes run out of Xceiver threads, which
causes BlockMissingExceptions, and the job fails after several HDFS retries. Increasing the
replication factor for files distributed via DistCache fixes this by spreading the read load
across more DataNodes.
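A rough calculation shows why more replicas help. It assumes that each active read ties up roughly one transfer (Xceiver) thread on the DataNode serving it, and that readers spread evenly across replicas. That is an idealization, since the HDFS client's actual replica choice also depends on locality. The reader count below is hypothetical and not taken from the failing job.

```java
/**
 * Back-of-the-envelope estimate of how many concurrent readers each
 * replica-holding DataNode must serve, assuming reads spread evenly
 * across replicas (an idealization).
 */
public class ReplicaLoadEstimate {

    // Ceiling division: concurrent readers per DataNode holding a replica.
    static long readersPerDataNode(long concurrentReaders, int replication) {
        if (replication <= 0) {
            throw new IllegalArgumentException("replication must be positive");
        }
        return (concurrentReaders + replication - 1) / replication;
    }

    public static void main(String[] args) {
        long concurrentReaders = 6000; // hypothetical: tasks reading the file at once
        for (int replication : new int[] {3, 10, 30}) {
            // Prints 2000, 600 and 200 readers per DataNode respectively.
            System.out.printf("replication=%d -> ~%d readers per DataNode%n",
                    replication, readersPerDataNode(concurrentReaders, replication));
        }
    }
}
```

Raising the replication factor from 3 to 30 cuts the estimated per-DataNode load tenfold, which is the effect the proposal relies on.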

I suggest adding a configuration option for setting a different replication factor, defaulting
to the current behavior of using the cluster's default replication factor.
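Here is a minimal sketch of the proposed fallback logic. It assumes a hypothetical key, `crunch.distcache.replication`, which is not an existing Crunch setting. A plain `Map` stands in for Hadoop's `Configuration` so the example runs without Hadoop on the classpath. In Crunch itself, the lookup would more likely use `Configuration.getInt`, and the resolved value would be applied to the written file with `FileSystem.setReplication(Path, short)`.

```java
import java.util.Map;

/**
 * Sketch of resolving a DistCache replication factor from configuration.
 * A Map stands in for Hadoop's Configuration; the DistCache key is hypothetical.
 */
public class DistCacheReplication {

    // Hypothetical key for the proposed option; not an existing Crunch setting.
    static final String DISTCACHE_REPLICATION_KEY = "crunch.distcache.replication";
    // Standard HDFS key for the default replication factor.
    static final String DFS_REPLICATION_KEY = "dfs.replication";
    // HDFS ships with dfs.replication = 3.
    static final short HDFS_DEFAULT_REPLICATION = 3;

    /**
     * Returns the DistCache-specific replication factor if set, otherwise the
     * cluster default, which preserves the current behavior.
     */
    static short resolveReplication(Map<String, String> conf) {
        String dfsDefault = conf.get(DFS_REPLICATION_KEY);
        short fallback = dfsDefault == null
                ? HDFS_DEFAULT_REPLICATION
                : Short.parseShort(dfsDefault.trim());
        String override = conf.get(DISTCACHE_REPLICATION_KEY);
        short replication = override == null ? fallback : Short.parseShort(override.trim());
        if (replication < 1) {
            throw new IllegalArgumentException("replication must be >= 1, got " + replication);
        }
        return replication;
    }

    public static void main(String[] args) {
        System.out.println(resolveReplication(Map.of()));                          // 3
        System.out.println(resolveReplication(Map.of(DFS_REPLICATION_KEY, "2")));  // 2
        System.out.println(resolveReplication(Map.of(
                DFS_REPLICATION_KEY, "3", DISTCACHE_REPLICATION_KEY, "20")));      // 20
    }
}
```

Falling back to `dfs.replication` when the new key is absent leaves existing jobs unchanged. Only jobs that opt in get the higher replication factor.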

{code}2016-01-19 18:24:45,269 WARN [main] org.apache.hadoop.hdfs.DFSClient: DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-133877431-10.255.1.10-1340216259506:blk_5327751941_1104340730962
file=/tmp/crunch-1412104163/p17/COMBINE
	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:889)
	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:568)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:848)
	at java.io.DataInputStream.read(DataInputStream.java:149)
	at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2310)
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
	at org.apache.crunch.util.DistCache.read(DistCache.java:72)
	at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:46)
	at org.apache.crunch.impl.mr.run.CrunchReducer.setup(CrunchReducer.java:40)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
	at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1651)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1630)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1482)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:720)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:790)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158){code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
