crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (CRUNCH-589) DistCache should have a configurable replication factor
Date Fri, 22 Jan 2016 02:48:39 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Micah Whitacre reassigned CRUNCH-589:
-------------------------------------

    Assignee: Micah Whitacre  (was: Josh Wills)

> DistCache should have a configurable replication factor
> -------------------------------------------------------
>
>                 Key: CRUNCH-589
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-589
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Steffen Grohsschmiedt
>            Assignee: Micah Whitacre
>             Fix For: 0.14.0
>
>         Attachments: CRUNCH-589.patch
>
>
> We were running into issues with very large jobs where files distributed via the Crunch
> DistCache would overload the DataNodes serving them. The serving DataNodes run out of
> Xceiver threads, causing BlockMissingExceptions, and the job fails after some HDFS retries.
> This can be fixed by increasing the replication factor for files distributed via the
> DistCache, thereby spreading the read load across more DataNodes.
> I suggest adding a config option for setting a different replication factor, defaulting
> to the current behavior of using the filesystem's default replication factor.
> {code}
> 2016-01-19 18:24:45,269 WARN [main] org.apache.hadoop.hdfs.DFSClient: DFS Read
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-133877431-10.255.1.10-1340216259506:blk_5327751941_1104340730962 file=/tmp/crunch-1412104163/p17/COMBINE
> 	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:889)
> 	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:568)
> 	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
> 	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:848)
> 	at java.io.DataInputStream.read(DataInputStream.java:149)
> 	at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2310)
> 	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323)
> 	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
> 	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
> 	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
> 	at org.apache.crunch.util.DistCache.read(DistCache.java:72)
> 	at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:46)
> 	at org.apache.crunch.impl.mr.run.CrunchReducer.setup(CrunchReducer.java:40)
> 	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
> 	at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1651)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1630)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1482)
> 	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:720)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:790)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158){code}
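The defaulting behavior proposed above (honor a configured replication factor when set, otherwise keep the filesystem default) could be sketched as follows. This is an illustration only, not the CRUNCH-589 patch: the config key name and helper are hypothetical. In a real change, the chosen value would be applied to the cache file with Hadoop's FileSystem.setReplication(Path, short) after it is written.

```java
// Sketch of the proposed defaulting logic. The key name
// "crunch.distcache.replication" is a hypothetical example;
// a sentinel of -1 means "keep the filesystem's default replication".
public class ReplicationConfigSketch {
    static final short USE_DEFAULT = -1;

    static short effectiveReplication(short configured, short fsDefault) {
        // Only an explicitly configured positive value overrides the
        // current behavior of using the default replication factor.
        return configured > 0 ? configured : fsDefault;
    }

    public static void main(String[] args) {
        // Unconfigured: falls back to the filesystem default (here, 3).
        System.out.println(effectiveReplication(USE_DEFAULT, (short) 3));
        // Configured to 10: heavily-read cache files get more replicas,
        // spreading reads across more DataNodes.
        System.out.println(effectiveReplication((short) 10, (short) 3));
    }
}
```

Defaulting to the sentinel preserves existing behavior for jobs that never set the option, which matches the reporter's suggestion.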



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
