Date: Fri, 22 Jan 2016 02:48:39 +0000 (UTC)
From: "Micah Whitacre (JIRA)"
To: crunch-dev@incubator.apache.org
Subject: [jira] [Assigned] (CRUNCH-589) DistCache should have a configurable replication factor

     [ https://issues.apache.org/jira/browse/CRUNCH-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Micah Whitacre reassigned CRUNCH-589:
-------------------------------------

    Assignee: Micah Whitacre  (was: Josh Wills)

> DistCache should have a configurable replication factor
> -------------------------------------------------------
>
>                 Key: CRUNCH-589
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-589
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Steffen Grohsschmiedt
>            Assignee: Micah Whitacre
>             Fix For: 0.14.0
>
>         Attachments: CRUNCH-589.patch
>
>
> We were running into issues with very large jobs where files distributed via the Crunch DistCache would overload all DataNodes serving the files. The serving DataNodes run out of Xceiver threads, which causes BlockMissingExceptions, and the job fails after a number of HDFS retries. This can be fixed by increasing the replication factor for files distributed via DistCache, which spreads the load across more DataNodes.
> I suggest adding a config option for setting a different replication factor, defaulting to the current behavior of using the default replication factor.
> {code}2016-01-19 18:24:45,269 WARN [main] org.apache.hadoop.hdfs.DFSClient: DFS Read
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-133877431-10.255.1.10-1340216259506:blk_5327751941_1104340730962 file=/tmp/crunch-1412104163/p17/COMBINE
> 	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:889)
> 	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:568)
> 	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
> 	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:848)
> 	at java.io.DataInputStream.read(DataInputStream.java:149)
> 	at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2310)
> 	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323)
> 	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
> 	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
> 	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
> 	at org.apache.crunch.util.DistCache.read(DistCache.java:72)
> 	at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:46)
> 	at org.apache.crunch.impl.mr.run.CrunchReducer.setup(CrunchReducer.java:40)
> 	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
> 	at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1651)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1630)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1482)
> 	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:720)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:790)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158){code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
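The improvement proposed in the issue comes down to two ideas: read an optional replication setting and fall back to the default when it is unset, and raise the number of replicas so that the same number of concurrent readers is divided among more DataNodes. The Java sketch below illustrates both. It is not the CRUNCH-589 patch: the key `crunch.distcache.replication`, the class `DistCacheReplicationSketch` and the task count are hypothetical, a plain `Map` stands in for Hadoop's `Configuration`, and the load estimate assumes reads are spread evenly across replicas.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Crunch's actual code): resolve a configurable
 * replication factor for DistCache files and estimate how it spreads reads.
 */
class DistCacheReplicationSketch {

    // Assumed configuration key name; not confirmed Crunch API.
    static final String REPLICATION_KEY = "crunch.distcache.replication";

    /**
     * Returns the configured replication factor, or fsDefault when the key is
     * absent or blank, which preserves the current default behavior.
     */
    static short resolveReplication(Map<String, String> conf, short fsDefault) {
        String value = conf.get(REPLICATION_KEY);
        if (value == null || value.trim().isEmpty()) {
            return fsDefault;
        }
        int parsed = Integer.parseInt(value.trim());
        if (parsed < 1 || parsed > Short.MAX_VALUE) {
            throw new IllegalArgumentException("Invalid replication factor: " + parsed);
        }
        return (short) parsed;
    }

    /**
     * Approximate concurrent readers per replica, assuming reads are spread
     * evenly over the replicas: ceil(concurrentReaders / replication).
     */
    static int readersPerReplica(int concurrentReaders, short replication) {
        return (concurrentReaders + replication - 1) / replication;
    }

    public static void main(String[] args) {
        short fsDefault = 3;
        int tasks = 3000; // hypothetical number of tasks reading the file at once
        Map<String, String> conf = new HashMap<>();

        // Key unset: falls back to the filesystem default.
        short current = resolveReplication(conf, fsDefault);
        // Key set: the configured value wins.
        conf.put(REPLICATION_KEY, "30");
        short raised = resolveReplication(conf, fsDefault);

        System.out.println(current + " replicas: ~" + readersPerReplica(tasks, current) + " readers per replica");
        System.out.println(raised + " replicas: ~" + readersPerReplica(tasks, raised) + " readers per replica");
    }
}
```

With these assumed numbers, the sketch prints about 1000 readers per replica at the default of 3 and about 100 at 30. In a real job, one way to apply the resolved value to the written DistCache file would be Hadoop's `FileSystem.setReplication(Path, short)`; skipping that call when the key is unset keeps the existing behavior.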