Date: Sat, 9 Apr 2016 00:04:25 +0000 (UTC)
From: "Eric Payne (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters compression related errors.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233181#comment-15233181 ]

Eric Payne commented on MAPREDUCE-6633:
---------------------------------------

{quote}
In this case the decompressor threw a RuntimeException (ArrayIndexOutOfBoundsException is a subclass). If we had re-run the map on another node, the job would have succeeded.
...
I understand your concern, but I still think it's a good change.
{quote}
Thanks [~shahrs87]. It would be ideal to come up with a subset covering only the exceptions that could actually be thrown, but I agree that the change is fine as it is. +1

> AM should retry map attempts if the reduce task encounters compression related errors.
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6633
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
> Affects Versions: 2.7.2
>         Reporter: Rushabh S Shah
>         Assignee: Rushabh S Shah
>      Attachments: MAPREDUCE-6633.patch
>
>
> When the reduce task encounters compression-related errors, the AM doesn't retry the corresponding map task.
> In one of the cases we encountered, here is the stack trace.
> {noformat}
> 2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#29
> 	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
> 	at com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196)
> 	at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104)
> 	at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
> 	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
> 	at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
> 	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537)
> 	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
> 	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> {noformat}
> In this case, the node on which the map task ran had a bad drive.
> If the AM had retried running that map task somewhere else, the job definitely would have succeeded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
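The idea under discussion (catch a broad RuntimeException during the shuffle copy and report the map output as failed so the AM can re-run the map, rather than letting the reducer die) can be sketched roughly as below. This is a minimal illustration, not the actual Hadoop Fetcher code: the class and interface names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch, assuming a simplified shuffle loop: a corrupt map output
// (e.g. a bad LZO block on a failing drive) surfaces as a RuntimeException
// such as ArrayIndexOutOfBoundsException. Instead of propagating it and
// killing the reduce task, the fetcher records the map attempt as failed
// so the AM can reschedule that map on another node.
public class ShuffleRetrySketch {

    // Hypothetical stand-in for fetching (and decompressing) one map output.
    interface MapOutputSource {
        byte[] fetch(String mapId);
    }

    // Returns the map IDs whose outputs could not be copied and should be
    // reported to the AM for re-execution.
    static List<String> shuffle(List<String> mapIds, MapOutputSource source) {
        List<String> failed = new ArrayList<>();
        for (String mapId : mapIds) {
            try {
                source.fetch(mapId); // decompression happens inside here
            } catch (RuntimeException e) {
                // Broad catch: ArrayIndexOutOfBoundsException thrown by a
                // broken decompressor is a RuntimeException, so it lands
                // here too instead of aborting the whole reduce task.
                failed.add(mapId);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        MapOutputSource source = mapId -> {
            if (mapId.equals("attempt_2")) {
                // Simulate the corruption seen in the stack trace above.
                throw new ArrayIndexOutOfBoundsException("corrupt LZO block");
            }
            return new byte[0];
        };
        List<String> failed =
                shuffle(List.of("attempt_1", "attempt_2", "attempt_3"), source);
        System.out.println(failed); // prints [attempt_2]
    }
}
```

The trade-off the comment raises is visible here: catching only a narrower subset of exceptions would avoid retrying maps for unrelated reducer-side bugs, but since decompressor failures show up as generic unchecked exceptions, the broad catch is the pragmatic choice.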