hadoop-mapreduce-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6303) Read timeout when retrying a fetch error can be fatal to a reducer
Date Fri, 03 Apr 2015 14:47:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394523#comment-14394523 ]

Hudson commented on MAPREDUCE-6303:
-----------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2084 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2084/])
MAPREDUCE-6303. Read timeout when retrying a fetch error can be fatal to a reducer. Contributed
by Jason Lowe. (junping_du: rev eccb7d46efbf07abcc6a01bd5e7d682f6815b824)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java


> Read timeout when retrying a fetch error can be fatal to a reducer
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6303
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6303
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>             Fix For: 2.7.0
>
>         Attachments: MAPREDUCE-6303.001.patch
>
>
> If a reducer encounters an error while fetching from a node and then hits a read timeout
> while trying to re-establish the connection, the reducer can fail.  The read timeout exception
> can leak to the top of the Fetcher thread, which causes the reduce task to tear down.  This
> type of error can repeat across reducer attempts, causing jobs to fail due to a single bad
> node.
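
To illustrate the failure mode described above, here is a minimal, self-contained Java sketch. It is not the code from MAPREDUCE-6303.001.patch or from Fetcher.java; the class FetchRetrySketch, the Connector interface, the failedHosts list, and the fetchWithRetry method are all hypothetical names invented for this example. The idea it shows is general: if the retry of a failed connection can itself throw (for instance a java.net.SocketTimeoutException on a read timeout), that exception should be handled as a failure of the host rather than allowed to propagate out of the fetching thread.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;

public class FetchRetrySketch {

    /** Hypothetical stand-in for opening a connection to a map output host. */
    interface Connector {
        void connect(String host) throws IOException;
    }

    /** Hypothetical record of hosts that failed, so a scheduler could penalize them. */
    static final List<String> failedHosts = new ArrayList<>();

    /**
     * Tries to connect to a host, retrying once after an error.
     * Returns true on success. Any IOException raised by the retry,
     * including a read timeout, is recorded as a host failure instead
     * of being propagated to the caller.
     */
    static boolean fetchWithRetry(String host, Connector connector) {
        try {
            connector.connect(host);
            return true;
        } catch (IOException firstFailure) {
            try {
                // Re-establish the connection; this attempt can also time out.
                connector.connect(host);
                return true;
            } catch (IOException retryFailure) {
                // Contain the failure: blame the host, keep the thread alive.
                failedHosts.add(host);
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated bad node: every connection attempt hits a read timeout.
        Connector alwaysTimesOut = host -> {
            throw new SocketTimeoutException("Read timed out");
        };
        boolean ok = fetchWithRetry("bad-node", alwaysTimesOut);
        System.out.println("fetched=" + ok + " failedHosts=" + failedHosts);
    }
}
```

In this sketch the caller only sees a boolean result and a recorded bad host, so a single unresponsive node cannot terminate the thread that performs the fetch.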



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
