hadoop-mapreduce-issues mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1276) Shuffle connection logic needs correction
Date Wed, 05 May 2010 11:08:04 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864270#action_12864270 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1276:

I repeated the manual testing described in my earlier comment.
I simulated a read timeout for m_00001_0 by explicitly adding a sleep in TaskTracker.MapOutputServlet.sendMapFile().
The attempt fails with the error "Too many fetch failures", as expected.
But most of the time I also see m_00002_0 fail with the following error:
Map output lost, rescheduling: error on sending map attempt_201005051443_0003_m_000002_0 to reduce 1
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:566)
at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:946)
at java.io.DataOutputStream.flush(DataOutputStream.java:106)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.sendMapFile(TaskTracker.java:3646)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3517)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124)
at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:766) at
Jothi/Chris, do you think this is an acceptable failure?
I think we should not treat this as an input exception, and should retry the fetch instead.

> Shuffle connection logic needs correction 
> ------------------------------------------
>                 Key: MAPREDUCE-1276
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1276
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 0.21.0
>            Reporter: Jothi Padmanabhan
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.21.0
>         Attachments: patch-1276-1.txt, patch-1276.txt
> While looking at the code with Amareshwari, we realized that {{Fetcher#copyFromHost}}
> marks the connection as successful when {{url.openConnection}} returns. This is wrong: the connection
> is made implicitly inside {{getInputStream}}. We need to split {{getInputStream}} into
> {{connect}} and {{getInputStream}} to handle the connection and read timeout strategies correctly.
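
The split described above could look roughly like the following. This is a minimal sketch, not the attached patch: the class ShuffleConnectSketch, the Outcome enum, and the fetch method are hypothetical names used only for illustration, and the timeout values are arbitrary. It uses only standard java.net.URLConnection calls. The point is that openConnection() does no network I/O, so the connect step and the read step are made explicit and their failures are reported separately.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical illustration class; not part of Hadoop.
final class ShuffleConnectSketch {

    // Hypothetical result type so callers can apply different strategies.
    enum Outcome { OK, CONNECT_FAILED, READ_FAILED }

    static Outcome fetch(URL url, int connectTimeoutMs, int readTimeoutMs) {
        URLConnection conn;
        try {
            conn = url.openConnection();          // creates the object only, no I/O yet
            conn.setConnectTimeout(connectTimeoutMs);
            conn.setReadTimeout(readTimeoutMs);
            conn.connect();                       // the connection is actually made here
        } catch (IOException e) {
            return Outcome.CONNECT_FAILED;        // host unreachable, refused, or connect timeout
        }
        // Only now is it safe to consider the connection successful.
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // discard the data; a real fetcher would copy the map output
            }
            return Outcome.OK;
        } catch (IOException e) {
            return Outcome.READ_FAILED;           // read timeout or error while sending/receiving
        }
    }

    public static void main(String[] args) throws IOException {
        // A listening socket that never answers: connect succeeds, the read times out.
        try (ServerSocket silent = new ServerSocket(0)) {
            URL url = URI.create("http://127.0.0.1:" + silent.getLocalPort() + "/mapOutput").toURL();
            System.out.println(fetch(url, 1000, 500));
        }
    }
}
```

With the two steps separated, a caller can penalize the host on CONNECT_FAILED and apply a different policy (for example, a retry) on READ_FAILED.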

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
