hadoop-mapreduce-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1276) Shuffle connection logic needs correction
Date Tue, 04 May 2010 21:37:15 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864019#action_12864019 ]

Chris Douglas commented on MAPREDUCE-1276:
------------------------------------------

Oh, OK; I didn't know that. While this prevents the connection from being reused, it will
still be garbage collected if it is simply closed, right? The current code prevents another
Fetcher from fetching output from that host while the socket is "drained," which seems like
an avoidable cost. Leaving the TCP connection open is regrettable... thoughts on this tradeoff?

* If the drain is retained, would it be possible to update the comment on the drain code? The current explanation,
"just in case", makes it sound like superstition instead of a deliberate choice.
* Since {{input.close()}} can also throw, would it make sense to do that within the try/catch
or use {{IOUtils::cleanup}}?
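
To make the second bullet concrete, here is a minimal sketch of draining a stream and closing it so that a failure in {{close()}} cannot escape the cleanup path. The class and method names ({{QuietClose}}, {{closeQuietly}}, {{drainAndClose}}) are hypothetical and are not taken from the patch; {{closeQuietly}} only stands in for the kind of helper the bullet asks about, and the in-memory stream is a placeholder for the shuffle input.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration; not the Fetcher code from the patch.
class QuietClose {

    /** Closes each non-null resource, swallowing IOException; returns how many closes failed. */
    static int closeQuietly(Closeable... resources) {
        int failures = 0;
        for (Closeable c : resources) {
            if (c == null) {
                continue;
            }
            try {
                c.close();
            } catch (IOException e) {
                failures++; // a real implementation would log this
            }
        }
        return failures;
    }

    /**
     * Reads and discards whatever is left on the stream, then closes it.
     * Returns the number of bytes drained, or -1 if reading failed.
     * The comment at the real drain site should state why the drain is done.
     */
    static long drainAndClose(InputStream input) {
        long drained = 0;
        try {
            byte[] buf = new byte[4096];
            int n;
            while ((n = input.read(buf)) != -1) {
                drained += n;
            }
        } catch (IOException e) {
            drained = -1;
        } finally {
            // close() can also throw, so it goes through the quiet helper.
            closeQuietly(input);
        }
        return drained;
    }

    public static void main(String[] args) {
        InputStream input = new ByteArrayInputStream(new byte[10000]);
        System.out.println("drained bytes: " + drainAndClose(input));
    }
}
```

Whether the drain is worth holding the host for is the tradeoff raised above; the sketch only shows where the close would sit if the drain stays.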

> Shuffle connection logic needs correction 
> ------------------------------------------
>
>                 Key: MAPREDUCE-1276
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1276
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 0.21.0
>            Reporter: Jothi Padmanabhan
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: patch-1276.txt
>
>
> While looking at the code with Amareshwari, we realized that {{Fetcher#copyFromHost}}
marks the connection as successful when {{url.openConnection}} returns. This is wrong: the connection
is made implicitly inside {{getInputStream}}. We need to split {{getInputStream}} into
{{connect}} and {{getInputStream}} to handle the connection and read timeout strategies correctly.
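
The split described in the issue can be sketched with the standard {{java.net.URLConnection}} API, where {{openConnection}} performs no network I/O and {{connect}} is the call that actually opens the socket. This is an illustration under assumptions, not the code in patch-1276.txt: the class {{ConnectThenRead}}, the {{Stage}} enum, and the helper {{closedLocalPort}} are hypothetical names introduced here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical illustration; not the Fetcher code from the patch.
class ConnectThenRead {

    enum Stage { CONNECT_FAILED, READ_FAILED, OK }

    /** Connects and reads in two separate steps so each failure can be handled on its own. */
    static Stage fetch(String spec, int connectTimeoutMs, int readTimeoutMs) {
        URLConnection conn;
        try {
            // openConnection() only creates the object; no socket is opened yet.
            conn = URI.create(spec).toURL().openConnection();
            conn.setConnectTimeout(connectTimeoutMs);
            conn.setReadTimeout(readTimeoutMs);
            // The connection is established here, so only now can it be
            // counted as successful.
            conn.connect();
        } catch (IOException e) {
            return Stage.CONNECT_FAILED;
        }
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // discard; a real fetcher would copy the map output
            }
            return Stage.OK;
        } catch (IOException e) {
            // Read timeouts and HTTP error statuses both land here.
            return Stage.READ_FAILED;
        }
    }

    /** Returns a local port that was just released, so nothing should be listening on it. */
    static int closedLocalPort() {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String spec = "http://127.0.0.1:" + closedLocalPort() + "/";
        System.out.println(fetch(spec, 2000, 2000));
    }
}
```

With the two steps separated, a caller can apply one retry or penalty policy to connect failures and a different one to read failures.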

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

