hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3130) Shuffling takes too long to get the last map output.
Date Tue, 15 Apr 2008 04:47:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588898#action_12588898
] 

Amar Kamat commented on HADOOP-3130:
------------------------------------

bq. A minor point. Since UNIT_CONNECT_TIMEOUT is private final, the following code segment
seems redundant: ...
The reason for the check is that _unit-connect-timeout_ = 0 with _total-timeout_ >
0 will result in an infinite loop. Since users can change unit-connect-timeout (and recompile),
I think it's safe to guard against such cases and fail early.
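
To illustrate the infinite-loop concern, here is a minimal sketch (not the code from the patch) of a connect loop that spends a total budget in unit-sized slices. The class `ConnectBudget`, the `Attempt` interface and the method `connectWithBudget` are hypothetical names made up for this example. If the unit were 0, the remaining budget would never shrink, which is why the check fails early.

```java
import java.io.IOException;

public class ConnectBudget {
    /** Hypothetical stand-in for one connection attempt with a timeout. */
    public interface Attempt {
        void connect(int timeoutMs) throws IOException;
    }

    /**
     * Retries the attempt in slices of unitTimeoutMs until it succeeds or
     * totalTimeoutMs is used up. Returns the number of attempts made.
     */
    public static int connectWithBudget(Attempt attempt, int totalTimeoutMs,
                                        int unitTimeoutMs) throws IOException {
        if (totalTimeoutMs < 0) {
            throw new IllegalArgumentException("negative total timeout");
        }
        // Fail early: a zero unit with a positive total would never
        // reduce the remaining budget, so the loop could not terminate.
        if (totalTimeoutMs > 0 && unitTimeoutMs <= 0) {
            throw new IllegalArgumentException("unit timeout must be > 0");
        }
        int remaining = totalTimeoutMs;
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                // Never wait longer than what is left of the budget.
                attempt.connect(Math.min(unitTimeoutMs, remaining));
                return attempts;
            } catch (IOException ioe) {
                // Every failure costs one unit, so the loop is bounded.
                remaining -= unitTimeoutMs;
                if (remaining <= 0) {
                    throw ioe;
                }
            }
        }
    }
}
```

With a total of 100 ms and a unit of 30 ms, an attempt that always fails is tried four times (30, 30, 30 and 10 ms) before the last exception is rethrown.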
bq. Also, you need to test whether the ioe is due to connection timeout. ...
What should the right behaviour be for non-connection-timeout exceptions? Retrying without
any penalty is surely not a good option, since it will lead to longer (possibly infinite) waits.

- One way would be to decrement the total time left (so that loop termination is guaranteed)
and LOG the type of exception encountered. That is, treat it like a connection-timeout exception.
- A slightly more complex way would be to vary the penalty incurred in each case. For example,
decrement by _unit-connect-timeout/2_ for non-connect-timeout exceptions and by
_unit-connect-timeout_ otherwise.
- Another, more complex way would be to tolerate some failures (w/o penalty) for non-connect-timeout
exceptions.
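
The three options above can be sketched as a small penalty policy. This is only an illustration under assumptions, not code from the patch: the class `PenaltyPolicy`, its `Mode` values and `penaltyFor` are hypothetical names, and a connect timeout is assumed to surface as `java.net.SocketTimeoutException`.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class PenaltyPolicy {
    /** The three options discussed above (names are made up for this sketch). */
    public enum Mode { UNIFORM, HALF_FOR_OTHER, TOLERATE_SOME }

    private final Mode mode;
    private final int unitTimeoutMs;
    private int freeFailuresLeft;

    public PenaltyPolicy(Mode mode, int unitTimeoutMs, int freeFailures) {
        if (unitTimeoutMs <= 0 || freeFailures < 0) {
            throw new IllegalArgumentException("invalid policy settings");
        }
        this.mode = mode;
        this.unitTimeoutMs = unitTimeoutMs;
        this.freeFailuresLeft = freeFailures;
    }

    /** Returns how much to subtract from the total time left for this failure. */
    public int penaltyFor(IOException ioe) {
        // Assumption: a connect timeout shows up as SocketTimeoutException.
        boolean connectTimeout = ioe instanceof SocketTimeoutException;
        if (connectTimeout || mode == Mode.UNIFORM) {
            return unitTimeoutMs;                    // option 1: same penalty for all
        }
        if (mode == Mode.HALF_FOR_OTHER) {
            return Math.max(1, unitTimeoutMs / 2);   // option 2: half, but never zero
        }
        if (freeFailuresLeft > 0) {                  // option 3: a few free failures
            freeFailuresLeft--;
            return 0;
        }
        return unitTimeoutMs;
    }
}
```

In every mode the number of zero-penalty failures is bounded, so a loop that subtracts `penaltyFor(ioe)` from the time left still terminates.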
----
For now I think it's okay to keep it simple.  Note that the reducer will not get killed if
one meta-connect attempt fails; it takes a bunch of them.

> Shuffling takes too long to get the last map output.
> ----------------------------------------------------
>
>                 Key: HADOOP-3130
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3130
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3130-v2.patch, HADOOP-3130-v2.patch, HADOOP-3130-v3.1.patch,
HADOOP-3130-v3.patch, HADOOP-3130.patch, shuffling.log
>
>
> I noticed that towards the end of shuffling, the map output fetcher of the reducer backs
off too aggressively.
> I attach a fraction of one reduce log of my job.
> Noticed that the last map output was not fetched in 2 minutes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

