hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3130) Shuffling takes too long to get the last map output.
Date Sun, 30 Mar 2008 08:37:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583421#action_12583421 ]

Devaraj Das commented on HADOOP-3130:

The events are stored in the JobTracker and fetched by the TaskTrackers. The frequency of
polling for map completion events is the same as the heartbeat interval (which depends on the
cluster size). For example, for a cluster of 500 nodes it is 10 seconds. A delay on the order
of minutes in getting map completion events could therefore mean either that the map is not
complete yet (it is still in COMMIT_PENDING or RUNNING) or that the JobTracker is busy and is
discarding RPCs. To check the latter, look at the logs of the TaskTracker on the reducer's
host.
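
To illustrate how a polling interval can scale with cluster size, here is a minimal sketch. It is not the JobTracker's actual code: the class name, the method name, and both constants are assumptions, picked only so that a 500-node cluster yields the 10 seconds mentioned above.

```java
public class HeartbeatSketch {
    // Hypothetical constant: one extra second of interval per 50 nodes.
    public static final int NODES_PER_EXTRA_SECOND = 50;
    // Hypothetical floor so that small clusters do not poll too often.
    public static final int MIN_INTERVAL_MS = 5000;

    // Returns an assumed polling interval in milliseconds for a given cluster size.
    public static int pollIntervalMillis(int clusterSize) {
        int scaled = 1000 * (clusterSize / NODES_PER_EXTRA_SECOND);
        return Math.max(scaled, MIN_INTERVAL_MS);
    }

    public static void main(String[] args) {
        System.out.println(pollIntervalMillis(500)); // prints 10000
        System.out.println(pollIntervalMillis(100)); // prints 5000 (floor applies)
    }
}
```

Under these assumptions a reducer learns about a finished map at most one interval late, so a delay of minutes points to the map not having completed or to dropped RPCs rather than to the polling interval itself.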

> Shuffling takes too long to get the last map output.
> ----------------------------------------------------
>                 Key: HADOOP-3130
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3130
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Runping Qi
>         Attachments: shuffling.log
> I noticed that towards the end of shuffling, the map output fetcher of the reducer backs
> off too aggressively.
> I attach a fraction of one reduce log of my job.
> Notice that the last map output was not fetched in 2 minutes.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
