hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1485) Metrics should be there for reporting shuffle failures/successes
Date Mon, 25 Jun 2007 09:11:26 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-1485:
--------------------------------

    Attachment: 1485.1.patch

Thanks, David, for the review. It looks like I made a couple of careless copy/paste errors in
my previous patch. This patch fixes those and the other issues you pointed out, and it is also
up-to-date with the trunk.
Last time I forgot to describe the metrics that I added for the shuffle phase.
The shuffle metrics are reported by the TaskTracker and the ReduceTask.
The TaskTracker side is handled by a class called ShuffleServerMetrics, which reports the
following metrics:
   (a) shuffle_handler_busy_percent  [this tells us how busy the servlet handler is] 
   (b) shuffle_output_bytes [the number of map output bytes read from map output files]
   (c) shuffle_failed_outputs [the number of map output sends that failed] 
   (d) shuffle_success_outputs [the number of map output sends that succeeded from the server's
point of view]
   These metrics are tagged only with the "sessionId" (there is little to gain by tagging them
with something like "user", since the tasktracker can serve outputs for maps belonging
to different jobs and different users concurrently).
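
To make the TaskTracker-side numbers concrete, here is a minimal sketch of how such counters could be kept. It is not the code from the patch: the class ShuffleServerCounters, its method names, and the assumption that the busy percentage is the share of busy servlet handler threads out of a fixed pool size are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the counters behind ShuffleServerMetrics.
// Names and formulas are illustrative assumptions, not the patch itself.
class ShuffleServerCounters {
    private final int maxHandlers; // assumed: fixed size of the servlet handler pool
    private final AtomicInteger busyHandlers = new AtomicInteger();
    private final AtomicLong outputBytes = new AtomicLong();
    private final AtomicLong failedOutputs = new AtomicLong();
    private final AtomicLong successOutputs = new AtomicLong();

    ShuffleServerCounters(int maxHandlers) {
        if (maxHandlers <= 0) {
            throw new IllegalArgumentException("maxHandlers must be positive");
        }
        this.maxHandlers = maxHandlers;
    }

    // Called when a handler thread starts or finishes serving a map output.
    void handlerBusy() { busyHandlers.incrementAndGet(); }
    void handlerFree() { busyHandlers.decrementAndGet(); }

    // Bytes read from a map output file while serving a request.
    void addOutputBytes(long n) { outputBytes.addAndGet(n); }

    // Outcome of one send, as seen from the server.
    void failedOutput() { failedOutputs.incrementAndGet(); }
    void successOutput() { successOutputs.incrementAndGet(); }

    // shuffle_handler_busy_percent under the pool-share assumption above.
    double handlerBusyPercent() {
        return 100.0 * busyHandlers.get() / maxHandlers;
    }

    long outputBytes() { return outputBytes.get(); }
    long failedOutputs() { return failedOutputs.get(); }
    long successOutputs() { return successOutputs.get(); }
}
```

A servlet handler would call handlerBusy() on entry and handlerFree() in a finally block, so that a failed send still releases its slot in the busy count.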

The ReduceTask side is handled by a class called ShuffleClientMetrics, which reports the following
metrics:
   (a) shuffle_fetchers_busy_percent [this tells us how busy the map output copier subsystem
is]
   (b) shuffle_input_bytes [the number of map output bytes read off the wire]
   (c) shuffle_failed_fetches [the number of failed fetches]
   (d) shuffle_success_fetches [the number of successful fetches]
   These metrics are tagged with "user", "jobName", "jobId", "taskId", "sessionId".
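
As an illustration of the tagging on the ReduceTask side, the sketch below builds one record that carries the five tags alongside the four metric values. The class ShuffleClientRecord, its methods, and the assumption that the busy percentage is the share of copier threads currently fetching are hypothetical, and it deliberately does not use the Hadoop metrics classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a tagged record for the reduce-side shuffle metrics.
// This is not ShuffleClientMetrics from the patch; names are illustrative.
class ShuffleClientRecord {
    private final Map<String, String> tags = new LinkedHashMap<>();
    private final int numCopiers; // assumed: size of the map output copier pool
    private int busyCopiers;
    private long inputBytes;
    private long failedFetches;
    private long successFetches;

    ShuffleClientRecord(String user, String jobName, String jobId,
                        String taskId, String sessionId, int numCopiers) {
        if (numCopiers <= 0) {
            throw new IllegalArgumentException("numCopiers must be positive");
        }
        tags.put("user", user);
        tags.put("jobName", jobName);
        tags.put("jobId", jobId);
        tags.put("taskId", taskId);
        tags.put("sessionId", sessionId);
        this.numCopiers = numCopiers;
    }

    // A copier thread begins fetching one map output.
    synchronized void fetchStarted() { busyCopiers++; }

    // Bytes read off the wire during a fetch.
    synchronized void addInputBytes(long n) { inputBytes += n; }

    // A copier thread finishes a fetch, successfully or not.
    synchronized void fetchFinished(boolean succeeded) {
        busyCopiers--;
        if (succeeded) {
            successFetches++;
        } else {
            failedFetches++;
        }
    }

    // Tags first, then the metric values, in a stable order.
    synchronized Map<String, Object> snapshot() {
        Map<String, Object> out = new LinkedHashMap<>(tags);
        out.put("shuffle_fetchers_busy_percent", 100.0 * busyCopiers / numCopiers);
        out.put("shuffle_input_bytes", inputBytes);
        out.put("shuffle_failed_fetches", failedFetches);
        out.put("shuffle_success_fetches", successFetches);
        return out;
    }
}
```

Because every record carries "jobId" and "taskId", a consumer of these snapshots can group failed fetches per reduce task, which is not possible with the server-side metrics that carry only "sessionId".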

> Metrics should be there for reporting shuffle failures/successes
> ----------------------------------------------------------------
>
>                 Key: HADOOP-1485
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1485
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.14.0
>
>         Attachments: 1485.1.patch, 1485.1.patch, shuffle-metrics.patch
>
>
> It would be nice to have metrics for the shuffle phase which reports the failures/successes
> for the fetches. This would aid in performance tests and in debugging (shuffle).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

