hadoop-mapreduce-issues mailing list archives

From "Luke Lu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
Date Wed, 11 Aug 2010 18:25:20 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897367#action_12897367 ]

Luke Lu commented on MAPREDUCE-1881:

The instrumentation class is related to but not dependent on metrics frameworks. Some of the
events are actually not collected in the regular metrics, so there is an "expert" level config
property "mapreduce.tasktracker.instrumentation" to specify a subclass for TaskTrackerInstrumentation
which contains all the overridable callbacks. The default value for the property is the TaskTrackerMetricsInst
class which currently implements the Updater interface to collect tasktracker metrics in the
"mapred" metrics context. Similarly for metrics v2, TaskTrackerMetricsSource would be the

Matei and others want to use the overridable instrumentation property to hook in other listeners
for things that are not strictly metrics related, like statusUpdate, which is useful for his
project, which does two-level scheduling :) He can achieve this with the addition of the statusUpdate
method in TaskTrackerInstrumentation. To make adding more instrumentation classes (while preserving
the existing instrumentation like metrics) slightly easier (IMO, a user-defined composite
class is just as easy), he wants to make the property a list of classes so that the events
are fired for each instance of the specified classes.
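
To illustrate the list-of-classes idea, here is a minimal sketch of turning a comma-separated
property value into one listener instance per class name. It is not taken from the patch.
BaseInstrumentation, CountingInstrumentation and InstrumentationLoader are hypothetical names,
and the no-argument constructors are an assumption made to keep the example self-contained;
the real TaskTrackerInstrumentation subclasses may be constructed differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TaskTrackerInstrumentation; the real class
// has more callbacks and may be constructed differently.
abstract class BaseInstrumentation {
    void statusUpdate(String taskId, String status) {}
}

// Example listener that only counts the status updates it receives.
class CountingInstrumentation extends BaseInstrumentation {
    int updates = 0;

    @Override
    void statusUpdate(String taskId, String status) {
        updates++;
    }
}

class InstrumentationLoader {
    // Turns a value such as "a.B, c.D" into one new instance per class name.
    static List<BaseInstrumentation> load(String propertyValue) {
        List<BaseInstrumentation> result = new ArrayList<>();
        for (String name : propertyValue.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas and whitespace
            }
            try {
                // Reject classes that are not instrumentation subclasses.
                Class<? extends BaseInstrumentation> cls =
                    Class.forName(trimmed).asSubclass(BaseInstrumentation.class);
                // Assumption: every listed class has a no-argument constructor.
                result.add(cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalArgumentException(
                    "Bad instrumentation class: " + trimmed, e);
            }
        }
        return result;
    }
}
```

In this sketch a misconfigured class name fails fast with an IllegalArgumentException rather
than being skipped silently; whether the real property should behave that way is a design choice.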

The latter part of the patch would add a composite instrumentation class that dispatches all
the events to all the instances of the specified instrumentation classes. Currently the patch
lacks unit tests for the composite class. I can see problems down the road maintaining the
class, like making sure it doesn't block in one of the delegate classes, which can potentially
do RPCs etc., and that it properly handles exceptions in the delegate objects.
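
As a rough sketch of the exception-handling concern, a composite could wrap each delegate call
so that one failing listener does not stop the others from seeing the event. This is not the
code in the patch. TaskEvents, CompositeTaskEvents, RecordingTaskEvents and FailingTaskEvents
are hypothetical names, and task IDs are plain strings here for brevity. The sketch does not
address the blocking concern: a delegate that stalls in an RPC would still stall the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the overridable instrumentation callbacks.
abstract class TaskEvents {
    void reportTaskLaunch(String taskId) {}
    void reportTaskEnd(String taskId) {}
    void statusUpdate(String taskId, String status) {}
}

// Forwards every event to every delegate and isolates delegate failures.
class CompositeTaskEvents extends TaskEvents {
    private final List<TaskEvents> delegates;
    private int failures = 0; // not thread-safe; enough for a sketch

    CompositeTaskEvents(List<TaskEvents> delegates) {
        this.delegates = new ArrayList<>(delegates);
    }

    private void dispatch(Consumer<TaskEvents> event) {
        for (TaskEvents delegate : delegates) {
            try {
                event.accept(delegate);
            } catch (RuntimeException e) {
                // A broken listener must not prevent the others from running.
                failures++;
            }
        }
    }

    @Override
    void reportTaskLaunch(String taskId) {
        dispatch(d -> d.reportTaskLaunch(taskId));
    }

    @Override
    void reportTaskEnd(String taskId) {
        dispatch(d -> d.reportTaskEnd(taskId));
    }

    @Override
    void statusUpdate(String taskId, String status) {
        dispatch(d -> d.statusUpdate(taskId, status));
    }

    int failureCount() {
        return failures;
    }
}

// Test helper: remembers each event it receives.
class RecordingTaskEvents extends TaskEvents {
    final List<String> events = new ArrayList<>();

    @Override
    void reportTaskLaunch(String taskId) { events.add("launch:" + taskId); }

    @Override
    void reportTaskEnd(String taskId) { events.add("end:" + taskId); }

    @Override
    void statusUpdate(String taskId, String status) { events.add("status:" + taskId); }
}

// Test helper: always fails on statusUpdate.
class FailingTaskEvents extends TaskEvents {
    @Override
    void statusUpdate(String taskId, String status) {
        throw new IllegalStateException("simulated listener failure");
    }
}
```

A unit test for the composite could then place a FailingTaskEvents ahead of a
RecordingTaskEvents and check that the recorder still sees every event.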

> Improve TaskTrackerInstrumentation
> ----------------------------------
>                 Key: MAPREDUCE-1881
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>            Assignee: Matei Zaharia
>            Priority: Minor
>         Attachments: mapreduce-1881-v2.patch, mapreduce-1881-v2b.patch, mapreduce-1881.patch
> The TaskTrackerInstrumentation class provides a useful way to capture key events at the
> TaskTracker for use in various reporting tools, but it is currently rather limited, because
> only one TaskTrackerInstrumentation can be added to a given TaskTracker and this object receives
> minimal information about tasks (only their IDs). I propose enhancing the functionality through
> two changes:
> # Support a comma-separated list of TaskTrackerInstrumentation classes rather than just
> a single one in the JobConf, and report events to all of them.
> # Make the reportTaskLaunch and reportTaskEnd methods in TaskTrackerInstrumentation receive
> a reference to a whole Task object rather than just its TaskAttemptID. It might also be useful
> to make the latter receive the task's final state, i.e. failed, killed, or successful.
> I'm just posting this here to get a sense of whether this is a good idea. If people think
> it's okay, I will make a patch against trunk that implements these changes.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
