hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-492) Global counters
Date Thu, 15 Feb 2007 19:00:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473474 ]

Doug Cutting commented on HADOOP-492:

That sounds like a great plan.

> Do we need some sort of counter naming convention to prevent future conflicts between
framework-maintained counters and user-defined counters?

We could perhaps piggyback on Java's naming system by changing the Reporter method to be:

   void incrCounter(Enum key, long amount);

Then, internally, we can convert the key to a String with something like:

   String name = key.getDeclaringClass().getName()+"#"+key.toString();

This serves two purposes: keys are checked at compile time (since they have to be defined
with enums) and they're also package-qualified.
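
As a minimal sketch of how this could look, assuming a simple in-memory map behind the
Reporter (the ParseCounter and SystemCounter enums and the EnumCounters class are
hypothetical names used only for illustration, not existing Hadoop classes):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical user-defined counters; names are illustrative only.
enum ParseCounter { MALFORMED_RECORDS, PASSIVE_SENTENCES }

// Hypothetical framework-side counters that reuse a constant name on purpose.
enum SystemCounter { MALFORMED_RECORDS }

// Hypothetical stand-in for the framework side of Reporter.incrCounter.
class EnumCounters {
    private final Map<String, Long> totals = new HashMap<>();

    // Same conversion as suggested above: declaring class name + "#" + constant.
    static String counterName(Enum<?> key) {
        return key.getDeclaringClass().getName() + "#" + key.toString();
    }

    // Adds amount to the counter identified by the qualified enum name.
    void incrCounter(Enum<?> key, long amount) {
        totals.merge(counterName(key), amount, Long::sum);
    }

    // Returns the current total, or 0 if the counter was never incremented.
    long get(Enum<?> key) {
        return totals.getOrDefault(counterName(key), 0L);
    }

    public static void main(String[] args) {
        EnumCounters counters = new EnumCounters();
        counters.incrCounter(ParseCounter.MALFORMED_RECORDS, 3);
        counters.incrCounter(SystemCounter.MALFORMED_RECORDS, 7);
        // The two constants share a simple name but map to different counter names.
        System.out.println(counterName(ParseCounter.MALFORMED_RECORDS));
        System.out.println(counterName(SystemCounter.MALFORMED_RECORDS));
    }
}
```

Because the declaring class name is part of the key, a user enum and a framework enum can
both define MALFORMED_RECORDS without colliding, and a misspelled counter fails to compile.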

In the web UI, it would be great if all counters, both user- and system-defined, were displayed
in various forms: raw totals, total rates (counts/second), and per-task averages (average
count/task, average rate/task).
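
One way these four display forms might be computed is sketched below. The definitions are
assumptions, since the comment does not pin them down: the total rate divides the raw total
by the job's elapsed seconds, and the average rate per task is the mean of each task's own
count/second. CounterSummary and its fields are hypothetical names.

```java
// Hypothetical summary of one counter across all tasks of a job.
class CounterSummary {
    final long total;              // raw total over all tasks
    final double totalRate;        // total / job elapsed seconds
    final double avgCountPerTask;  // total / number of tasks
    final double avgRatePerTask;   // mean of per-task (count / task seconds)

    CounterSummary(long total, double totalRate,
                   double avgCountPerTask, double avgRatePerTask) {
        this.total = total;
        this.totalRate = totalRate;
        this.avgCountPerTask = avgCountPerTask;
        this.avgRatePerTask = avgRatePerTask;
    }

    // counts[i] and taskSeconds[i] describe task i; jobSeconds is wall-clock job time.
    static CounterSummary summarize(long[] counts, double[] taskSeconds, double jobSeconds) {
        if (counts.length == 0 || counts.length != taskSeconds.length) {
            throw new IllegalArgumentException("need one duration per task, at least one task");
        }
        long total = 0;
        double rateSum = 0.0;
        for (int i = 0; i < counts.length; i++) {
            total += counts[i];
            rateSum += counts[i] / taskSeconds[i];
        }
        int tasks = counts.length;
        return new CounterSummary(total, total / jobSeconds,
                                  (double) total / tasks, rateSum / tasks);
    }

    public static void main(String[] args) {
        CounterSummary s = summarize(new long[] {10, 30}, new double[] {2.0, 5.0}, 8.0);
        System.out.println(s.total + " " + s.totalRate + " "
                + s.avgCountPerTask + " " + s.avgRatePerTask);
    }
}
```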

> Global counters
> ---------------
>                 Key: HADOOP-492
>                 URL: https://issues.apache.org/jira/browse/HADOOP-492
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: arkady borkovsky
>         Assigned To: David Bowen
> It would be nice to have a map/reduce job keep aggregated counts for arbitrary events
occurring in its tasks -- the number of records processed, the number of exceptions of a specific
type, the number of sentences in passive voice, whatever the job finds useful.
> This can be implemented by tasks periodically sending <name, value> pairs to the
jobtracker (in some implementations such messages are piggy-backed on the heartbeats), so
that the job tracker stores all the latest values from each task and aggregates them on
request.  It should also make the aggregated values available at the job end.  The value for
a task would be flushed when the task fails.
> #491 and #490 may be related to this one.
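
The aggregation scheme described in the issue above could be sketched roughly as follows.
This is an illustration under stated assumptions, not the jobtracker's actual code: each
report is taken to carry the task's cumulative value so far, "flushed" is read as
"discarded", and CounterAggregator with its method names is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical job-tracker-side store of the latest counter values per task.
class CounterAggregator {
    // taskId -> (counter name -> latest cumulative value reported by that task)
    private final Map<String, Map<String, Long>> latest = new HashMap<>();

    // Called when a <name, value> pair arrives from a task, e.g. with a heartbeat.
    // A newer report for the same task and name replaces the older one.
    void report(String taskId, String name, long value) {
        latest.computeIfAbsent(taskId, id -> new HashMap<>()).put(name, value);
    }

    // Discards everything a failed task reported so it no longer contributes.
    void taskFailed(String taskId) {
        latest.remove(taskId);
    }

    // Aggregates on request: sums the latest value of one counter over all tasks.
    long aggregate(String name) {
        long sum = 0;
        for (Map<String, Long> perTask : latest.values()) {
            sum += perTask.getOrDefault(name, 0L);
        }
        return sum;
    }

    public static void main(String[] args) {
        CounterAggregator agg = new CounterAggregator();
        agg.report("task_1", "records", 10);
        agg.report("task_1", "records", 25); // replaces the earlier report
        agg.report("task_2", "records", 5);
        System.out.println(agg.aggregate("records"));
        agg.taskFailed("task_2");
        System.out.println(agg.aggregate("records"));
    }
}
```

Keeping only the latest value per task means a retried task starts from a clean slate, and
the same aggregate call can serve both in-progress queries and the final totals at job end.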

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
