hadoop-common-dev mailing list archives

From "David Bowen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-492) Global counters
Date Thu, 15 Feb 2007 18:23:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473458 ]

David Bowen commented on HADOOP-492:
------------------------------------


This requirement is not an exact match with the Metrics API.  A MetricsRecord has a number
of capabilities that aren't relevant here:
   * gauges as well as counters
   * adding any number of tags to the data to support various ways of aggregating it
   * atomic update of multiple metrics
   * removing metrics

So I don't think it makes sense to expose any aspect of the Metrics API here.  We can simply
add one method to Reporter:

   void incrCounter(String name, long amount);

Behind the scenes, we can automatically send this data to the Metrics API with appropriate
tags, as well as aggregating it into the TaskStatus and JobStatus objects so that it is accessible
via JobClient.
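
To make the proposed method concrete, here is a minimal sketch of a task-side accumulator. The names CounterReporter and InMemoryCounters are hypothetical stand-ins chosen for illustration, not existing Hadoop classes; only the incrCounter signature comes from the proposal above. Forwarding to the Metrics API and to TaskStatus/JobStatus is left out.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the single method proposed for Reporter.
interface CounterReporter {
    void incrCounter(String name, long amount);
}

// Hypothetical per-task accumulator (an assumption, not Hadoop code).
// It only keeps running totals in memory; a real implementation would
// also pass them on to the Metrics API and the task status.
class InMemoryCounters implements CounterReporter {
    private final Map<String, Long> totals = new TreeMap<>();

    @Override
    public synchronized void incrCounter(String name, long amount) {
        // Create the counter on first use, otherwise add to the running total.
        totals.merge(name, amount, Long::sum);
    }

    // Returns a copy so that callers cannot modify the live totals.
    public synchronized Map<String, Long> snapshot() {
        return new TreeMap<>(totals);
    }
}
```

With this shape, user code would only ever call incrCounter with a counter name and an increment, and would never see a MetricsRecord.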

We would have some counters that are maintained by the framework.  Currently, these would
be:

   shuffle_input_bytes
   map_input_records
   map_input_bytes
   map_output_records
   map_output_bytes
   reduce_input_records
   reduce_output_records

Do we need some sort of counter naming convention to prevent future conflicts between framework-maintained
counters and user-defined counters?
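
One possible convention, offered only as an illustration of the question and not as a decision, is to reserve a name prefix for framework-maintained counters and reject user-defined names that use it. The class CounterNames and the prefix "framework." are assumptions made up for this sketch.

```java
// Hypothetical helper (an assumption, not Hadoop code) showing a
// prefix-based convention: names starting with the reserved prefix
// belong to the framework, everything else belongs to user code.
final class CounterNames {
    // Hypothetical reserved prefix; any agreed-upon prefix would do.
    static final String FRAMEWORK_PREFIX = "framework.";

    private CounterNames() {}

    // Builds the full name of a framework-maintained counter.
    static String framework(String name) {
        return FRAMEWORK_PREFIX + name;
    }

    static boolean isReserved(String name) {
        return name.startsWith(FRAMEWORK_PREFIX);
    }

    // Returns the name unchanged if user code is allowed to use it.
    static String checkUserName(String name) {
        if (isReserved(name)) {
            throw new IllegalArgumentException(
                "counter name uses the reserved prefix: " + name);
        }
        return name;
    }
}
```

Under this assumption, map_input_records would be published as framework.map_input_records, and a user counter could keep any name that does not start with the prefix.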





> Global counters
> ---------------
>
>                 Key: HADOOP-492
>                 URL: https://issues.apache.org/jira/browse/HADOOP-492
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: arkady borkovsky
>         Assigned To: David Bowen
>
> It would be nice to have a map/reduce job keep aggregated counts for arbitrary events
occurring in its tasks -- the number of records processed, the number of exceptions of a specific
type, the number of sentences in passive voice, whatever the job finds useful.
> This can be implemented by tasks periodically sending <name, value> pairs to the
jobtracker (in some implementations such messages are piggy-backed on the heartbeats), so
that the jobtracker stores the latest values from each task and aggregates them on
request.  It should also make the aggregated values available at the end of the job.  The value for
a task would be flushed when the task fails.
> #491 and #490 may be related to this one.
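
The scheme in the issue description above can be sketched as follows. JobCounterStore and its method names are hypothetical, and the sketch reads "flushed" as "discarded", which is an assumption. It keeps only the latest report per task, drops a task's values when the task fails, and sums across tasks on request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical jobtracker-side store (an assumption, not Hadoop code).
class JobCounterStore {
    // Latest <name, value> pairs reported by each task, keyed by task id.
    private final Map<String, Map<String, Long>> latestByTask = new HashMap<>();

    // Called when a task's periodic report arrives, for example with a
    // heartbeat. The new report replaces the previous one for that task.
    synchronized void report(String taskId, Map<String, Long> counters) {
        latestByTask.put(taskId, new HashMap<>(counters));
    }

    // Drops a failed task's values so they are not counted in the totals.
    synchronized void taskFailed(String taskId) {
        latestByTask.remove(taskId);
    }

    // Sums each counter over the latest report of every live task.
    synchronized Map<String, Long> aggregate() {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> counters : latestByTask.values()) {
            for (Map.Entry<String, Long> e : counters.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }
}
```

Because each report replaces the previous one, tasks would send cumulative values rather than deltas, so a lost or repeated report does not skew the totals.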

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

