Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-dev@lucene.apache.org
Message-ID: <30153863.1171562945829.JavaMail.jira@brutus>
Date: Thu, 15 Feb 2007 10:09:05 -0800 (PST)
From: "Doug Cutting (JIRA)" <jira@apache.org>
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-492) Global counters
In-Reply-To: <5641100.1156887324032.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HADOOP-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473451 ] 

Doug Cutting commented on HADOOP-492:
-------------------------------------

I talked to Owen about this last week.  My concerns are:

1. We should only instrument code once, for counters and for monitoring metrics.

2. Users should be able to easily add new counters & metrics to their code that are visible in the JobTracker web UI and/or a separate metrics monitoring system.

3. Counters should be accessible programmatically through JobClient.

One way to do this would be to implement counters through the metrics API, as I've advocated above.  Another approach would be to add a new counter-only API (a subset of metrics features) that routes values to the JobTracker, and can also be configured to talk to the metrics system.  User code could then decide whether to use the metrics API directly (for non-counter metrics) or to use the counter-only API and get the benefit of the JobTracker-based aggregation built into the MapReduce runtime.  I don't have a strong preference about which implementation strategy is pursued.
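
To make the second approach concrete, here is a rough sketch of what a counter-only API with configurable destinations might look like.  Every name in it (CounterApiSketch, CounterSink, TaskCounters) is made up for illustration and is not an existing Hadoop or metrics API; the two sinks in main() only print, standing in for the JobTracker and the metrics system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; none of these types are existing Hadoop APIs. */
public class CounterApiSketch {

    /** A destination for counter values, e.g. the JobTracker or a metrics system. */
    interface CounterSink {
        void report(Map<String, Long> snapshot);
    }

    /** Counter-only API: user code is instrumented once; sinks are added by configuration. */
    static class TaskCounters {
        private final Map<String, Long> values = new TreeMap<>();
        private final List<CounterSink> sinks = new ArrayList<>();

        void addSink(CounterSink sink) {
            sinks.add(sink);
        }

        /** Adds amount to the named counter, creating it on first use. */
        synchronized void increment(String name, long amount) {
            values.merge(name, amount, Long::sum);
        }

        /** Sends a copy of the current values to every configured sink. */
        synchronized void flush() {
            Map<String, Long> snapshot = new TreeMap<>(values);
            for (CounterSink sink : sinks) {
                sink.report(snapshot);
            }
        }
    }

    public static void main(String[] args) {
        TaskCounters counters = new TaskCounters();
        // Stand-ins for the two destinations discussed above.
        counters.addSink(s -> System.out.println("to jobtracker: " + s));
        counters.addSink(s -> System.out.println("to metrics: " + s));

        counters.increment("records", 3);
        counters.increment("records", 2);
        counters.increment("parse.errors", 1);
        counters.flush();
    }
}
```

The point of the sketch is only that user code calls increment() once and does not need to know which destinations are configured.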

> Global counters
> ---------------
>
>                 Key: HADOOP-492
>                 URL: https://issues.apache.org/jira/browse/HADOOP-492
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: arkady borkovsky
>         Assigned To: David Bowen
>
> It would be nice to have a map / reduce job keep aggregated counts for arbitrary events occurring in its tasks -- the number of records processed, the number of exceptions of a specific type, the number of sentences in passive voice, whatever the job finds useful.
> This can be implemented by tasks periodically sending <name, value> pairs to the jobtracker (in some implementations such messages are piggy-backed on the heartbeats), so that the job tracker stores all the latest values from each task and aggregates them on request.  It should also make the aggregated values available at the job end.  The value for a task would be flushed when the task fails.
> #491 and #490 may be related to this one.
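
The aggregation scheme in the quoted description can be sketched roughly as follows.  This is an illustration only, not Hadoop code: the class and method names are hypothetical, the task ids are made up, and it assumes that "flushed" means a failed task's values are discarded rather than counted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of job-tracker-side aggregation; not Hadoop's actual code. */
public class CounterAggregatorSketch {
    // taskId -> latest (name -> value) report received from that task
    private final Map<String, Map<String, Long>> latestByTask = new HashMap<>();

    /** Called when a task's report arrives (e.g. piggy-backed on a heartbeat). */
    public synchronized void update(String taskId, Map<String, Long> counters) {
        // Each report replaces the previous one, so only the latest values are kept.
        latestByTask.put(taskId, new HashMap<>(counters));
    }

    /** Assumption: a failed task's values are discarded so they do not count. */
    public synchronized void taskFailed(String taskId) {
        latestByTask.remove(taskId);
    }

    /** Sums the latest values across all tasks; computed on request. */
    public synchronized Map<String, Long> aggregate() {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> counters : latestByTask.values()) {
            counters.forEach((name, value) -> totals.merge(name, value, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        CounterAggregatorSketch tracker = new CounterAggregatorSketch();
        tracker.update("task_1", Collections.singletonMap("records", 100L));
        tracker.update("task_2", Collections.singletonMap("records", 40L));
        // A later report from task_1 replaces its earlier one.
        tracker.update("task_1", Collections.singletonMap("records", 150L));
        System.out.println(tracker.aggregate());

        tracker.taskFailed("task_2");
        System.out.println(tracker.aggregate());
    }
}
```

Because each task reports running totals rather than deltas, a lost or repeated report does not skew the aggregate; the tracker simply keeps whichever report arrived last.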
