hadoop-common-issues mailing list archives

From "Luke Lu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system
Date Wed, 28 Nov 2012 23:23:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506045#comment-13506045 ]

Luke Lu commented on HADOOP-9090:

I don't think you can control what happens to external systems, which _should_ handle arbitrary
connection errors etc. caused by unexpected client (or even router/firewall) shutdown. Another problem
with the new patch is that you're creating a latch and a semaphore per record (vs. per MetricsBuffer),
and there can be hundreds (up to a few thousand) of records per put. If the sink hangs, you'll
be creating a new thread/latch/semaphore for every record, and the user-perceived timeout would be
the configured timeout * number of records. Another issue is that the hang can happen in sink.flush
as well.

Why not do a simple notification in the existing code, like the following (untested sketch)?
boolean oobPut;

// illustration only, should be in the ctor after retry* variables are defined
final long OOB_PUT_TIMEOUT = (long) (retryDelay * Math.pow(retryBackoff, retryCount) * 1000);

synchronized void putMetricsImmediate(MetricsBuffer mb) throws InterruptedException {
  if (!oobPut) {
    oobPut = true;
    if (queue.enqueue(mb)) {
      wait(OOB_PUT_TIMEOUT); // woken by the sink thread, or gives up after the timeout
      oobPut = false;
    } // otherwise queue is full due to sink issues anyway.
  } else { // another oobPut in progress
    wait(OOB_PUT_TIMEOUT);
    oobPut = false; // just in case
  }
}

// after queue.consumeAll(this); in publishMetricsFromQueue (needs to be synchronized now)
if (oobPut) {
  notifyAll();
}

Now you get all the retry/timeout logic for free :)
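
For anyone who wants to try the wait/notify idea in isolation, here is a minimal, self-contained sketch. It is not the Hadoop code: OobPutSketch, putImmediate, and consumeOne are hypothetical names, a plain String stands in for MetricsBuffer, and a java.util.concurrent queue stands in for SinkQueue. It also swaps the single oobPut flag for a pair of counters, so that each concurrent caller waits for its own buffer, and it assumes a single sink thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a sink adapter; not Hadoop's MetricsSinkAdapter. */
class OobPutSketch {
  private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(8);
  private final long timeoutMillis;
  private long enqueued; // buffers accepted so far, guarded by "this"
  private long consumed; // buffers the sink thread has finished, guarded by "this"

  OobPutSketch(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  /** Returns true if the sink thread consumed the buffer before the timeout. */
  synchronized boolean putImmediate(String buffer) throws InterruptedException {
    if (!queue.offer(buffer)) {
      return false; // queue is full, so the sink is already in trouble
    }
    long ticket = ++enqueued;
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (consumed < ticket) {
      long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remainingMillis <= 0) {
        return false; // timed out; the buffer stays queued for the sink thread
      }
      wait(remainingMillis); // releases the monitor while waiting
    }
    return true;
  }

  /** One iteration of the sink thread: take a buffer, publish it, notify waiters. */
  boolean consumeOne(long pollMillis) throws InterruptedException {
    String buffer = queue.poll(pollMillis, TimeUnit.MILLISECONDS);
    if (buffer == null) {
      return false; // nothing arrived within pollMillis
    }
    // A real sink would publish the buffer here, outside the monitor,
    // so that a hung sink cannot keep waiters from timing out.
    synchronized (this) {
      consumed++;
      notifyAll();
    }
    return true;
  }
}
```

With a thread looping on consumeOne, putImmediate returns true once the buffer has been handled; with no consumer running it returns false after roughly timeoutMillis, which is the behavior a hung sink would produce.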

> Refactor MetricsSystemImpl to allow for an on-demand publish system
> -------------------------------------------------------------------
>                 Key: HADOOP-9090
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9090
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Mostafa Elhemali
>            Priority: Minor
>         Attachments: HADOOP-9090.2.patch, HADOOP-9090.justEnhanceDefaultImpl.2.patch,
> HADOOP-9090.justEnhanceDefaultImpl.3.patch, HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch
>
> We have a need to publish metrics out of some short-living processes, which is not really
> well-suited to the current metrics system implementation which periodically publishes metrics
> asynchronously (a behavior that works great for long-living processes). Of course I could
> write my own metrics system, but it seems like such a waste to rewrite all the awesome code
> currently in the MetricsSystemImpl and supporting classes.
> The way I'm proposing to solve this is to:
> 1. Refactor the MetricsSystemImpl class into an abstract base MetricsSystemImpl class
> (common configuration and other code) and a concrete PeriodicPublishMetricsSystemImpl class
> (timer thread).
> 2. Refactor the MetricsSinkAdapter class into an abstract base MetricsSinkAdapter class
> (common configuration and other code) and a concrete AsyncMetricsSinkAdapter class (asynchronous
> publishing using the SinkQueue).
> 3. Derive a new simple class OnDemandPublishMetricsSystemImpl from MetricsSystemImpl,
> that just exposes a synchronous publish() method to do all the work.
> 4. Derive a SyncMetricsSinkAdapter class from MetricsSinkAdapter to just synchronously
> push metrics to the underlying sink.
> Does that sound reasonable? I'll attach the patch with all this coded up and simple tests
> (could use some polish I guess, but wanted to get everyone's opinion first). Notice that this
> is somewhat of a breaking change since MetricsSystemImpl is public (although it's marked with
> InterfaceAudience.Private); if the breaking change is a problem I could just rename the refactored
> classes so that PeriodicPublishMetricsSystemImpl is still called MetricsSystemImpl (and MetricsSystemImpl
> -> BaseMetricsSystemImpl).

