hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mostafa Elhemali (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system
Date Wed, 28 Nov 2012 21:23:58 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mostafa Elhemali updated HADOOP-9090:
-------------------------------------

    Attachment: HADOOP-9090.justEnhanceDefaultImpl.3.patch

Thanks Luke. It's a fundamentally hard problem: if we assume that putMetrics() from arbitrary
sinks can hang, then we'll have to interrupt the thread that doing the work and hope that
unsticks it (that's how thread pools cancel work for example). Having said that, I got uneasy
about using stop(), and I agree with your comment that I was creating too many threads willy-nilly.
The new patch thus does two things:
1. Creates a single thread (if needed) for on-demand publishing per sink, and reuses that
for subsequent publishing. This should cut down considerably on how many threads are created.
2. If it sees a sink timeout, it interrupts the worker thread but doesn't stop() it (joins
it and hangs with it if needed).

I personally think that's the most reliable thing we can do. One alternative as you say is
to mark the worker thread as a daemon, so we don't have to join it and hang with it if it
hangs, but just abandon it. The problem there is that sinks may be doing things to external
systems that will leave them in inconsistent states if the process just shuts down while they're
still working with no warning. Let me know what you think.
                
> Refactor MetricsSystemImpl to allow for an on-demand publish system
> -------------------------------------------------------------------
>
>                 Key: HADOOP-9090
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9090
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Mostafa Elhemali
>            Priority: Minor
>         Attachments: HADOOP-9090.2.patch, HADOOP-9090.justEnhanceDefaultImpl.2.patch,
HADOOP-9090.justEnhanceDefaultImpl.3.patch, HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch
>
>
> We have a need to publish metrics out of some short-living processes, which is not really
well-suited to the current metrics system implementation which periodically publishes metrics
asynchronously (a behavior that works great for long-living processes). Of course I could
write my own metrics system, but it seems like such a waste to rewrite all the awesome code
currently in the MetricsSystemImpl and supporting classes.
> The way I'm proposing to solve this is to:
> 1. Refactor the MetricsSystemImpl class into an abstract base MetricsSystemImpl class
(common configuration and other code) and a concrete PeriodicPublishMetricsSystemImpl class
(timer thread).
> 2. Refactor the MetricsSinkAdapter class into an abstract base MetricsSinkAdapter class
(common configuration and other code) and a concrete AsyncMetricsSinkAdapter class (asynchronous
publishing using the SinkQueue).
> 3. Derive a new simple class OnDemandPublishMetricsSystemImpl from MetricsSystemImpl,
that just exposes a synchronous publish() method to do all the work.
> 4. Derive a SyncMetricsSinkAdapter class from MetricsSinkAdapter to just synchronously
push metrics to the underlying sink.
> Does that sound reasonable? I'll attach the patch with all this coded up and simple tests
(could use some polish I guess, but wanted to get everyone's opinion first). Notice that this
is somewhat of a breaking change since MetricsSystemImpl is public (although it's marked with
InterfaceAudience.Private); if the breaking change is a problem I could just rename the refactored
classes so that PeriodicPublishMetricsSystemImpl is still called MetricsSystemImpl (and MetricsSystemImpl
-> BaseMetricsSystemImpl).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message