Return-Path: X-Original-To: apmail-hadoop-common-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-common-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id D1001D177 for ; Wed, 28 Nov 2012 21:23:58 +0000 (UTC) Received: (qmail 84669 invoked by uid 500); 28 Nov 2012 21:23:58 -0000 Delivered-To: apmail-hadoop-common-issues-archive@hadoop.apache.org Received: (qmail 84590 invoked by uid 500); 28 Nov 2012 21:23:58 -0000 Mailing-List: contact common-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: common-issues@hadoop.apache.org Delivered-To: mailing list common-issues@hadoop.apache.org Received: (qmail 84390 invoked by uid 99); 28 Nov 2012 21:23:58 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 28 Nov 2012 21:23:58 +0000 Date: Wed, 28 Nov 2012 21:23:58 +0000 (UTC) From: "Mostafa Elhemali (JIRA)" To: common-issues@hadoop.apache.org Message-ID: <676149085.35410.1354137838271.JavaMail.jiratomcat@arcas> In-Reply-To: <496028161.21534.1353875698633.JavaMail.jiratomcat@arcas> Subject: [jira] [Updated] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Elhemali updated HADOOP-9090: ------------------------------------- Attachment: HADOOP-9090.justEnhanceDefaultImpl.3.patch Thanks Luke. It's a fundamentally hard problem: if we assume that putMetrics() from arbitrary sinks can hang, then we'll have to interrupt the thread that doing the work and hope that unsticks it (that's how thread pools cancel work for example). Having said that, I got uneasy about using stop(), and I agree with your comment that I was creating too many threads willy-nilly. The new patch thus does two things: 1. Creates a single thread (if needed) for on-demand publishing per sink, and reuses that for subsequent publishing. This should cut down considerably on how many threads are created. 2. If it sees a sink timeout, it interrupts the worker thread but doesn't stop() it (joins it and hangs with it if needed). I personally think that's the most reliable thing we can do. One alternative as you say is to mark the worker thread as a daemon, so we don't have to join it and hang with it if it hangs, but just abandon it. The problem there is that sinks may be doing things to external systems that will leave them in inconsistent states if the process just shuts down while they're still working with no warning. Let me know what you think. > Refactor MetricsSystemImpl to allow for an on-demand publish system > ------------------------------------------------------------------- > > Key: HADOOP-9090 > URL: https://issues.apache.org/jira/browse/HADOOP-9090 > Project: Hadoop Common > Issue Type: New Feature > Components: metrics > Reporter: Mostafa Elhemali > Priority: Minor > Attachments: HADOOP-9090.2.patch, HADOOP-9090.justEnhanceDefaultImpl.2.patch, HADOOP-9090.justEnhanceDefaultImpl.3.patch, HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch > > > We have a need to publish metrics out of some short-living processes, which is not really well-suited to the current metrics system implementation which periodically publishes metrics asynchronously (a behavior that works great for long-living processes). Of course I could write my own metrics system, but it seems like such a waste to rewrite all the awesome code currently in the MetricsSystemImpl and supporting classes. > The way I'm proposing to solve this is to: > 1. Refactor the MetricsSystemImpl class into an abstract base MetricsSystemImpl class (common configuration and other code) and a concrete PeriodicPublishMetricsSystemImpl class (timer thread). > 2. Refactor the MetricsSinkAdapter class into an abstract base MetricsSinkAdapter class (common configuration and other code) and a concrete AsyncMetricsSinkAdapter class (asynchronous publishing using the SinkQueue). > 3. Derive a new simple class OnDemandPublishMetricsSystemImpl from MetricsSystemImpl, that just exposes a synchronous publish() method to do all the work. > 4. Derive a SyncMetricsSinkAdapter class from MetricsSinkAdapter to just synchronously push metrics to the underlying sink. > Does that sound reasonable? I'll attach the patch with all this coded up and simple tests (could use some polish I guess, but wanted to get everyone's opinion first). Notice that this is somewhat of a breaking change since MetricsSystemImpl is public (although it's marked with InterfaceAudience.Private); if the breaking change is a problem I could just rename the refactored classes so that PeriodicPublishMetricsSystemImpl is still called MetricsSystemImpl (and MetricsSystemImpl -> BaseMetricsSystemImpl). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira