Date: Thu, 20 Aug 2015 12:34:16 +0200
From: Clément MATHIEU
To: user@crunch.apache.org
Subject: Crunch API to run code at JVM startup / shutdown

Hi,

I am trying to set up something to automatically profile my Crunch jobs on a Hadoop cluster.

I have been a long-time user of hprof & "mapred.task.profile" because it is so easy to use on Hadoop. However, I am now moving away from it:

- it will be removed in Java 9
- it suffers from safepoint bias
- it cannot profile native code
- gathering metrics other than stack trace samples can be useful

I would like to replace hprof with Flight Recorder and/or perf. Unlike hprof, both need to be started and stopped programmatically since there is no glue for them in Hadoop.

I can see three options:

1. Hack the app

It can be done using DoFn.initialize/cleanup. Either all DoFns invoke the same idempotent code, or dedicated DoFns are inserted at specific points. Both seem horrific and disgusting :)

2. Java agent

Profiling is not tied to Crunch and any tool can be profiled. The main drawbacks are that the agent must be deployed on all the nodes and that it does not have easy access to metadata like user, job name, stage, etc. A good example of such an agent is statsd-jvm-profiler, see https://github.com/etsy/statsd-jvm-profiler. They even have a small bridge to push Cascading metadata to the agent, see https://github.com/etsy/statsd-jvm-profiler/blob/master/example/StatsDProfilerFlowListener.scala.

3. Dedicated Crunch API

Some code needs to be executed on JVM startup / shutdown. AFAIK it is not currently possible but could be added (however, I am not sure how to implement it on Spark).
Unlike a Java agent, it does not require deploying anything on the nodes, metadata can be pushed to the services (i.e. ctx), and it is more flexible.

I believe that allowing users to easily run code at JVM startup / shutdown would be a useful improvement.

Any opinions?

Clément MATHIEU
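
PS: to make option 1 concrete, here is a minimal sketch of what "the same idempotent code" invoked by every DoFn could look like. The class name JvmProfilingHook and its methods are hypothetical and not part of Crunch or Hadoop. The start/stop actions are left as plain Runnables because the actual Flight Recorder or perf commands depend on the setup.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper (not a Crunch API): runs a start action at most once
// per JVM and registers a shutdown hook that runs the stop action once.
final class JvmProfilingHook {
    private static final AtomicBoolean STARTED = new AtomicBoolean(false);
    private static final AtomicBoolean STOPPED = new AtomicBoolean(false);

    private JvmProfilingHook() {}

    // Safe to call from every DoFn instance: only the first call in a
    // given JVM runs `start`; later calls return false and do nothing.
    static boolean ensureStarted(Runnable start, Runnable stop) {
        if (!STARTED.compareAndSet(false, true)) {
            return false;
        }
        start.run();
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> stopOnce(stop), "profiling-shutdown"));
        return true;
    }

    // Runs `stop` at most once, whether it is triggered explicitly
    // or by the shutdown hook.
    static boolean stopOnce(Runnable stop) {
        if (!STOPPED.compareAndSet(false, true)) {
            return false;
        }
        stop.run();
        return true;
    }
}
```

Each DoFn would call JvmProfilingHook.ensureStarted(...) from DoFn.initialize. Note that a shutdown hook does not run if the JVM is killed abruptly (e.g. SIGKILL), so this only covers an orderly exit, and it still has to be wired into every DoFn by hand, which is exactly the part I would like a dedicated API to remove.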