From: Stan Rosenberg
To: Harsh J
Cc: mapreduce-user@hadoop.apache.org
Date: Thu, 17 Jan 2013 14:32:11 -0500
Subject: Re: task jvm bootstrapping via distributed cache

Hi,

I am back with my original problem. I am trying to bootstrap the child JVM
via -javaagent. I am doing what Harsh and Arun suggested, which also agrees
with the documentation. In theory this should work, but it doesn't. Any ideas
before I start digging into the code? Thanks.

Here is the command I am using to test:

  hadoop jar /usr/lib/hadoop/hadoop-examples-0.20.2-cdh3u3.jar wordcount -files "core-tools-0.0.1-SNAPSHOT-common-assembly.jar#foo.jar" -Dmapred.map.child.java.opts="-javaagent:./foo.jar=classes=.*" test1 output

I can see the following (relevant) properties set in job.xml:

  mapred.cache.files=/user/srosenberg/.staging/job_201211061805_50132/files/core-tools-0.0.1-SNAPSHOT-common-assembly.jar#foo.jar
  mapred.create.symlink=yes
  mapred.map.child.java.opts=-javaagent:./foo.jar=classes=.*

The map tasks fail with the following stdout/stderr output, resp.:

  Error occurred during initialization of VM
  agent library failed to init: instrument
  Error opening zip file or JAR manifest missing : ./foo.jar

It seems that the jar is not symlinked into the current working directory of
the child JVM; or perhaps the symlinking happens after the child JVM starts?

On Fri, Aug 3, 2012 at 1:31 PM, Harsh J wrote:
> Stan,
>
> What Arun says would surely work.
>
> For instance, read this command:
>
> hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0.jar pi
> -files
> "share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.0.0.jar#foo.jar"
> -Dmapred.child.java.opts="-javaagent:./foo.jar" 1 1
>
> What this would do is merely take your passed -files jar (client-common) and
> symlink it into the JVM's working directory (the task's working directory)
> _before_ the JVM is begun, as "foo.jar". So if I pass additionally, JVM opts
> that refer to this foo.jar under ./, then it would work as you expect it to,
> as the JVM is begun from that directory (its CWD).
>
> Do let us know if this solves it and also makes sense?
>
>
> On Fri, Aug 3, 2012 at 10:02 PM, Stan Rosenberg wrote:
>>
>> Arun,
>>
>> I don't believe the symlink is of help. The symlink is created in the
>> task's current working directory (cwd), but I don't know what cwd is
>> when I launch with 'hadoop jar ...'.
>>
>> Thanks,
>>
>> stan
>>
>> On Fri, Aug 3, 2012 at 2:39 AM, Arun C Murthy wrote:
>> > Stan,
>> >
>> > You can ask TT to create a symlink to your jar shipped via DistCache:
>> >
>> >
>> > http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache
>> >
>> > That should give you what you want.
>> >
>> > hth,
>> > Arun
>> >
>> > On Jul 30, 2012, at 3:23 PM, Stan Rosenberg wrote:
>> >
>> > Hi,
>> >
>> > I am seeking a way to leverage hadoop's distributed cache in order to
>> > ship jars that are required to bootstrap a task's jvm, i.e., before a
>> > map/reduce task is launched.
>> > As a concrete example, let's say that I need to launch with
>> > '-javaagent:/path/profiler.jar'. In theory, the task tracker is
>> > responsible for downloading cached files onto its local filesystem.
>> > However, the absolute path to a given cached file is not known a
>> > priori; however, we need the path in order to configure '-javaagent'.
>> >
>> > Is this currently possible with the distributed cache? If not, is the
>> > use case appealing enough to open a jira ticket?
>> >
>> > Thanks,
>> >
>> > stan
>> >
>> >
>> > --
>> > Arun C. Murthy
>> > Hortonworks Inc.
>> > http://hortonworks.com/
>> >
>
>
> --
> Harsh J
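
The JVM message "Error opening zip file or JAR manifest missing" names two different causes in one line: the path did not resolve to a readable jar, or the jar has no manifest. The sketch below is one possible way to tell them apart when run from the directory the child JVM starts in. It is not part of Hadoop; the function name diagnose_agent_jar is hypothetical, and the Premain-Class check is an extra sanity check beyond what the message itself reports.

```python
import os
import zipfile


def diagnose_agent_jar(path):
    """Return a short string describing why a -javaagent jar might not load.

    Hypothetical helper for illustration only.
    """
    # os.path.exists follows symlinks, so a dangling symlink counts as missing.
    if not os.path.exists(path):
        return "missing: %s does not resolve from %s" % (path, os.getcwd())
    if not zipfile.is_zipfile(path):
        return "not a zip/jar file"
    with zipfile.ZipFile(path) as jar:
        try:
            raw = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:
            return "no META-INF/MANIFEST.MF"

    # Parse the main manifest section: "Name: value" headers, where a line
    # starting with a single space continues the previous header's value.
    headers = {}
    last = None
    for line in raw.splitlines():
        if not line.strip():
            break  # a blank line ends the main section
        if line.startswith(" ") and last is not None:
            headers[last] += line[1:]
        elif ":" in line:
            name, value = line.split(":", 1)
            last = name.strip().lower()
            headers[last] = value.strip()

    if "premain-class" not in headers:
        return "manifest has no Premain-Class"
    return "ok: Premain-Class=" + headers["premain-class"]
```

Calling diagnose_agent_jar("./foo.jar") from a directory where the symlink was never created returns a string starting with "missing", which would point at the symlinking rather than at the jar's contents.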