From: "Vjeran Marcinko"
Subject: RE: Adding 3rd-party libs in easy way ? (libjars and "fatjar" too cumbersome)
Date: Sun, 21 Apr 2013 15:28:32 +0200

Yes, that's exactly what I wanted.

One more thing, though. Most Hadoop MR job examples use the "hadoop jar" command to start job-submitting apps, but I don't like that shell approach, because job driver apps can then only be submitted from machines where Hadoop is installed. I would much rather submit from my own code (i.e. programmatically), so that I can run the submission from anywhere (for example, from a complete Java product that submits jobs in response to a user's web request).

That way I could also submit jobs directly from my IDE, which is always the best development environment, especially compared to the alternative: build scripts that package the app, deploy it remotely to a Hadoop machine, and run "hadoop jar" there just to see whether it works (during development).

However, most examples on the web show only the overly simple case of programmatically submitting WordCount, which doesn't rely on any 3rd-party lib. From what I have read, it seems the DistributedCache mechanism has to be used for that, so I'm asking whether anyone has a good, complete example of submitting jobs programmatically with 3rd-party jars included. Moreover, the confusion between the multiple MR APIs doesn't help either.
I found an example in the "Hadoop in Practice" book, which uses its own JobHelper util class to add jars to the job config, but it seems to place the jar paths into a "tmpjars" property or something like that...

In other words, I would like to do programmatically the same thing that Tool apps get from the -libjars option of the "hadoop jar" command.

Cheers,
Vjeran

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Sunday, April 21, 2013 2:09 PM
To:
Subject: Re: Adding 3rd-party libs in easy way ? (libjars and "fatjar" too cumbersome)

The MR project supports jars that have a lib/ subdirectory inside them, carrying all required dependencies. Would that not solve your need? You don't need to re-pack things; just pack them with the lib/ directory created inside, holding the necessary dependencies, during the build itself.

On Sun, Apr 21, 2013 at 12:43 PM, Vjeran Marcinko wrote:
> Hi,
>
> Can somebody tell me if there's some easy way to specify 3rd party
> libs for my MR driver application without having to:
>
> 1. Create fat jar by unpackaging all dep libs and packing them again
> (which really takes some time for couple of dozen dep libs with my
> gradle fatjar plugin task)
>
> 2. Specify libs individually inside "-libjars" option for Tool - but
> that's cumbersome since one has to specify each of them individually
> and that means building this string somehow
>
> Isn't there some way to specify just some directory, say "libs" on
> your local drive, and place lib jars there, and driver configuration
> to pick them up? Or just to pack all jars into one jar, but unlike fat
> jar which requires unpacking every lib and packing them again, just to
> nest these jars inside this new archive?
>
> Regards,
>
> Vjeran

--
Harsh J
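
For the "building this string somehow" problem and the wish to point at a local "libs" directory, a minimal sketch in plain Java (no Hadoop classes) might look like the following. The class name LibJars and the method fromDirectory are hypothetical names chosen for illustration. The sketch collects every .jar file directly inside a directory into one comma-separated list of file: URIs, which is the shape of value that -libjars takes. Setting that value on the job Configuration under the "tmpjars" key, as mentioned in the thread, is an assumption about what -libjars does internally and should be checked against the Hadoop version in use.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: turns a local directory of jars
// into a single comma-separated string of file: URIs.
final class LibJars {

    private LibJars() {
    }

    // Returns the file: URIs of all regular *.jar files directly inside dir,
    // sorted by path and joined with commas. Subdirectories are not scanned.
    static String fromDirectory(File dir) {
        File[] jars = dir.listFiles(
                (parent, name) -> name.endsWith(".jar") && new File(parent, name).isFile());
        if (jars == null) {
            throw new IllegalArgumentException("Not a readable directory: " + dir);
        }
        Arrays.sort(jars); // stable, predictable order
        List<String> uris = new ArrayList<>();
        for (File jar : jars) {
            uris.add(jar.toURI().toString());
        }
        return String.join(",", uris);
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "libs");
        if (!dir.isDirectory()) {
            System.err.println("No such directory: " + dir);
            return;
        }
        String libJars = fromDirectory(dir);
        System.out.println(libJars);
        // Assumption (verify for your Hadoop version): a driver could then call
        //   conf.set("tmpjars", libJars);
        // on the job's Configuration before submitting, mirroring -libjars.
    }
}
```

Running it with a directory argument prints the list, which can also be pasted after -libjars on the "hadoop jar" command line instead of being typed out jar by jar.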