From: Mirko Kämpf <mirko.kaempf@gmail.com>
Date: Tue, 10 Dec 2013 12:07:19 +0100
Subject: Re: Execute hadoop job remotely and programmatically
To: user@hadoop.apache.org

Hi Yexi,

please have a look at the -libjars option of the hadoop command. It tells the system which additional libraries have to be shipped to the cluster before the job can start. This distribution happens again on every job submission, so it is not a good idea for really large libraries; those you should deploy on all nodes directly, and then configure the classpath of the JVMs that run the tasks accordingly.

Best wishes,
Mirko
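
One thing to watch out for: -libjars is only honored when the driver goes through ToolRunner, which runs GenericOptionsParser over the generic options before your code sees the arguments. A minimal sketch of such a driver, assuming the Hadoop 1.x API and with made-up class and path names, could look like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Placeholder driver: only the ToolRunner wiring matters here, the job
// itself is just an identity job over the given input/output paths.
public class MyJobDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already carries whatever -libjars / -files / -D put there,
        // because ToolRunner ran GenericOptionsParser before calling run().
        Job job = new Job(getConf(), "my-job");
        job.setJarByClass(MyJobDriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyJobDriver(), args));
    }
}

The submission would then look along the lines of (paths made up):

hadoop jar myjob.jar MyJobDriver -libjars /local/lib/foo.jar,/local/lib/bar.jar /user/me/input /user/me/output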
2013/12/9 Yexi Jiang <yexijiang@gmail.com>:

> Hi, All,
>
> I am working on a project that requires executing a hadoop job remotely,
> and the job requires some third-party libraries (jar files).
>
> Based on my understanding, I tried:
>
> 1. Copy these jar files to HDFS.
> 2. Add them to the distributed cache using
>    DistributedCache.addFileToClassPath so that hadoop can spread these
>    jar files to each of the slave nodes.
>
> However, my program still throws ClassNotFoundException, indicating that
> some of the classes cannot be found when the job is running.
>
> So I'm wondering:
> 1. What is the correct way to run a job remotely and programmatically
>    when the job requires some third-party jar files?
> 2. I found DistributedCache is deprecated (I'm using hadoop 1.2.0); what
>    is the alternative class?
>
> Regards,
> Yexi
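
For what it's worth, a rough sketch of the submission path you describe above, with placeholder cluster addresses, paths and class names, and assuming the old Hadoop 1.x API and property names, might look roughly like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteSubmitSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the remote cluster (placeholder addresses).
        conf.set("fs.default.name", "hdfs://namenode.example.com:8020");
        conf.set("mapred.job.tracker", "jobtracker.example.com:8021");

        // Step 1 from the mail above: ship the third-party jar to HDFS.
        FileSystem fs = FileSystem.get(conf);
        Path remoteJar = new Path("/libs/third-party.jar");
        fs.copyFromLocalFile(new Path("/local/lib/third-party.jar"), remoteJar);

        // Step 2: put the HDFS copy on the task classpath. The path passed
        // here must be the HDFS path, not the local one.
        DistributedCache.addFileToClassPath(remoteJar, conf);

        Job job = new Job(conf, "remote-job");
        // The job jar itself still has to be available on the client side.
        job.setJarByClass(RemoteSubmitSketch.class);
        FileInputFormat.addInputPath(job, new Path("/user/me/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/me/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Regarding the deprecation: deprecated usually just means it still works but will go away; as far as I know, the replacement in later releases is the corresponding methods on org.apache.hadoop.mapreduce.Job itself (addFileToClassPath / addCacheFile).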