hadoop-mapreduce-user mailing list archives

From Mirko Kämpf <mirko.kae...@gmail.com>
Subject Re: Execute hadoop job remotely and programmatically
Date Tue, 10 Dec 2013 11:07:19 GMT
Hi Yexi,

please have a look at the -libjars option of the hadoop command. It tells the
system which additional libraries have to be shipped to the cluster before the
job can start. This distribution happens again on every job submission, so it
is not a good approach for really large libraries; those you should deploy on
all nodes once, and then configure the classpath of the JVMs running the
tasks accordingly.
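On the command line this might look roughly like the following (a sketch only: mrjob.jar, com.example.MyDriver, and all paths are placeholder names; it also assumes your driver parses generic options via ToolRunner/GenericOptionsParser, because otherwise -libjars is silently ignored):

```shell
# Ship two extra jars with the job; they are copied to the cluster
# on every submission and added to the task classpath.
# Note: -libjars must come after the main class and before the job arguments.
hadoop jar mrjob.jar com.example.MyDriver \
  -libjars /local/path/lib1.jar,/local/path/lib2.jar \
  /hdfs/input/path /hdfs/output/path
```

The requirement that the driver go through ToolRunner is the most common reason -libjars appears to do nothing, so it is worth checking that first.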

Best wishes.


2013/12/9 Yexi Jiang <yexijiang@gmail.com>

> Hi, All,
> I am working on a project that requires executing a Hadoop job remotely,
> and the job requires some third-party libraries (jar files).
> Based on my understanding, I tried:
> 1. Copy these jar files to hdfs.
> 2. Copy them into the distributed cache using
> DistributedCache.addFileToClassPath so that hadoop can spread these jar
> files to each of the slave nodes.
> However, my program still throws ClassNotFoundException, indicating that
> some of the classes cannot be found while the job is running.
> So I'm wondering:
> 1. What is the correct way to run a job remotely and programmatically
> when the job requires some third-party jar files?
> 2. I found that DistributedCache is deprecated (I'm using Hadoop 1.2.0);
> what is the alternative class?
> Regards,
> Yexi
