hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: HBase-based MR-job deployment question
Date Mon, 26 Sep 2011 16:50:45 GMT
We use a mix of libraries dumped on the Hadoop classpath and
jars included within the job's jar.

BTW the blog does mention, when using option 3, to:

"Restart the TaskTrackers when you are done. Do not forget to update
the jar when the underlying software changes."

Getting the classpath config right can be a pita; it helps to print
it when you start a task (if you use HBase, the CP will be printed when
it starts its ZK client).
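
For what it's worth, here is a minimal sketch of printing the classpath
yourself, using only the JDK. The class name ClasspathReport and its helper
methods are made up for illustration, and the default driver class
oracle.jdbc.OracleDriver is an assumption on my part (the original mail only
says "Oracle JDBC driver"). Calling the same two helpers from a task's setup
code would show what the child JVM actually sees.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: lists classpath entries and checks whether a class can be found.
public class ClasspathReport {

    // Splits a classpath string into its entries, skipping empty ones.
    public static List<String> entries(String classpath) {
        List<String> result = new ArrayList<>();
        if (classpath == null) {
            return result;
        }
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    // Returns true if the named class is visible to this class's loader.
    // The class is looked up without being initialized.
    public static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathReport.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // java.class.path is the classpath the current JVM was started with.
        for (String entry : entries(System.getProperty("java.class.path"))) {
            System.err.println("classpath entry: " + entry);
        }
        // Assumed driver class name; pass another one as the first argument.
        String driver = args.length > 0 ? args[0] : "oracle.jdbc.OracleDriver";
        System.err.println(driver + " loadable: " + isLoadable(driver));
    }
}
```

Output goes to stderr so that, when run inside a task, it ends up in the
task's stderr log rather than mixed into job output.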


On Sun, Sep 25, 2011 at 10:43 PM, Steinmaurer Thomas
<Thomas.Steinmaurer@scch.at> wrote:
> Hello,
> regarding MR-job deployment, I read this Cloudera blog article:
> http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
> In my case, I have to deploy the Oracle JDBC driver. I've tried the various options discussed
> in the article and the only one which worked out of the box was including the JDBC jar file
> into my JAR file in the lib folder. Copying the JDBC jar into HADOOP_HOME/lib etc ... didn't
> work. Whenever the MR-job isn't able to locate the JDBC driver, I get the infamous exception:
> java.io.IOException
>        at org.apache.hadoop.mapreduce.lib.db.DBOutputFormat.getRecordWriter(DBOutputFormat.java:180)
>        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:559)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
>        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>        at org.apache.hadoop.mapred.Child.main(Child.java:264)
> While I can embed the JDBC library with each build of our MR-job, I would rather
> deploy the JDBC library into HADOOP_HOME/lib, because it is rather static and other MR-jobs
> might depend on it as well. The interesting thing is, when working with the Cloudera VMWare,
> a reboot after copying the library into HADOOP_HOME/lib helped. So, how are you deploying
> your MR-jobs into a real/live cluster without the need to restart something?
> Thanks a lot!
> Thomas
> _______________________________________________________
> DI Thomas Steinmaurer
> Industrial Researcher
> Software Competence Center Hagenberg GmbH
> Softwarepark 21, A-4232 Hagenberg, Austria
> UID: ATU 48056909 - FN: 184145b, Landesgericht Linz
> Tel. +43 7236 3343-896
> Fax +43 7236 3343-888
> mailto:thomas.steinmaurer@scch.at
> http://www.scch.at/
