hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fengyun RAO <raofeng...@gmail.com>
Subject What's the best practice for managing Hadoop dependencie?
Date Mon, 10 Mar 2014 03:33:16 GMT
First of all, I want to claim that I used CDH5 beta, and managed project
using maven, and I googled and read a lot, e.g.

I believe the problem is quite common, when we write an MR job, we need
lots of dependencies,
which may not exist in or conflict with HADDOP_CLASSPATH.
There are several options, e.g.
1. add all libraries to my own JAR, and set HADOOP_USER_CLASSPATH_FIRST=true
   This is what I do, which makes the jar very big, and still it doesn't
   e.g. I already packaged guava-16.0.jar in my package, but it still use
guava-11.0.2.jar in the HADDOP_CLASSPATH.
   below is my build configuration.

2. distinguish which library is not present in HADDOP_CLASSPATH, and put it
into DistributedCache
    I think it's hard to distinguish, and still if it conflicts, which
dependency would be precedent?

*What's the best practice, especially using maven?*

View raw message