accumulo-dev mailing list archives

From Christopher
Subject Hadoop 2 compatibility issues
Date Tue, 14 May 2013 20:40:23 GMT
So, I've run into a problem with ACCUMULO-1402 that requires a larger
discussion about how Accumulo 1.5.0 should support Hadoop2.

The problem is basically that profiles should not contain
dependencies, because profiles don't get activated transitively. A
slide deck by the Maven developers points this out as a bad practice...
yet it's a practice we rely on for our current implementation of
Hadoop2 support (slide 80).
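
For readers who have not looked at the POMs, the pattern under
discussion looks roughly like the sketch below. This is an
illustration only, not a copy of the actual Accumulo POMs: the profile
ids, the hadoop.profile activation property, and the Hadoop version
numbers are all assumptions.

```xml
<!-- Illustrative sketch only: profile ids, the activation property,
     and the versions are assumed, not taken from the Accumulo POMs. -->
<profiles>
  <profile>
    <id>hadoop-1.0</id>
    <activation>
      <!-- Active when -Dhadoop.profile is NOT set -->
      <property>
        <name>!hadoop.profile</name>
      </property>
    </activation>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.0.4</version>
      </dependency>
    </dependencies>
  </profile>
  <profile>
    <id>hadoop-2.0</id>
    <activation>
      <!-- Active when building with -Dhadoop.profile=2.0 -->
      <property>
        <name>hadoop.profile</name>
        <value>2.0</value>
      </property>
    </activation>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.0.4-alpha</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

Because the Hadoop dependency lives only inside a profile, a
downstream project that depends on the published artifact cannot count
on either block being applied when Maven resolves its dependency tree.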

What this means is that even if we go through the work of publishing
binary artifacts compiled against Hadoop2, neither our Hadoop1
binaries nor our Hadoop2 binaries will be able to transitively resolve
any dependencies defined in profiles. This has significant
implications for user code that depends on Accumulo Maven artifacts.
Every user will essentially have to explicitly add Hadoop dependencies
for every Accumulo artifact that depends on Hadoop, whether directly
or transitively (they'll have to peek into the profiles in our POMs
and copy/paste the profile into their project). This becomes more
complicated when we consider how
users will try to use things like Instamo.
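
As a rough illustration of what that means for a user's project, the
fragment below shows a downstream POM that has to declare Hadoop by
hand alongside the Accumulo artifact. The choice of accumulo-core and
the Hadoop artifact and version are assumptions made for the example;
a real project would need to match whatever the relevant profile in
our POMs declares.

```xml
<!-- Hypothetical downstream POM fragment; coordinates are examples. -->
<dependencies>
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>1.5.0</version>
  </dependency>
  <!-- Declared explicitly because the Hadoop dependency is defined
       only inside a profile upstream and so is not resolved
       transitively. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.0.4-alpha</version>
  </dependency>
</dependencies>
```

The same explicit declaration would have to be repeated for each
Accumulo artifact the project uses that depends on Hadoop.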

There are workarounds, but none of them are really pleasant.

1. The best way to support both major Hadoop APIs is to have separate
modules with separate dependencies directly in the POM. This is a fair
amount of work, and in my opinion, would be too disruptive for 1.5.0.
This solution also gets us separate binaries for separate supported
versions, which is useful.

2. A second option, and the preferred one I think for 1.5.0, is to put
a Hadoop2 patch in the branch's contrib directory
(branches/1.5/contrib) that patches the POM files to support building
against Hadoop2. (Acknowledgement to Keith for suggesting this.)

3. A third option is to fork Accumulo, and maintain two separate
builds (a more traditional technique). This adds a merging nightmare for
features/patches, but gets around some reflection hacks that we may
have been motivated to do in the past. I'm not a fan of this option,
particularly because I don't want to replicate the fork nightmare that
has been the history of early Hadoop itself.

4. The last option is to do nothing and to continue to build with the
separate profiles as we are, and make users discover and specify
transitive dependencies entirely on their own. I think this is the
worst option, as it essentially amounts to "ignore the problem".

At the very least, it does not seem reasonable to complete
ACCUMULO-1402 for 1.5.0, given the complexity of this issue.

Thoughts? Discussion? Vote on option?

Christopher L Tubbs II
