accumulo-dev mailing list archives

From Mike Drob <>
Subject Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Date Tue, 15 Oct 2013 02:24:00 GMT
Responses Inline.

- Mike

On Mon, Oct 14, 2013 at 12:55 PM, Sean Busbey <> wrote:

> Hey All,
> I'd like to restart the conversation from the end of July / start of August
> about Hadoop 2 support on the 1.4 branch.
> Specifically, I'd like to get some requirements ironed out so I can file
> one or more jiras. I'd also like to get a plan for application.
> =requirements
> Here's the requirements I have from the last thread:
> 1)  Maintain existing 1.4 compatibility
> The only thing I see listed in the pom is Apache release (1.4.4
> tag)[1]
> I don't see anything in the README[2] nor the user manual[3] on other
> versions being supported.

Yep.

> 2) Gain Hadoop 2 support
> At the moment, I'm presuming this means Apache release 2.0.4-alpha since
> that's what 1.5.0 builds against for Hadoop 2.

I haven't been following the Hadoop 2 release schedule that closely, but I
think the latest is a 2.1.0-beta? Pretty sure it was released after we
finished Accumulo 1.5, so there's no reason not to support it in my mind.
Depending on an "alpha" of something strikes me as either unstable or lazy,
although I fully understand that it may be neither.

> 3) Test for correctness on given versions, with >= 5 node cluster
> * Unit Tests
> * Functional Tests
> * 24hr continuous + verification
> * 24hr continuous + verification + agitation
> * 24hr random walk
> * 24hr random walk + agitation
> Keith mentioned running these against a CDH4 cluster, but I presume that
> since Apache Releases are our stated compatibilities it would actually be
> against whatever versions we list. Based on #1 and #2 above, I would expect
> that to be Apache Hadoop and Apache Hadoop 2.0.4-alpha.

Hadoop 2 introduces some neat new things like NN HA, which I think
might be worthwhile to test with. At that level it might be more of a
verification of the Hadoop code, but I'd like to be comfortable that our
DFS clients switch correctly. This is in addition to the standard release
suite that we run. [1]


> 4) Binary packaging
> 4a) Either source produces a single binary for all accepted versions
> or
> 4b) Instructions for building from source for each version and somehow
> flag what (if any) convenience binaries are made for the release.

Having run the binary packaging for 1.4.4, I can tell you that it is not in
great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so
I didn't bother spending a ton of time on them here, but I think RPM and
DEB are both broken. It would be nice to be able to specify a Hadoop 2
version for compilation, similar to what happens in the newer code base;
that could be backported, I suppose. 4b seems easier.

> There will be many back-ported patches. Not much active development happens
> on 1.4.x now, but I presume this should still all go onto a feature branch?
> Is the community preference that eventually all the changes become a single
> commit (or one-per-subtask if there are multiple jiras) on the active 1.4
> development branch, or that the original patches remain broken out?
> Not sure what you mean by this.

> For what it's worth, I'd recommend keeping them broken out. (And that's how
> the initial development against CDH4 has been done.)
> [1]
> [2]
> [3]
> --
> Sean
