avro-dev mailing list archives

From Tom White <...@cloudera.com>
Subject Re: [DISCUSS] Hadoop 1 support in Avro 1.8
Date Tue, 02 Sep 2014 14:55:26 GMT
Avro has a compile-time ('provided') dependency on Hadoop MR APIs in
avro-mapred and trevni-avro, both of which use the new MR APIs that
changed incompatibly between Hadoop 1 and 2. We introduced separate
profiles so we could produce separate binary artifacts for avro-mapred
and trevni-avro for Hadoop 1 and 2. Users set a classifier to select
the one they want to use, and in the absence of a classifier the
Hadoop 1 artifact is used.
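For illustration, selecting the Hadoop 2 build with a classifier looks roughly like this in a user's POM (the version shown is just a placeholder):

```xml
<!-- Sketch of selecting the Hadoop 2 build of avro-mapred via a
     classifier; the version number is a placeholder. -->
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.7</version>
  <classifier>hadoop2</classifier>
</dependency>
```

Leaving out the classifier silently pulls in the Hadoop 1 artifact, which is exactly how users miss it.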

Hadoop 2 has been the stable Hadoop release for a while now [1], and I
think most people are using Hadoop 2 based clusters these days so we
should at least change the default to Hadoop 2. If we only built
against Hadoop 2 then the avro-mapred and trevni-avro JARs would not
work on Hadoop 1 clusters, but I think that would be OK for Avro 1.8.
It would be good to remove classifiers since they are easily missed by
users, and don't work well with transitive dependencies [2].

The Avro tools JAR has only ever included Hadoop 1 classes. It's
arguably a bug that you can't use the tools JAR against a Hadoop 2
cluster - i.e. that we don't provide a Hadoop 2 artifact for it.
AVRO-1567 is another bug. Both would be fixed by using Hadoop 2
dependencies.


[1] http://www.us.apache.org/dist/hadoop/common/stable/
[2] https://github.com/Parquet/parquet-mr/pull/32#issuecomment-17283008
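For context, the per-Hadoop-version profiles mentioned above are roughly of this shape (the profile names and version numbers here are illustrative, not the exact Avro POM):

```xml
<!-- Illustrative sketch of version-selecting Maven profiles, not the
     exact Avro build. Activated with e.g. `mvn -Phadoop2 install`. -->
<profiles>
  <profile>
    <id>hadoop1</id>
    <!-- Hadoop 1 is the default when no profile is given. -->
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <properties>
      <hadoop.version>1.2.1</hadoop.version>
    </properties>
  </profile>
  <profile>
    <id>hadoop2</id>
    <properties>
      <hadoop.version>2.5.0</hadoop.version>
    </properties>
  </profile>
</profiles>
```

Keeping a copy of such a block in every Hadoop-dependent module is the POM duplication cost discussed below.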

On Fri, Aug 22, 2014 at 10:17 PM, Doug Cutting <cutting@apache.org> wrote:
> I'm not proposing dropping Hadoop 1.x APIs, since most (all?) of those are
> still present in 2.x.  Rather I'm proposing we replace Hadoop 1.x
> dependencies with Hadoop 2.x, no longer building releases compiled against
> 1.x and no longer testing against 1.x.  Currently we build jars compiled
> against both 1.x and 2.x, but most testing (e.g., Jenkins) is only done
> against 1.x.
> The specific problem is that the Hadoop 1.x runtime uses a Sun-specific
> class that causes tests to fail under IBM's JVM, while the Hadoop 2.x
> runtime does not.  I don't propose we make any code changes, rather just
> update poms to avoid this runtime problem.
> An alternative is to add profiles for different Hadoop versions to poms of
> all modules that depend on Hadoop, and to perform Jenkins testing against
> both profiles.  The former creates a lot of duplication in the poms, making
> them harder to maintain.  The latter adds maintenance costs to keep Jenkins
> running.  I'm not convinced the benefit is worth the effort.  Do we think
> folks using Hadoop 1.x will update to Avro 1.8?
> Doug
> On Fri, Aug 22, 2014 at 12:09 PM, Sean Busbey <busbey@cloudera.com> wrote:
>> AVRO-1567 is attempting to get Avro working well with the IBM JVM and some
>> of our dependency on Hadoop is causing them pain.
>> Specifically, there's some location where we rely on Hadoop 1 core for a
>> method that internally uses Sun JVM specific code. In Hadoop 2's client the
>> issue is fixed.
>> Doug mentioned the possibility that we simply drop Hadoop 1 support for 1.8
>> and rely on the presence of a fix in the Hadoop 2 version.
>> What do folks think?
>> Personally, I'm -0. As an alternative, I think we could change 1.8 to
>> default the tools artifact to Hadoop 2 without expressly dropping Hadoop 1
>> support.
>> Are there other compelling reasons to drop Hadoop 1 APIs?
>> --
>> Sean
