flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Usage of Hadoop 2.2.0
Date Thu, 03 Sep 2015 16:15:30 GMT
I think most cloud providers have moved beyond Hadoop 2.2.0:
Google's Click-To-Deploy is on 2.4.1.
AWS EMR is on 2.6.0.

The situation for the distributions seems to be the following:
MapR 4 uses Hadoop 2.4.0 (the current release is MapR 5).
CDH 5.0 uses Hadoop 2.3.0 (the current CDH release is 5.4).

HDP 2.0 (October 2013) uses Hadoop 2.2.0.
HDP 2.1 (April 2014) already uses Hadoop 2.4.0.

So both vendors and cloud providers are already several releases beyond
Hadoop 2.2.0.

Spark does not offer a binary distribution built for any Hadoop version
lower than 2.3.0.

In addition, I don't think the HDFS client in 2.2.0 is really usable in
production environments. Users have reported ArrayIndexOutOfBounds
exceptions for some jobs, and I have occasionally run into these
exceptions myself.

The easiest way to resolve this issue would be (a) dropping support for
Hadoop 2.2.0.
An alternative approach (b) would be:
 - ship a binary version for Hadoop 2.3.0
 - keep Flink's source compatible with 2.2.0, so that users can still
compile a Hadoop 2.2.0 version if needed.

I would vote for approach (a).


On Tue, Sep 1, 2015 at 5:01 PM, Till Rohrmann <trohrmann@apache.org> wrote:

> While working on high availability (HA) for Flink's YARN execution, I
> stumbled across some limitations of Hadoop 2.2.0. Between versions 2.2.0
> and 2.3.0, Hadoop introduced new functionality that is required for an
> efficient HA implementation. Therefore, I was wondering whether there is
> actually a need to support Hadoop 2.2.0. Is anyone still actively using
> Hadoop 2.2.0?
>
> Cheers,
> Till
>
