flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Usage of Hadoop 2.2.0
Date Wed, 09 Sep 2015 11:58:44 GMT
I created a Jira for this: https://issues.apache.org/jira/browse/FLINK-2643

On Fri, 4 Sep 2015 at 13:01 Matthias J. Sax <mjsax@apache.org> wrote:

> +1 for dropping
>
> On 09/04/2015 11:04 AM, Maximilian Michels wrote:
> > +1 for dropping Hadoop 2.2.0 binary and source-compatibility. The
> > release is hardly used and complicates the important high-availability
> > changes in Flink.
> >
> > On Fri, Sep 4, 2015 at 9:33 AM, Stephan Ewen <sewen@apache.org> wrote:
> >> I am good with that as well. Keep in mind that we are not only dropping a
> >> binary distribution for Hadoop 2.2.0, but also source compatibility with
> >> 2.2.0.
> >>
> >> Let's also reconfigure Travis to test against
> >>
> >>  - Hadoop 1
> >>  - Hadoop 2.3
> >>  - Hadoop 2.4
> >>  - Hadoop 2.6
> >>  - Hadoop 2.7
> >>
> >>
> >> On Fri, Sep 4, 2015 at 6:19 AM, Chiwan Park <chiwanpark@apache.org> wrote:
> >>>
> >>> +1 for dropping Hadoop 2.2.0
> >>>
> >>> Regards,
> >>> Chiwan Park
> >>>
> >>>> On Sep 4, 2015, at 5:58 AM, Ufuk Celebi <uce@apache.org> wrote:
> >>>>
> >>>> +1 to what Robert said.
> >>>>
> >>>> On Thursday, September 3, 2015, Robert Metzger <rmetzger@apache.org> wrote:
> >>>> I think most cloud providers have moved beyond Hadoop 2.2.0:
> >>>> Google's Click-To-Deploy is on 2.4.1
> >>>> AWS EMR is on 2.6.0
> >>>>
> >>>> For the Hadoop distributions, the situation seems to be the following:
> >>>> MapR 4 uses Hadoop 2.4.0 (the current release is MapR 5)
> >>>> CDH 5.0 uses Hadoop 2.3.0 (the current release is CDH 5.4)
> >>>>
> >>>> HDP 2.0 (October 2013) uses Hadoop 2.2.0
> >>>> HDP 2.1 (April 2014) already uses Hadoop 2.4.0
> >>>>
> >>>> So both vendors and cloud providers are multiple releases away from
> >>>> Hadoop 2.2.0.
> >>>>
> >>>> Spark does not offer a binary distribution lower than 2.3.0.
> >>>>
> >>>> In addition, I don't think the HDFS client in 2.2.0 is really usable
> >>>> in production environments. Users have reported
> >>>> ArrayIndexOutOfBoundsException errors for some jobs, and I have also
> >>>> run into these exceptions occasionally.
> >>>>
> >>>> The easiest way to resolve this issue would be (a) dropping support
> >>>> for Hadoop 2.2.0.
> >>>> An alternative approach (b) would be to:
> >>>>  - ship a binary version for Hadoop 2.3.0
> >>>>  - keep the Flink source compatible with 2.2.0, so that users can
> >>>> compile a Hadoop 2.2.0 version themselves if needed.
> >>>>
> >>>> I would vote for approach (a).
> >>>>
> >>>>
> >>>> On Tue, Sep 1, 2015 at 5:01 PM, Till Rohrmann <trohrmann@apache.org> wrote:
> >>>> While working on high availability (HA) for Flink's YARN execution, I
> >>>> stumbled across some limitations of Hadoop 2.2.0. Hadoop 2.3.0
> >>>> introduced new functionality that is required for an efficient HA
> >>>> implementation. Therefore, I was wondering whether there is actually a
> >>>> need to support Hadoop 2.2.0. Is anyone still actively using Hadoop
> >>>> 2.2.0?
> >>>>
> >>>> Cheers,
> >>>> Till
> >>>>
> >>>
> >>
>
>
