hadoop-general mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: [VOTE] Abandon hdfsproxy HDFS contrib
Date Mon, 04 Apr 2011 19:19:27 GMT
Could those of you who -1ed the removal of HDFS Proxy please look into the
test that has been failing our Hudson build for the last several months:
https://issues.apache.org/jira/browse/HDFS-1666

It is one thing to say that
we "should" maintain a piece of code, but it's another to actually maintain
it. In my mind, part of maintaining a project involves addressing consistent
test failures as high priority items.

-Todd

On Tue, Feb 22, 2011 at 9:27 PM, Nigel Daley <ndaley@mac.com> wrote:

> For closure, this vote fails due to a couple of binding -1 votes.
>
> Nige
>
> On Feb 18, 2011, at 4:46 AM, Eric Baldeschwieler wrote:
>
> > Hi Bernd,
> >
> > Apache Hadoop is about scale. Most clusters will always be small, but
> > Hadoop is going mainstream precisely because it scales to huge data and
> > cluster sizes.
> >
> > There are lots of systems that work well on 10 node clusters. People
> > select Hadoop because they are confident that as their business / problem
> > grows, Hadoop can grow with it.
> >
> > ---
> > E14 - via iPhone
> >
> > On Feb 17, 2011, at 7:25 AM, "Bernd Fondermann" <bernd.fondermann@googlemail.com> wrote:
> >
> >> On Thu, Feb 17, 2011 at 14:58, Ian Holsman <hadoop@holsman.net> wrote:
> >>> Hi Bernd.
> >>>
> >>> On Feb 17, 2011, at 7:43 AM, Bernd Fondermann wrote:
> >>>>
> >>>> We have the very unfortunate situation here at Hadoop where Apache
> >>>> Hadoop is not the primary and foremost place of Hadoop development.
> >>>> Instead, code is developed internally at Yahoo and then contributed in
> >>>> (smaller or larger) chunks to Hadoop.
> >>>
> >>> This has been the situation in the past,
> >>> but as you can see in the last month, this has changed.
> >>>
> >>> Yahoo! has publicly committed to move their development into the main
> >>> code base, and you can see they have started doing this with the 20.100
> >>> branch, and their recent commits to trunk.
> >>> Combine this with Nige taking on the 0.22 release branch (and
> >>> shepherding it into a stable release), and I think we are addressing your
> >>> concerns.
> >>>
> >>> They have also started bringing the discussions back on the list, see
> >>> the recent discussion about Jobtracker-nextgen Arun has re-started in
> >>> MAPREDUCE-279.
> >>>
> >>> I'm not saying it's perfect, but I think the major players understand
> >>> there is an issue, and they are *ALL* moving in the right direction.
> >>
> >> I enthusiastically would like to see your optimism be verified.
> >> Maybe I'm misreading the statements issued publicly, but I don't think
> >> that this is fully understood. I agree though that it's a move into
> >> the right direction.
> >>
> >>>> This is open source development upside down.
> >>>> It is not ok for people to diff ASF svn against their internal code
> >>>> and provide the diff as a patch without reviewing IP first for every
> >>>> line of code changed.
> >>>> For larger chunks I'd suggest to even go via the Incubator IP
> >>>> clearance process.
> >>>> Only then will we force committers to primarily work here in the open
> >>>> and return to what I'd consider a healthy project.
> >>>>
> >>>> To be honest: Hadoop is in the process of falling apart.
> >>>> Contrib Code gets moved out of Apache instead of being maintained here.
> >>>> Discussions are seldom consensus-driven.
> >>>> Release branches stagnate.
> >>>
> >>> True. Releases do take a long time. This is mainly due to it being
> >>> extremely hard to test and verify that a release is stable.
> >>> It's not enough to just run the thing on 4 machines, you need at least
> >>> 50 to test some of the major problems. This requires some serious $ for
> >>> someone to verify.
> >>
> >> It has been proposed on the list before, IIRC. Don't know how to get
> >> there, but the project seriously needs access to a cluster of this
> >> size.
> >>
> >>>> Downstream projects like HBase don't get proper support.
> >>>> Production setups are made from 3rd party distributions.
> >>>> Development is not happening here, but elsewhere behind corporate doors.
> >>>> Discussions about future developments are started on corporate blogs
> >>>> (http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/)
> >>>> instead of on the proper mailing list.
> >>>> Hurdles for committing are way too high.
> >>>> On the bright side, new committers and PMC members are added, this is
> >>>> an improvement.
> >>>>
> >>>> I'd suggest to move away from relying on large code dumps from
> >>>> corporations, and move back to the ASF-proven "individual committer
> >>>> commits on trunk"-model where more committers can get involved.
> >>>> If that means not to support high end cluster sizes for some months,
> >>>> well, so be it.
> >>>
> >>>> Average committers cannot run - e.g. test - on high
> >>>> end cluster sizes. If that would mean they cannot participate, then
> >>>> the open source project better concentrate on small and medium sized
> >>>> cluster instead.
> >>>
> >>>
> >>> Well.. that's one approach.. but there are several companies out there
> >>> who rely on apache's hadoop to power their large clusters, so I'd hate to
> >>> see hadoop become something that only runs well on
> >>> 10-nodes.. as I don't think that will help anyone either.
> >>
> >> But only looking at high-end scale doesn't help either.
> >>
> >> Let's face the fact that Hadoop is now moving from the early adopter phase
> >> into a much broader market. I predict that small to medium sized
> >> clusters will be the majority of Hadoop deployments in a few months'
> >> time. 4000, or even 500 machines is the high-end range. If the open
> >> source project Hadoop cannot support those users adequately (without
> >> becoming defunct), the committership might be better off to focus on
> >> the low-end and medium sized users.
> >>
> >> I'm not suggesting to turn away from the handful (?) of high-end
> >> users. They certainly have most valuable input. But also, *they*
> >> obviously have the resources in terms of larger clusters and
> >> developers to deal with their specific setups. Obviously, they don't
> >> need to rely on the open source project to make releases. In fact,
> >> they *do* work on their own Hadoop derivatives.
> >> All the other users, the hundreds of boring small cluster users, don't
> >> have that choice. They *depend* on the open source releases.
> >>
> >> Hadoop is an Apache project, to provide HDFS and MR free of charge to
> >> the general public. Not only to me - nor to only one or two big
> >> companies either.
> >> Focus on all the users.
> >>
> >> Bernd
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
