hadoop-general mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Date Wed, 29 Aug 2012 23:44:23 GMT
On Wed, Aug 29, 2012 at 4:29 PM, Mattmann, Chris A (388J)
<chris.a.mattmann@jpl.nasa.gov> wrote:

> You're right, it's not project boundaries, it's poor community behavior,
> and general umbrella-project-ness.

No doubt there's bad behavior. But splitting into smaller projects
won't help anything. We'll still have the exact same behavior inside
the smaller projects.

>
> One aspect I've seen is exclusivity in allowing people to become
> PMC members on the project, and the separation of the PMC from committers.
> Other things I've seen are the use of technical justifications or complexity
> issues as an excuse for the exclusivity, as an excuse for drawing boundaries
> between project committers and PMC members, and then between specific
> products that the project and community as a whole releases, and finally
> other things I've seen include external interests influencing the way that
> business is done around here (need for releases in downstream companies,
> or projects driving upstream Apache decisions, which are supposed to be
> independent of any lone company, or set of companies -- it's individuals here
> that do the work).
>

It's individuals that do the work, but the individuals get paid by
companies, so individuals acting in their best interests are going to
tend to align with their company. They also often know details about
their customer bases that they can't share directly, which can be
frustrating, but it's a fact of life. I'm sure we'd see the same if we
were 20 independent consultants each with our own priorities, etc.

> The above is not a discrete thing that's happened once, or twice, or that
> happened three times, but was fixed later. It's never been fixed.
>

IMO it's massively improved since a couple of years ago. We're making
good progress on the 2.0 line, we no longer have divergent forks, and
I haven't seen an issue get vetoed in recent memory. Please provide
some recent examples where you think that splitting into smaller
granularity projects would help anything.

>>
>> Instead, the issues are usually _within_ a component. So, if we split
>> into 3 TLPs, then we'll just have 3 TLPs, each of which is just as
>> contentious as before.
>
> I doubt that. Creating TLPs either directly by going to the board, or
> via going to the Incubator should involve a set of members of the
> committee (PMC) that desire to work together; that ideally trust one another; that
> are inclusive to others who jump on the list and discuss things; and that
> collect these principles into the "Apache way", and build and deliver software at
> no cost to the public via this Foundation.

Just because we argue doesn't mean we don't desire to work together.
Smart passionate people will argue. I argue with my colleagues here at
Cloudera, I argue with Hortonworkers, and I argue with Facebookers -
it doesn't really matter much. I still enjoy getting beers with them
when I end up at conferences. No hard feelings, we're all adults,
right?

>
> Currently, the Apache Hadoop project isn't doing that. Something needs
> to be done to fix it. Just because an attempt to split the projects in the past
> didn't work doesn't mean that the Hadoop community should just accept
> "this is a popular project; it's going to be contentious; nothing to see here
> folks".

Again, please provide examples. From my vantage point, I see a lot of
progress being made on critical features: we've done federation, HA
namenode, massive performance improvements, YARN, practically
rewritten the NameNode, and more in the last couple of years. Hardly an
unproductive community.

>
> It's more than that.
>
>>
>> Let's just embrace contention as a fact of life on a high-profile
>> high-stakes project and get back to work.
>
> -1 to that. Apache projects shouldn't be contentious, whether you are a billion-dollar
> industry like Hadoop, or whether you are the US govt, or whether you are Joe Blow,
> Mom and Pop, building software to deliver to food truck vendors. It doesn't matter.
> Period.

I guess we'll have to agree to disagree.

>>
>> I wasted nearly a month undoing the mess of the last attempt, and I
>> don't see why this time it would go any better. -1 from my perspective
>> on splitting again at this point. Perhaps if we get to the point that
>> we're never making cross-project commits it makes sense, but we're
>> still not there.
>
> Again, technical issues cited for community problems. *These are not technical issues*.

...says the guy who isn't on the hook to stitch it all back together
into a deliverable for demanding customers, maintain green Jenkins
builds, etc. You can say these aren't technical issues, but if you're
not dealing with the project on a technical basis, I don't think
you're well qualified to judge. I certainly appreciate the work you've
done way back in the Nutch days and your continued evangelism, but
this whole thread just seems like it's stirring up trouble and not
going to accomplish anything except a bunch of wasted man-hours. (I've
already wasted about 45 minutes today on it, oops!)

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
