hadoop-general mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Date Wed, 29 Aug 2012 23:54:17 GMT
OK I lied and said I wouldn't reply :)

On Aug 29, 2012, at 4:44 PM, Todd Lipcon wrote:

> On Wed, Aug 29, 2012 at 4:29 PM, Mattmann, Chris A (388J)
> <chris.a.mattmann@jpl.nasa.gov> wrote:
>> You're right, it's not project boundaries, it's poor community behavior,
>> and general umbrella-project-ness.
> No doubt there's bad behavior. But splitting into smaller projects
> won't help anything. We'll still have the exact same behavior inside
> the smaller projects.
>> [..snip...]
>> The above is not a discrete thing that's happened once, or twice, or that
>> happened three times, but was fixed later. It's never been fixed.
> IMO it's massively improved since a couple years ago. We're making
> good progress on the 2.0 line, we no longer have divergent forks, and
> I haven't seen an issue get vetoed in recent memory. Please provide
> some recent examples where you think that splitting into smaller
> granularity projects would help anything.

Please provide examples that show umbrella projects work. I've been
at this Foundation a lot longer than you have. I've seen them not work
and have been involved in ones that don't work. See splits from Lucene,
the same threads (with different names, different products, different software
but the exact same issues). See your own splits from Hadoop cited elsethread.
See the friggin' Apache board minutes discussing why umbrella projects 
are bad. 

I don't know what else to tell you. I'm not going to go look up all the threads.
I'm not Google nor do I care to. All I can say is that I've seen it before and
so have others. In your own project.

>>> Instead, the issues are usually _within_ a component. So, if we split
>>> into 3 TLPs, then we'll just have 3 TLPs, each of which is just as
>>> contentious as before.
>> I doubt that. Creating TLPs either directly by going to the board, or
>> via going to the Incubator should involve a set of members of the
>> committee (PMC) that desire to work together; that ideally trust one another; that
>> are inclusive to others who jump on the list and discuss things; and that
>> collect these principles into the "Apache way", and build and deliver software at
>> no cost to the public via this Foundation.
> Just because we argue doesn't mean we don't desire to work together.
> Smart passionate people will argue. I argue with my colleagues here at
> Cloudera, I argue with Hortonworkers, and I argue with Facebookers -
> it doesn't really matter much. I still enjoy getting beers with them
> when I end up at conferences. No hard feelings, we're all adults,
> right?

You still equate contention with arguing. It's more than that, Todd. The project's
policies for inclusivity have nothing to do with arguing about technical issues.

>> Currently, the Apache Hadoop project isn't doing that. Something needs
>> to be done to fix it. Just because an attempt to split the projects in the past
>> didn't work doesn't mean that the Hadoop community should just accept
>> "this is a popular project; it's going to be contentious; nothing to see here
>> folks".
> Again, please provide examples. From my vantage point, I see a lot of
> progress being made on critical features: we've done federation, HA
> namenode, massive performance improvements, YARN, practically
> rewritten NameNode, and more in the last couple years. Hardly an
> unproductive community.

Technical issues, again. 

> [..snip..]
>>> I wasted nearly a month undoing the mess of the last attempt, and I
>>> don't see why this time it would go any better. -1 from my perspective
>>> on splitting again at this point. Perhaps if we get to the point that
>>> we're never making cross-project commits it makes sense, but we're not
>>> there still.
>> Again, technical issues cited for community problems. *These are not technical issues*.
> ...says the guy who isn't on the hook to stitch it all back together
> into a deliverable for demanding customers, maintain green Jenkins
> builds, etc.

Dude, you have to do that regardless, that has nothing to do with *Apache Hadoop*.
Take your Cloudera hat off and put your *Apache Software Foundation* hat on. Is your
#1 priority developing software here to stitch code back together, turn it into a deliverable
for your customers (I'm guessing Cloudera customers, right? B/c Apache doesn't have
specific customers?) and to maintain green Jenkins builds?

Also tell me how the 4 SVN commands I suggested will stop you from doing the above?
At Apache? 

At Cloudera, tell me also how it will stop you?

> You can say these aren't technical issues, but if you're
> not dealing with the project on a technical basis, I don't think
> you're well qualified to judge.

I think you can quote me several times in this same thread and else-thread saying
I'm not technically astute with Hadoop anymore :) Admitted. 

However, I *am* astute with the aspects of this Software Foundation.

> I certainly appreciate the work you've
> done way back in the Nutch days and your continued evangelism, but
> this whole thread just seems like it's stirring up trouble and not
> going to accomplish anything except a bunch of wasted man-hours. (I've
> already wasted about 45 minutes today on it, oops!)

You had fun during those 45 mins, don't lie :)

P.S. I appreciate you and am still one of your biggest fans. Just trying to 
help you see the bigger picture here and to wear your Apache hat.


Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
