hadoop-general mailing list archives

From Eric Sammer <esam...@cloudera.com>
Subject Re: [VOTE] Direction for Hadoop development
Date Tue, 14 Dec 2010 07:14:50 GMT
On Tue, Dec 14, 2010 at 12:43 AM, Owen O'Malley <omalley@apache.org> wrote:

>
> On Dec 13, 2010, at 8:49 PM, Eric Sammer wrote:
>
>> One of the technical issues is the fact that this precludes users from
>> using PB (or thrift or avro) in their jobs if the version required
>> conflicts with what Hadoop proper has on the classpath.
>>
>
> This is currently true of all of our libraries and is addressed by
> MAPREDUCE-1938. After that is committed, users who want to override to a
> newer version just need to configure their job to do so.


That's definitely nice. I hadn't seen that one.

> Using a new library, just because it is a serialization library other than
> Avro, is not an acceptable reason to veto a patch.
>

I'm not really qualified to judge that; I don't know the rules on vetoes
well. But again, I'm more interested in the larger issue you raised (the
subject of the thread). Part of the direction for Hadoop, to me, is getting to
a point where we spend our time working together. Again, I propose:

- Codify (by vote) whether design plans are required or an informal email
indicating intent is sufficient, and under what circumstances. Provide
examples to clarify those circumstances. This solves the long-term problem but
not HADOOP-6685.
- Focus the discussion on evaluating proposals to fix the conflict-resolution
process. I know some exist, but they're drastic (removal of PMC members, for
instance).
- After reaching consensus on the above, focus the conversation (in another
thread or on JIRA, whichever is most appropriate) on HADOOP-6685 so no one is
blocked.
- Put the community of users first in all areas of development and
interaction.

To the last point:

I understand there's contention from past issues. I genuinely believe
everyone has the users' interests at heart. I'm saying this as a user: this
kind of contention is not in anyone's interest. We need true resolution of
past issues, consensus on what the goals are and roughly how to get there
(including how to resolve further disagreement), and only then can we jump
back into the immediate issue where there is disagreement. I no longer care
how corny I sound about this (and it's about to get corny). I implore all
parties involved to take a long look at how we interact and to approach this
with renewed respect for each other, the project, and the users. Decide to
let the old cruft go and start anew. Do that by building consensus on how to
get out of the veto stalemate and by coming up with a long-term plan that
makes sense to everyone.

To the specific issue:

Owen, would you be amenable to finding a way to remove the PB dependency in
support of HADOOP-6685, and to handling bootstrapping with either one of the
existing dependencies or a simple hard-coded length/type/value serialization
and deserialization similar to Writables? I understand your points about PB
being solid, but Hadoop is already thick with dependencies (some of which do
handle this, even if not in the preferred or optimal format), and
MAPREDUCE-1938 is still a ways off.
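To make the "length, type, value" option above a little more concrete, here is a minimal Java sketch of what such a hand-rolled bootstrap encoding could look like, assuming a 1-byte type tag, a 4-byte length, and the raw value bytes (written here as type, then length, then value), all going through `DataOutput`/`DataInput` the way Writables do. The class name `TlvCodec`, the `TYPE_STRING` tag, and the exact field layout are hypothetical; they are not taken from the HADOOP-6685 patch or from any Hadoop API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical hand-rolled type/length/value (TLV) codec, written in the style of
// Hadoop's Writable (explicit write/read against DataOutput/DataInput).
// The class name and type tag are illustrative only, not part of any Hadoop API.
public class TlvCodec {
    // Hypothetical type tag; a real scheme would pin these down in a spec.
    public static final byte TYPE_STRING = 1;

    // One decoded field: its type tag and its raw value bytes.
    public static final class Field {
        public final byte type;
        public final byte[] value;

        Field(byte type, byte[] value) {
            this.type = type;
            this.value = value;
        }
    }

    // Writes a 1-byte type, a 4-byte big-endian length, then the value bytes.
    public static void write(DataOutput out, byte type, byte[] value) throws IOException {
        out.writeByte(type);
        out.writeInt(value.length);
        out.write(value);
    }

    // Reads one complete field. Because the length is explicit, the caller can
    // consume and ignore fields whose type tag it does not recognize.
    public static Field read(DataInput in) throws IOException {
        byte type = in.readByte();
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative field length: " + length);
        }
        byte[] value = new byte[length];
        in.readFully(value);
        return new Field(type, value);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        write(out, TYPE_STRING, "job-42".getBytes(StandardCharsets.UTF_8));
        // A tag this reader doesn't know about, e.g. written by a newer version.
        write(out, (byte) 99, new byte[] {7, 7});
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        while (in.available() > 0) {
            Field field = read(in);
            if (field.type == TYPE_STRING) {
                System.out.println("string: " + new String(field.value, StandardCharsets.UTF_8));
            } else {
                System.out.println("ignored unknown type " + field.type
                        + " (" + field.value.length + " bytes)");
            }
        }
    }
}
```

The explicit length is what buys some forward compatibility in a scheme like this: an older reader can step over a field it doesn't understand, which a plain Writable that reads a fixed sequence of untagged fields typically cannot do.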

Doug, is there any way to get past the objection to the SequenceFile update?
It is a widely used format and is currently in Hadoop core. While I agree
Hadoop should be a "kernel" as one artifact and libs as another, I think it
would cause less friction, and be cleaner, to come up with a plan for getting
to that state independently of the issues pending right now. It seems like
maintaining backward compatibility is critical to Owen et al. as well, and I'm
sure we can come up with modifications to the patch to make it forward
compatible too (if it isn't already; I'm unclear on, or don't remember, this
point).

This, to me, looks like an achievable goal that doesn't compromise the
functionality of HADOOP-6685 and leaves the door open to discussion of a
stronger kernel / lib separation.

Regards, respect, and no longer afraid of being corny on general@,
-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com
