hadoop-general mailing list archives

From Jeff Hammerbacher <ham...@cloudera.com>
Subject Re: [VOTE] Direction for Hadoop development
Date Tue, 07 Dec 2010 10:23:39 GMT
> A critical part of Hadoop's usability comes from its framework combined
> with library code that allows users to get the desired functionality without
> writing it themselves.

> The goal is to make Hadoop useful out of the box.

To the best of my knowledge, Owen, your organization requires users to
petition a committee before writing MapReduce jobs. At Facebook, the vast
majority of jobs are submitted via Hive. Our customers at Cloudera primarily
consume MapReduce through Pig, Hive, and other high-level tools.

Users of Hadoop have moved beyond MapReduce. The community would be far
better served by a compact, reliable, and efficient kernel. That's the
project direction Doug has suggested for MapReduce, and it's one that Eric
and Tom have supported. I also support this direction for the project.

We're clearly having a hard time, as a community, agreeing on standards for
library code. We've also shipped updates to the framework without updating
the library code, seriously damaging the usability of the project. In this
discussion, we're prioritizing the rapidly shrinking proportion of users of
MapReduce library code over the far larger community of consumers of
the framework.

Arun recently asked on Quora about issues that users face with Hadoop
MapReduce: http://qr.ae/pPNK. There are currently five issues brought up
there, with 19 votes for those issues; none of them are addressed directly
by this extended debate.

I'd be ecstatic to see this discussion result in moving the file formats,
input and output formats, and other library code out to a separate Apache
project or GitHub where they can evolve rapidly based on user needs, so that
the MapReduce project can begin to address some of the outstanding issues
with the framework itself.

HDFS, HBase, Hive, Pig, Oozie, and other Hadoop-related projects continue to
make forward progress at a remarkable rate; I'd like to see MapReduce return
to health as well. Clearing away these major sources of conflict seems like
one promising path forward.

So, I'm not on the PMC, but I'm -1 on the proposed vote.
