hadoop-general mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: LimitedPrivate and HBase
Date Wed, 08 Jun 2011 16:36:35 GMT
On 06/08/2011 05:13 PM, Andrew Purtell wrote:
>> I can tell you feel I'm picking on HBase, especially in light of my
>> flat out rejection of the "we want to mmap() blocks" case.
> I for one understand the objection there.
> Although it does negatively impact the work of a recent promising new
> contributor. As a project, HBase suffers for it. Of course that is no
> concern of HDFS.
> On the other hand I do believe Todd has a point. MapReduce is perhaps the
> only constituency that HDFS really cares about. Any reasonable person would
> come to that conclusion after surveying submitted JIRAs and their
> resolution times (or not). Historically with HDFS the local itch, the
> concern of the big MapReduce shops, gets the scratch and others are of not
> much concern. Therefore there is unfortunate business that lingers today --
> Facebook, StumbleUpon, Trend Micro, and others have effectively forked HDFS
> (0.20) in house for use with HBase, and nobody I know is seriously
> considering using HDFS 0.22 or TRUNK due to a lack of evidence that anyone
> with a stake in it is running it in production at scale. Past discussion to
> mend the breach with an HBase-friendly release of HDFS 0.20 ended with what
> I would describe as an inflexible and legalistic air.

Well, today MR is the primary constituency, but to be a stack you do
have to make the other layers work: MR with Hive and Pig on top,
HBase, Mahout.

These extra layers can form part of the regression tests for the
underlying code: if a change breaks HBase or Hive, that's something to
catch early, so you can say "this change to hadoop-common broke it".

Yes, it's extra hassle dealing with changes that break things, but you
find the problems so end users don't have to. And Jenkins can be set up
to do much of the work: you just tweak the dependencies of the
downstream projects to use the SVN trunk or -SNAPSHOT version of your
code, run the builds in the right order to generate the artifacts, and
wait for the emails to come in.
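"Run the builds in the right order" is a topological sort over the project dependency graph, and "which downstream builds should a change upset" is a reverse-reachability query on the same graph. A minimal sketch in Python (3.9+, for `graphlib`), assuming a hypothetical dependency map; the project names and edges below are illustrative only, not the real Maven artifact coordinates or module layout:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each project maps to the projects it
# depends on (in a CI setup these would be -SNAPSHOT artifacts).
DEPENDS_ON = {
    "hadoop-common": set(),
    "hadoop-hdfs": {"hadoop-common"},
    "hadoop-mapreduce": {"hadoop-common", "hadoop-hdfs"},
    "hbase": {"hadoop-common", "hadoop-hdfs"},
    "pig": {"hadoop-mapreduce"},
    "hive": {"hadoop-mapreduce"},
}


def build_order(graph):
    """Return an order in which every project comes after all its dependencies."""
    return list(TopologicalSorter(graph).static_order())


def affected_by(changed, graph):
    """Return every project that transitively depends on `changed`."""
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for project, deps in graph.items():
            if current in deps and project not in affected:
                affected.add(project)
                frontier.append(project)
    return affected


if __name__ == "__main__":
    for project in build_order(DEPENDS_ON):
        print("build", project)
    # Downstream builds to rerun (and watch for failure emails) after an HDFS change.
    print(sorted(affected_by("hadoop-hdfs", DEPENDS_ON)))
```

In a real Jenkins installation the ordering would more likely come from upstream/downstream job triggers than from a script like this; the sketch only shows the dependency logic behind that ordering.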
