hadoop-general mailing list archives

From <Milind.Bhandar...@emc.com>
Subject Re: Which proposed distro of Hadoop, 0.20.206 or 0.22, will be better for HBase?
Date Wed, 05 Oct 2011 23:55:02 GMT

I understand your use case, I think, and so here are my thoughts, inline:

>1. Hbase support, i.e. working scale tested Append and Hflush in HDFS

Absolutely. HBase (and other components of the stack that do not follow
the MapReduce paradigm) are increasingly important. It is important to
realize that as Hadoop gains popularity, people will look at consolidating
their workloads, and are going to need at least the baseline features such
as append and flush to achieve that.

>2. Built in support for the cloud. (Whirr is interesting. Ambari more so,
>but both fall short.)

Not very sure. If "support for the cloud" means the ability to provision
atop a hypervisor, adding or removing instances, etc., I think there are
other approaches proven in the industry.

>3. Assumption that 10GBE is around the corner (really, this time), and
>storage locality is irrelevant

Yes, I have been shouting from the rooftops about this for quite some time.

>4. Storage efficiency is important. Alternatives to a 3 replica HDFS, such
>as erasure code, should be first class citizens in this distro.

Absolutely. Usable space is much more important than raw space.
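
To put rough numbers on the usable-versus-raw point, here is a minimal
sketch in Python. The function names, the 1200 TB cluster size, and the
(10 data, 4 parity) erasure-code layout are illustrative assumptions only;
none of them come from a Hadoop branch or from this thread.

```python
def usable_fraction_replication(replicas):
    """Fraction of raw capacity holding unique data under n-way replication."""
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return 1.0 / replicas


def usable_fraction_erasure(data_blocks, parity_blocks):
    """Fraction of raw capacity holding unique data under a (data, parity) code."""
    if data_blocks < 1 or parity_blocks < 0:
        raise ValueError("need data_blocks >= 1 and parity_blocks >= 0")
    return data_blocks / float(data_blocks + parity_blocks)


def usable_tb(raw_tb, fraction):
    """Usable capacity in TB for a given raw capacity and usable fraction."""
    return raw_tb * fraction


if __name__ == "__main__":
    raw_tb = 1200.0  # hypothetical raw cluster capacity
    schemes = [
        ("3-replica", usable_fraction_replication(3)),
        ("erasure code (10 data, 4 parity)", usable_fraction_erasure(10, 4)),  # assumed layout
    ]
    for name, fraction in schemes:
        print("%s: %.0f TB usable of %.0f TB raw" % (name, usable_tb(raw_tb, fraction), raw_tb))
```

The arithmetic only covers capacity. It says nothing about the read, repair,
or locality costs that an erasure-coded layout would bring.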

>5. H/A for the NN

Yes, it's a must. Some proprietary file systems that provide the
org.apache.hadoop.fs.FileSystem API have this feature already, and are
getting a lot of positive press recently.

>Such a distro would be an outstanding thing for the Hadoop community. I
>think 0.20.20x is the closest to this, but I am not sure.

Other than the merge of 0.20-append patches into 0.20.205, I am not aware
of any other changes that address any of your requirements 1-5.

>My hope is that this discussion will get some input from users of Hadoop.
>may be wrong, as this may be the wrong forum for this discussion. (The
>thing I really accomplished was to evoke a hurried and semi-infuriated
>Sunday afternoon private email response from some key players in the

Yeah, some key players in the Hadoop community are infuriated on Sunday
afternoons, based on my informal sentiment analysis of Twitter streams. ;-)

>My ultimate goal is to influence the product managers at Hadoop startups
>established companies to assign high priorities to these items.

Believe me, I know some product managers at Hadoop startups and
established companies who have a slide highlighting most of the above.

>In short, I don't own the whip, the buggy, or the horse ... but I am
>to crack the whip. :-)

Ha ha! Interesting analogy. But this is the open-source world. Here no one
"owns" (or at least, is supposed to own) the whip, buggy, or horse. So,
you are not alone :-)

>Milind - I do look forward to your input as to the importance of these
>features, and whether these are feasible in one of the source branches in
>the near future.

Indeed, these are feasible. Indeed these are important, and indeed they
will be in one of the source branches in future. I don't know about *near*
future, though.

- Milind

Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and
do not necessarily represent the views of any organization, past or
present, the author might be affiliated with.)
