hadoop-general mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Hadoop support for hbase
Date Sat, 08 May 2010 17:10:00 GMT
On Sat, May 8, 2010 at 9:59 AM, Thomas Koch <thomas@koch.ro> wrote:

> I'm a little confused and concerned now that I learn that hbase uses a
> patched hadoop. For Debian I use plain hadoop under hbase and it seems to
> work in testing environments.

> - Are these patches necessary to run HBase?

It will work unless you have failures, in which case it will lose edits.
HBase relies on the "hflush" API (called "sync" in 0.20) which does not work
properly in 0.20 without significant patching.
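
To make the failure mode concrete, here is a toy sketch of a log whose sync call either works or silently does nothing. It is not the HDFS or HBase API: the class `ToyEditLog` and all of its methods are hypothetical names made up for illustration, and the real failure cases are considerably more subtle than this model.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: not the HDFS or HBase API. All names here are hypothetical.
class ToyEditLog {
    // Edits the writer has appended but not yet pushed out of its own process.
    private final List<String> clientBuffer = new ArrayList<>();
    // Edits that have reached storage that survives a writer crash.
    private final List<String> durable = new ArrayList<>();
    // Models whether the underlying sync call really works.
    private final boolean syncWorks;

    ToyEditLog(boolean syncWorks) {
        this.syncWorks = syncWorks;
    }

    // Append an edit; for now it only sits in the client-side buffer.
    void append(String edit) {
        clientBuffer.add(edit);
    }

    // Ask for everything appended so far to become durable.
    // With a broken sync the call returns normally but moves nothing.
    void sync() {
        if (syncWorks) {
            durable.addAll(clientBuffer);
            clientBuffer.clear();
        }
    }

    // Simulate the writer dying: buffered edits vanish, and whoever
    // recovers the log sees only what was durable.
    List<String> crashAndRecover() {
        clientBuffer.clear();
        return new ArrayList<>(durable);
    }

    // Append n edits, syncing after each, then crash; return how many survive.
    static int survivingEdits(boolean syncWorks, int n) {
        ToyEditLog log = new ToyEditLog(syncWorks);
        for (int i = 0; i < n; i++) {
            log.append("edit-" + i);
            log.sync();
        }
        return log.crashAndRecover().size();
    }

    public static void main(String[] args) {
        System.out.println("working sync: " + survivingEdits(true, 3) + " edits recovered");
        System.out.println("broken sync:  " + survivingEdits(false, 3) + " edits recovered");
    }
}
```

In this model both variants behave identically as long as nothing crashes, which is why a testing environment without failures looks fine either way.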

Without this patch series, HBase will certainly run, but I could never
recommend running it in a production environment where data loss is a
concern.

> - Where can I find these patches?

Currently they're in various places on the JIRA - HDFS-200, HDFS-142,
HDFS-826, HDFS-561, etc. I have a github branch up which contains them all
applied, but I haven't tested it beyond unit tests - my testing is all
happening in our CDH3 tree, and afaik Dhruba's testing is on their FB
internal tree.

> - Why aren't these patches included in hadoop? Are they too unstable?

Yes, the policy is not to make such significant changes in patch releases,
so they would need to be voted into the 0.20 series. It's not that they're
entirely unstable, it's just that the code is very tricky and still under
development. The upcoming 0.21 release has a *different* implementation of
Append which also hasn't been tested significantly in real life failure
scenarios, but it's important that we keep the stable release stable.

> - If they're unstable, does this mean HBase is unstable?

Again I would not say it's terribly unstable - but it's nowhere near the
level of stability that Hadoop is at.

> - Should I worry at all about these patches for the Debian packages?

If you expect that people might actually want to run a production HBase,
they should have the patches. If you expect people to just be playing around
on single node "clusters" where failures aren't an issue, best to skip. Of
course, even for production usage, I wouldn't recommend running what we've
got now - wait a month or two and it should be one more notch up the
stability/testing scale.


Todd Lipcon
Software Engineer, Cloudera
