hbase-dev mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject RE: HDFS-1599 status? (HDFS tickets to improve HBase)
Date Sun, 05 Jun 2011 23:07:09 GMT

Re:  "*and* there are some people who would be willing to set it up on some small dev clusters
and run load tests, I'll move forward with it."

Count us in.

-----Original Message-----
From: Todd Lipcon [mailto:todd@cloudera.com] 
Sent: Sunday, June 05, 2011 6:41 PM
To: dev@hbase.apache.org; apurtell@apache.org
Subject: Re: HDFS-1599 status? (HDFS tickets to improve HBase)

On Sat, Jun 4, 2011 at 1:46 AM, Andrew Purtell <apurtell@apache.org> wrote:

> This is not discouraging. :-)
> HBasers patch CDH because trunk -- anything > 0.20 actually -- is not 
> trusted by consensus if you look at all of the production deployments. 
> Does ANYONE run trunk under anything approaching "production"? And 
> trunk/upstream has a history of ignoring any HBase specific concern. 
> So the use of and trading of patches will probably continue for a while, maybe forever.

Right - I wasn't suggesting that you run trunk in production as of yet. But there has been
very little activity in terms of HBase people running trunk in dev/test clusters in the past.
Stack has done some awesome work here in the last few weeks, so that should open it up for
some more people to jump on board.

I agree that HBase has been treated as a second-class citizen in recent years from HDFS's
perspective, but I think that has changed. All of the major HDFS contributors now have serious
stakes in HBase, and so long as there are patches with sufficient testing that apply against
trunk, I don't see a reason they wouldn't be included.

> Part of the problem is the expectation that any patch provided against 
> trunk may generate months of back and forth, as we have seen, which 
> presents difficulties to a potential contributor who does not work on 
> e.g. HDFS matters full time. Alternatively it may pick up a committer 
> as sponsor and then be vetoed by Yahoo because they're mad at Cloudera 
> over some unrelated issue and a patch appears to have a Cloudera sponsor, or vice versa.
> Now, that situation I describe _is_ discouraging. It's not enough to 
> say that we must contribute through trunk. Trunk needs to earn back our trust.

Yes, there have been some unfortunate things in the past. There have also been some half-finished
or untested patches proposed, and you can't blame HDFS folks for not taking a big patch that
doesn't have a lot of confidence behind it.

I've been thinking about this this afternoon, and have an idea. It may prove to be an awful
one, but maybe it's a good one; only time will tell :) I'll create a branch off of HDFS trunk
specifically for HBase performance work.
We can commit these "90% done" patches there, which will make it easier for others to test
and gain confidence. Branches also can make it easier to maintain patches over time with a
changing trunk.

How does this sound to the HBase community? If it seems like a good idea,
*and* there are some people who would be willing to set it up on some small dev clusters and
run load tests, I'll move forward with it.

> I believe I recently saw discussion that append should be removed or 
> disabled by default on 0.22 or trunk. Did you see anything like this? 
> If I am mistaken, fine. If not, this is going in the wrong direction, 
> for example.

Not sure what you're referring to - I don't remember any discussion like this.

Todd Lipcon
Software Engineer, Cloudera
