hbase-dev mailing list archives

From "Jim Kellerman (POWERSET)" <Jim.Keller...@microsoft.com>
Subject RE: You guys rocked the house this week!
Date Mon, 15 Jun 2009 18:15:41 GMT
I'm going to try to answer all the questions around Cloudera, etc in
this email.

> -----Original Message-----
> From: Andrew Purtell [mailto:apurtell@apache.org]
> Sent: Sunday, June 14, 2009 11:18 AM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: You guys rocked the house this week!
>
> Hi Jim,
>
> > BTW: Cloudera's next release is going to be based on 0.20, and
> > they will either include HBase as alpha software, or put us
> > in their supported stack, depending on the reaction from our
> > community.
>
> What does that mean, "depending on the reaction from our community"?

If the community tries it out and starts saying:
- it's not as fast as we claim
- its failover does not work as advertised
- it's not as solid as advertised
- etc.

I.e., if we receive a lot of negative press, we end up in a 2nd class bin.
Otherwise, they will start devoting a resource to it and we will end up as
a top tier app.

> > If we do this, Cloudera has volunteered to run that
> > script on EC2 on a ~100 node cluster to burn it in. (They have
> > some arrangement with Amazon) and they have volunteered to run
> > the test on a "big" cluster for us.
>
> I think HBase, as well as Hadoop frankly, can also use a reasonably
> scaled performance, reliability, and fault tolerance automated test
> platform. (See "Re: scanner is returning everything in parent region
> plus one of the daughters?") Think of it as expanding Hudson to
> a cluster of several nodes hosted with community resources, perhaps
> on EC2, running some suite once per day, or perhaps triggered by a
> project once they reach a certain milestone, so each project could
> be allocated a budget in terms of hours/month and time limits of
> hours/day or similar. ~10 nodes seems reasonably affordable, with
> ~100 used on occasion, the difference being daily versus weekly, or
> weekly versus monthly.
>
> Stepping back from blue sky, I wonder if HBase anyway can pool
> resources to run such a reasonably scaled performance, reliability,
> and fault tolerance automated test at least twice a week. 10 extra
> large EC2 instances running 5 hours per day is about $300/month.

That was one of the issues discussed on Friday. Nigel will be working
with Tom White on getting this set up but it won't be there for a while.

> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of stack
> Sent: Sunday, June 14, 2009 2:06 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: You guys rocked the house this week!
>
> Sounds like an interesting meeting.  Thanks for representing.
>
> Being part of the Cloudera bundle would be a nice-to-have but sounds
> like they are still on the fence and meantime they want us to do some
> scripting?

All they want is a simple script that launches Performance Evaluation
and randomly kills the Master, region servers and datanodes. They will
handle the startup and config of the EC2 cluster.
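
A minimal sketch of the kill-scheduling part of such a script, for illustration only. Nothing here comes from the thread beyond the idea of randomly killing the Master, region servers and datanodes while a load test runs: the host names, the gap parameters, and the `plan_kills` and `run` helpers are all assumptions, and the actual kill command and the Performance Evaluation launch are left to the caller.

```python
import random

# Hypothetical cluster layout; these host names are placeholders.
CLUSTER = {
    "master": ["master1"],
    "regionserver": ["rs1", "rs2", "rs3"],
    "datanode": ["dn1", "dn2", "dn3"],
}


def plan_kills(cluster, count, min_gap_s=30, max_gap_s=300, seed=None):
    """Return a list of (seconds_from_start, role, host) kill events.

    Targets are drawn uniformly from every (role, host) pair, and the gap
    between consecutive kills is a random number of seconds in
    [min_gap_s, max_gap_s]. A fixed seed makes a failing run repeatable.
    """
    rng = random.Random(seed)
    targets = [(role, host) for role, hosts in cluster.items() for host in hosts]
    events = []
    t = 0
    for _ in range(count):
        t += rng.randint(min_gap_s, max_gap_s)
        role, host = rng.choice(targets)
        events.append((t, role, host))
    return events


def run(events, kill, sleep):
    """Walk the plan: wait until each event is due, then call kill(role, host).

    `kill` and `sleep` are supplied by the caller (for example a function
    that stops the daemon over ssh, and time.sleep), so the plan can be
    dry-run without touching a cluster.
    """
    now = 0
    for at, role, host in events:
        sleep(at - now)
        now = at
        kill(role, host)


if __name__ == "__main__":
    # Dry run: print the plan instead of killing anything.
    for at, role, host in plan_kills(CLUSTER, count=5, seed=1):
        print(f"t+{at}s kill {role} on {host}")
```

Separating the plan from its execution means the same seed reproduces the same failure sequence, which helps when a particular kill order exposes a bug.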

I viewed this as a positive step. We get our stuff run at scale, and they
are willing to devote some resources to it.

> -----Original Message-----
> From: Andrew Purtell [mailto:apurtell@apache.org]
> Sent: Sunday, June 14, 2009 6:28 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: You guys rocked the house this week!
>
> > Andrew Purtell volunteered long ago to aid with packaging
>
> Yes.
>
> > and has been doing ongoing work to make hbase TRUNK works on hadoop
> > 0.18.3.
>
> Yes. More work today, in fact.
>

> > I know there was some difficulty communicating at first, has this
> > been worked out since?
>
> I believe so.
>
> I'll produce RPMs and DEBs for 0.20 release for both generic (top level
> spec file in HBase distrib for Hadoop 0.20) and also Cloudera specific
> packaging (0.18.3 branch).
>
> Beyond this, I haven't heard anything specifically requested. I would
> expect some integration with their config wizard would be needed to
> become part of the official release. I have offered to support that, as
> well as make time quarterly for release engineering.

I don't know what their status is around releases, except that their
next release will be based on Hadoop 0.20. If we can get the packaging,
etc., done for 0.18.3, then when they go to 0.20 there should be almost
nothing extra to do, so I don't think doing the support for
0.18.3 is a waste of time.

I'm sorry if I confused anyone. I wanted to convey that Cloudera was
very high on HBase after last week, and I think we should be seeing
them devote some resources to HBase in the near future.

-Jim
