hbase-dev mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: [DISCUSS] HBase as Apache top-level project?
Date Fri, 19 Mar 2010 00:00:09 GMT
Isn't the hard spot where we've always been?  :)

Annoyance has really not gotten us anywhere.  And I don't think it matters to those in Hadoop
whether we are a TLP or SP, they will not (or should not) be offended if we break off.  Do
you think they would take us (or our patches) less seriously if we were a TLP?  

What has pushed things forward is continuing to make HBase better so that more people want
to use it.  A larger community and involvement from larger companies will help push Hadoop
changes aimed at HBase, especially when those companies are Hadoop contributors.


I think being a TLP is good because it gives us autonomy, more visibility, and some kind of
external validation from Apache that HBase has risen to that level (which I believe it has).
 I see the risks as not too serious.

If we do think we can get some HBase committers onto the Hadoop PMC, and we think that this
will make a material difference in outcomes for us, then my opinion may change.  Today I don't
really think the issue is whether we are on the Hadoop PMC or not... my understanding is that
big decisions are not decided by majority vote; if someone votes against a proposal, it is tabled.

JG

> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> Stack
> Sent: Thursday, March 18, 2010 4:09 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: [DISCUSS] HBase as Apache top-level project?
> 
> On Thu, Mar 18, 2010 at 1:07 PM, Jonathan Gray <jgray@facebook.com>
> wrote:
> > Will HDFS patches aimed at helping the HBase use case (which is not
> > strictly limited to HBase but rather our pattern that differs from MR)
> > be any less likely to get pushed through if we become a TLP rather than
> > sub-project?  In reality I don't think the distinction makes a
> > practical difference in that sense.
> >
> 
> If there are hbase-friendly committers up in hadoop they can marshal
> through hbase-friendly patches.  Then whether we're under hadoop or
> TLP matters less (though I do think Jay Booth has a good point when he
> suggests that the best way to make the case for the hbase hdfs access
> pattern is to "stay, and be more annoying...").
> 
> Currently we have only one hbase committer who is also a committer in
> hadoop, and the path to more than this is involved if we move out from
> under hadoop, which is Dhruba's point (it's just been confirmed that an
> hbase committer of a year or so vintage qualifies as a nominee to the
> hadoop pmc).
> 
> 
> > The things that will really help push the HDFS+HBase relationship are
> > things like committers of HDFS being users or contributors of HBase.
> > Recent interest from Facebook and Cloudera, who each have multiple
> > committers to Hadoop, has really pushed things along nicely in recent
> > weeks.
> >
> 
> This is true.  It's for sure made more difference than that one
> hbase-friendly committer has done during his tenure as a hadoop
> committer.
> 
> The downside though is that there is nothing to stop the above
> companies changing their minds and then a TLP hbase would be in a
> hard spot.
> 
> St.Ack
> 
> 
> > JG
> >
> >> -----Original Message-----
> >> From: Jay Booth [mailto:jaybooth@gmail.com]
> >> Sent: Thursday, March 18, 2010 12:45 PM
> >> To: hbase-dev@hadoop.apache.org
> >> Subject: Re: [DISCUSS] HBase as Apache top-level project?
> >>
> >> I'm neither an HBase user (just yet) nor a contributor, so my opinion
> >> isn't really worth a whole lot here.
> >>
> >> But I see HBase as being more similar to MapReduce than to ZK or Avro
> >> as far as becoming a top-level project.  Theoretically you can plug in
> >> alternate filesystems but in reality, both systems run on HDFS as of
> >> now and might run on other stuff in the future.  I agree that there's
> >> sometimes been a lack of urgency with regard to HDFS patches that
> >> affect HBase but not MapReduce -- but I think HBase leaving the
> >> project wouldn't really help, and could hurt both HBase and HDFS.
> >>
> >> In other words, HDFS needs a tenant like HBase to push the use cases
> >> that MapReduce doesn't cover -- if there are problems with
> >> communication between subprojects or with HDFS committer priorities,
> >> we should address those issues rather than split HBase off and amplify
> >> the distance.  With MapReduce and HBase both stretching the
> >> capabilities, HDFS can continue to evolve into being a (the?) robust,
> >> performant, mature distributed filesystem.  If it only optimizes for
> >> one use case, then it's just a niche i/o layer for MapReduce.
> >>
> >> So I guess my opinion is, "stay, and be more annoying" :)  But in a
> >> good way.
> >>
> >>
> >> On Thu, Mar 18, 2010 at 3:09 PM, Jonathan Gray <jgray@facebook.com>
> >> wrote:
> >>
> >> > I would like to see HBase support alternative filesystems in the
> >> > future.  There have been talks of other up and coming DFSs that were
> >> > built more for random access that might make sense for some use
> >> > cases.  I imagine a time down the road where there would be a choice
> >> > of DFS depending on a particular use case.
> >> >
> >> > Users coming from the Hadoop world who would be utilizing both and
> >> > likely be more tuned towards analytics would just add HBase atop
> >> > Hadoop.  Someone coming from a relational database who is interested
> >> > in fast read/write random access might be able to choose a DFS more
> >> > closely suited to that use case.  Hopefully HDFS gets better at this
> >> > so it could be the leader across the board, but I don't think we
> >> > should necessarily be married to it.  Besides possible differences
> >> > in append APIs, in general, it should not be difficult to plug a
> >> > different DFS in (and it's been done in the past with kfs).
> >> >
> >> > While it would be nice if active HBase committers were eventually
> >> > made into Hadoop PMC committers, to this point this has not happened
> >> > (I believe stack was already on Hadoop PMC when HBase became a
> >> > sub-project).  When we want to add a new committer we now have to
> >> > build a case to people who actually have no community insight rather
> >> > than allowing our community (which I believe is big enough to support
> >> > itself) to make their own decisions.
> >> >
> >> > Also, I've not seen Stack's presence on the Hadoop PMC in any way
> >> > contribute to the likelihood of an HDFS patch getting committed.
> >> >
> >> > That being said, we would not want to create any bad blood w/ the
> >> > Hadoop community.  Dhruba, do you think that is a risk?
> >> >
> >> > JG
> >> >
> >> > > -----Original Message-----
> >> > > From: Dhruba Borthakur [mailto:dhruba@gmail.com]
> >> > > Sent: Thursday, March 18, 2010 11:08 AM
> >> > > To: hbase-dev@hadoop.apache.org
> >> > > Subject: Re: [DISCUSS] HBase as Apache top-level project?
> >> > >
> >> > > Hi Stack,
> >> > >
> >> > > Can HBase (in theory) be used on filesystems/MR other than Hadoop?
> >> > >
> >> > > I see one primary disadvantage of moving away from the Hadoop
> >> > > project.  Please let me explain.  In the Hadoop world, if a
> >> > > committer is actively contributing code, she/he becomes part of the
> >> > > Hadoop PMC.  This means that active HBase committers would (over
> >> > > time) become Hadoop PMC members.  This might allow HBase-related
> >> > > fixes to get into HDFS much more easily.  If HBase moves away from
> >> > > Hadoop, then HBase developers will not have a part to play in
> >> > > guiding HDFS to make it more amenable to HBase usage.
> >> > >
> >> > > The case is different for ZK and Avro.  They are not related to
> >> > > Hadoop HDFS/MR at all.
> >> > >
> >> > > I am not voting against this proposal, just laying out my
> >> > > viewpoint.
> >> > >
> >> > > thanks,
> >> > > dhruba
> >> > >
> >> > >
> >> > > On Thu, Mar 18, 2010 at 10:43 AM, Stack <stack@duboce.net> wrote:
> >> > >
> >> > > > On Thu, Mar 18, 2010 at 10:15 AM, Andrew Purtell
> >> > > > <apurtell@apache.org> wrote:
> >> > > > >
> >> > > > > HBase is an integrated optional part of a Hadoop stack more
> >> > > > > than a standalone component, but other ASF TLPs build on top
> >> > > > > of other projects. I suppose HDFS and ZK are going to be TLPs
> >> > > > > at some point also, is that true? Leaving Hadoop as just the
> >> > > > > MR framework?
> >> > > >
> >> > > > If the board allows us to be a TLP, Zookeeper would probably be
> >> > > > made a TLP at the same time.
> >> > > >
> >> > > > There hasn't been a vote, but it seems that the thought is that
> >> > > > HDFS would stay within the hadoop fold; i.e. hdfs+mapreduce+common
> >> > > > would stay.
> >> > > >
> >> > > > >
> >> > > > > Anyway, what I like is HBase will stand on its own merits.
> >> > > > >
> >> > > > > What are the risks of being a TLP?
> >> > > > >
> >> > > >
> >> > > > I'm sure there are some but I'm blinded by the upside at the
> >> > > > moment.
> >> > > >
> >> > > > St.Ack
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Connect to me at http://www.facebook.com/dhruba
> >> >
> >
