From: Jonathan Gray
To: "hbase-dev@hadoop.apache.org"
Date: Thu, 18 Mar 2010 17:00:09 -0700
Subject: RE: [DISCUSS] HBase as Apache top-level project?

Isn't the hard spot where we've always been? :)

Annoyance has really not gotten us anywhere. And I don't think it matters to those in Hadoop whether we are a TLP or SP, they will not (or should not) be offended if we break off. Do you think they would take us (or our patches) less seriously if we were a TLP?

What has pushed things forward is continuing to make HBase better so that more people want to use it. A larger community and involvement from larger companies will help push Hadoop changes aimed at HBase, especially when those companies are Hadoop contributors.

I think being a TLP is good because it gives us autonomy, more visibility, and some kind of external validation from Apache that HBase has risen to that level (which I believe it has). I see the risks as not too serious.
If we do think we can get some HBase committers onto the Hadoop PMC, and we think that this will make a material difference in outcomes for us, then my opinion may change. Today I don't really think the issue is whether we are on the Hadoop PMC or not... my understanding is that big decisions are not voted on for a majority, if someone votes against it then it is tabled.

JG

> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: Thursday, March 18, 2010 4:09 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: [DISCUSS] HBase as Apache top-level project?
>
> On Thu, Mar 18, 2010 at 1:07 PM, Jonathan Gray wrote:
> > Will HDFS patches aimed at helping the HBase use case (which is not
> > strictly limited to HBase but rather our pattern that differs from
> > MR) be any less likely to get pushed through if we become a TLP
> > rather than sub-project? In reality I don't think the distinction
> > makes a practical difference in that sense.
> >
>
> If there are hbase-friendly committers up in hadoop they can marshall
> through hbase-friendly patches. Then whether we're under hadoop or
> TLP matters less (though I do think Jay Booth has a good point when he
> suggests that the best way to make the case for the hbase hdfs access
> pattern is to '"stay, and be more annoying...")
>
> Currently we have only one hbase committer who is also a committer in
> hadoop and the path to more than this is involved if we move out from
> under hadoop, Dhruba's point (Its just been confirmed that an hbase
> committer of a year or so vintage qualifies as a nominee to hadoop
> pmc).
>
>
> > The things that will really help push the HDFS+HBase relationship
> > are things like committers of HDFS being users or contributors of
> > HBase. Recent interest from Facebook and Cloudera, who each have
> > multiple committers to Hadoop, has really pushed things along nicely
> > in recent weeks.
> >
>
> This is true.
> Its for sure made more difference than that one hbase-friendly
> committer has done during his tenure as an hadoop committer.
>
> The downside though is that there is nothing to stop the above
> companies changing their minds and then a TLP hbase would be in an
> hard spot.
>
> St.Ack
>
>
> > JG
> >
> >> -----Original Message-----
> >> From: Jay Booth [mailto:jaybooth@gmail.com]
> >> Sent: Thursday, March 18, 2010 12:45 PM
> >> To: hbase-dev@hadoop.apache.org
> >> Subject: Re: [DISCUSS] HBase as Apache top-level project?
> >>
> >> I'm neither an HBase user (just yet) or a contributor so my opinion
> >> isn't really worth a whole lot here..
> >>
> >> But I see HBase as being more similar to MapReduce than to ZK or
> >> Avro as far as becoming a top-level project. Theoretically you can
> >> plug in alternate filesystems but in reality, both systems run on
> >> HDFS as of now and might run on other stuff in the future. I agree
> >> that there's sometimes been a lack of urgency with regard to HDFS
> >> patches that affect HBase but not Mapreduce -- but I think HBase
> >> leaving the project wouldn't really help, and could hurt both HBase
> >> and HDFS.
> >>
> >> In other words, HDFS needs a tenant like HBase to push the use cases
> >> that MapReduce doesn't cover -- if there are problems with
> >> communication btw subprojects or with HDFS committer priorities, we
> >> should address those issues rather than split HBase off and amplify
> >> the distance. With MapReduce and HBase both stretching the
> >> capabilities, HDFS can continue to evolve into being a (the?)
> >> robust, performant, mature distributed filesystem. If it only
> >> optimizes for one use case, then it's just a niche i/o layer for
> >> mapreduce.
> >>
> >> So I guess my opinion is, "stay, and be more annoying" :) But in a
> >> good way.
> >>
> >>
> >> On Thu, Mar 18, 2010 at 3:09 PM, Jonathan Gray wrote:
> >>
> >> > I would like to see HBase support alternative filesystems in the
> >> > future. There have been talks of other up and coming DFSs that
> >> > were built more for random access that might make sense for some
> >> > use cases. I imagine a time down the road where there would be a
> >> > choice of DFS depending on a particular use case.
> >> >
> >> > Users coming from the Hadoop world who would be utilizing both and
> >> > likely be more tuned towards analytics would just add HBase atop
> >> > Hadoop. Someone coming from a relational database who is
> >> > interested in fast read/write random access might be able to
> >> > choose a DFS more closely suited to that use case. Hopefully HDFS
> >> > gets better at this so it could be the leader across the board,
> >> > but I don't think we should necessarily be married to it. Besides
> >> > possible differences in append APIs, in general, it should not be
> >> > difficult to plug a different DFS in (and it's been done in the
> >> > past with kfs).
> >> >
> >> > While it would be nice if active HBase committers were eventually
> >> > made into Hadoop PMC committers, to this point this has not
> >> > happened (I believe stack was already on Hadoop PMC when HBase
> >> > become a sub-project). When we want to add a new committer we now
> >> > have to build a case to people who actually have no community
> >> > insight rather than allowing our community (which I believe is big
> >> > enough to support itself) to make their own decisions.
> >> >
> >> > Also, I've not seen Stack's presence on the Hadoop PMC in any way
> >> > contribute to the likelihood of an HDFS patch getting committed.
> >> >
> >> > That being said, we would not want to create any bad blood w/ the
> >> > Hadoop community. Dhruba, do you think that is a risk?
> >> >
> >> > JG
> >> >
> >> > > -----Original Message-----
> >> > > From: Dhruba Borthakur [mailto:dhruba@gmail.com]
> >> > > Sent: Thursday, March 18, 2010 11:08 AM
> >> > > To: hbase-dev@hadoop.apache.org
> >> > > Subject: Re: [DISCUSS] HBase as Apache top-level project?
> >> > >
> >> > > Hi Stack,
> >> > >
> >> > > Can HBase (in theory) be used on filesystems/MR other than
> >> > > Hadoop?
> >> > >
> >> > > I see one primary disadvantage of moving away from the Hadoop
> >> > > project. Please let me explain. In the Hadoop world, if a
> >> > > committer is actively contributing code, she/he becomes part of
> >> > > the Hadoop PMC. This means that Hbase active hbase committers
> >> > > would (over time) become Hadoop PMC members. This might allow
> >> > > Hbase-related fixes to get into HDFS much more easily. If HBase
> >> > > moves away from Hadoop, then Hbase developers will not have a
> >> > > part to play in guiding HDFS to make it more amenable to HBase
> >> > > usage.
> >> > >
> >> > > The case is different for ZK and avro. They are not related to
> >> > > Hadoop HDFS/MR at all.
> >> > >
> >> > > I am not voting against this proposal, just laying out my
> >> > > viewpoint.
> >> > >
> >> > > thanks,
> >> > > dhruba
> >> > >
> >> > >
> >> > > On Thu, Mar 18, 2010 at 10:43 AM, Stack wrote:
> >> > >
> >> > > > On Thu, Mar 18, 2010 at 10:15 AM, Andrew Purtell wrote:
> >> > > > >
> >> > > > > HBase is an integrated optional part of a Hadoop stack more
> >> > > > > than a standalone component, but other ASF TLPs build on top
> >> > > > > of other projects. I suppose HDFS and ZK are going to be
> >> > > > > TLPs at some point also, is that true? Leaving Hadoop as
> >> > > > > just the MR framework?
> >> > > >
> >> > > > If the board allows us be a TLP, Zookeeper would probably be
> >> > > > made a TLP at same time.
> >> > > >
> >> > > > There hasn't been a vote, but it seems that the thought is
> >> > > > that HDFS would stay within the hadoop fold; i.e.
> >> > > > hdfs+mapreduce+common would stay.
> >> > > >
> >> > > > >
> >> > > > > Anyway, what I like is HBase will stand on its own merits.
> >> > > > >
> >> > > > > What are the risks of being a TLP?
> >> > > > >
> >> > > >
> >> > > > I'm sure there are some but I'm blinded by the upside at the
> >> > > > moment.
> >> > > >
> >> > > > St.Ack
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Connect to me at http://www.facebook.com/dhruba
> >> > >