Date: Fri, 22 Feb 2013 13:14:56 -0800
Subject: Re: Review request for HBASE-7692: Ordered byte[] serialization
From: Ted Yu
To: dev@hbase.apache.org

Thanks Nick for carrying this through.

My pledge to reviewers: if you disagree with putting orderly in its own
module, please express your idea now.

On Fri, Feb 22, 2013 at 11:37 AM, Nick Dimiduk wrote:

> I'm working through the code that will produce a patch placing orderly in
> its own module. A question to reviewers: would you prefer I create separate
> JIRA/tasks for each of the individual patches? Will that be easier to
> review than dumping my squashed patch onto this ticket and asking you to
> look at github? Having this broken out into multiple tickets, I would feel
> better about using review board to aggregate comments.
>
> Please advise.
> Nick
>
> On Fri, Feb 22, 2013 at 10:48 AM, Nick Dimiduk wrote:
>
> > On Fri, Feb 22, 2013 at 10:14 AM, Matt Corgan wrote:
> >
> >> >
> >> > Not quite true. It makes use of Bytes and ImmutableBytesWritable from
> >> > hbase-common.
> >>
> >> Oh, interesting. Could we inline the code from Bytes.java and somehow get
> >> rid of the ImmutableBytesWritable. Like calling packages can add
> >> ImmutableBytesWritable functionality on top if they want to?
> >
> >
> > I'll need to do a more thorough evaluation, but a cursory glance indicates
> > use of Bytes could be replaced by arraycopy.
> > ImmutableBytesWritable is used mostly as a convenient wrapper over byte[],
> > and may well be replaceable with Hadoop's BytesWritable.
> >
> >> Seems like something as low level as rearranging bytes should be
> >> dependency free.
> >
> > The implementation makes heavy use of Hadoop Writables, but the
> > dependencies on HBase instances are mostly convenience.
> >
> >> On Fri, Feb 22, 2013 at 10:04 AM, Nick Dimiduk wrote:
> >>
> >> > Inline.
> >> >
> >> > On Fri, Feb 22, 2013 at 10:00 AM, Matt Corgan wrote:
> >> >
> >> > > To nitpick a little it wouldn't quite be a sibling of hbase-client
> >> > > because hbase-client depends on hbase-common and hbase-protocol while
> >> > > this new one will not depend on anything. Would hbase-server be able
> >> > > to see it? Would it basically be a standalone module being maintained
> >> > > by HBase?
> >> >
> >> > Not quite true. It makes use of Bytes and ImmutableBytesWritable from
> >> > hbase-common.
> >> >
> >> > > Also, assuming the original Orderly library goes unmaintained and we
> >> > > want people to use it, this will be the primary place to get it.
> >> > > Having no dependencies on other hbase modules is important for people
> >> > > who want to use the Orderly library for something unrelated to hbase.
> >> > > For example, a web application that logs data in this format but not
> >> > > directly to hbase.
> >> >
> >> > Orderly has gone unmaintained. The only fork with any activity that I'm
> >> > aware of is my own. I'd much rather see it gain the publicity,
> >> > additional scrutiny, wider adoption than continue as a pet-project.
> >> >
> >> > > On Fri, Feb 22, 2013 at 9:32 AM, Elliott Clark wrote:
> >> > >
> >> > > > Yep the client will be fully separated as soon as rpc changes
> >> > > > are stabilized. Until then keeping up the move patch was just too
> >> > > > onerous.
> >> > > >
> >> > > > On Fri, Feb 22, 2013 at 6:31 AM, Jonathan Hsieh wrote:
> >> > > >
> >> > > > > Nick,
> >> > > > >
> >> > > > > I'm +1 for it having its own module, and being a sibling of
> >> > > > > hbase-client. I'm assuming the client stuff will happen before we
> >> > > > > release 0.96 since it has been started.
> >> > > > >
> >> > > > > Jon.
> >> > > > >
> >> > > > > On Fri, Feb 22, 2013 at 6:13 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >> > > > >
> >> > > > > > You're absolutely correct: this library introduces client-side
> >> > > > > > conventions and is not needed from within the HMaster or
> >> > > > > > RegionServer. Is the consensus that it should reside in its own
> >> > > > > > module or be a sibling to the o.a.h.hbase.client source tree? I'm
> >> > > > > > a little confused by the current state of the modules;
> >> > > > > > hbase-client looks empty while o.a.h.hbase.client sits under
> >> > > > > > hbase-server.
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > > Nick
> >> > > > > >
> >> > > > > > On Thu, Feb 21, 2013 at 11:56 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
> >> > > > > >
> >> > > > > > > So I buy the argument about this being included in hbase, but
> >> > > > > > > several of the questions still stand --
> >> > > > > > >
> >> > > > > > > Why is this part of hbase-common? Shouldn't this be just a
> >> > > > > > > dependency of the hbase-client module? Does the hbase-server
> >> > > > > > > side need to depend on this?
> >> > > > > > >
> >> > > > > > > Since this is a large import of a currently isolated library,
> >> > > > > > > why not make it a separate module instead of part of hbase-common?
> >> > > > > > > This would enforce a boundary that will prevent pollution from
> >> > > > > > > circular dependencies.
> >> > > > > > >
> >> > > > > > > Jon.
> >> > > > > > >
> >> > > > > > > On Thu, Feb 21, 2013 at 7:23 PM, Enis Söztutar <enis@apache.org> wrote:
> >> > > > > > >
> >> > > > > > > > I think this belongs in core HBase, as a replacement to Bytes,
> >> > > > > > > > which should be deprecated eventually. We have a Bytes utility
> >> > > > > > > > which is supposed to convert basic java types to byte[]'s, but
> >> > > > > > > > it does not work for signed numbers.
> >> > > > > > > >
> >> > > > > > > > We already know that all of the clients, Hive, Pig, Phoenix,
> >> > > > > > > > have to have at least java type -> byte[] conversion utilities,
> >> > > > > > > > and I think it is HBase's job to supply one so that different
> >> > > > > > > > clients can interoperate. Since internally we are also relying
> >> > > > > > > > on serializing java types, we need that library in the core.
> >> > > > > > > >
> >> > > > > > > > BTW, I also think that we need to have a SQL-type to java type
> >> > > > > > > > to byte[] layer, but that is another discussion.
> >> > > > > > > >
> >> > > > > > > > Enis
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Thu, Feb 21, 2013 at 3:04 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
> >> > > > > > > >
> >> > > > > > > > > Nick,
> >> > > > > > > > >
> >> > > > > > > > > While I believe having an order-preserving canonical
> >> > > > > > > > > serialization is a good idea, from doing a read of the mail
> >> > > > > > > > > and a skim of the jira it is not clear to me why this is
> >> > > > > > > > > inside hbase as part of hbase-common.
> >> > > > > > > > >
> >> > > > > > > > > Why isn't this part of a library on top of hbase (a
> >> > > > > > > > > dependency for Pig/Hive) instead of "inside" hbase?
> >> > > > > > > > > Can't this functionality be done just from the client level?
> >> > > > > > > > > What's the end goal here? Is the goal here to replace the
> >> > > > > > > > > Bytes.toBytes(*) methods to enforce the ordering?
> >> > > > > > > > > If HBase has two mutually incompatible encodings "built-in",
> >> > > > > > > > > how does a dev know to use one or the other later on?
> >> > > > > > > > > If this is essentially a mega import of a library (300k..
> >> > > > > > > > > yikes), why not make it a separate module instead of part of
> >> > > > > > > > > common?
> >> > > > > > > > >
> >> > > > > > > > > Jon.
> >> > > > > > > > >
> >> > > > > > > > > On Thu, Feb 21, 2013 at 10:35 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Hi everyone,
> >> > > > > > > > > >
> >> > > > > > > > > > I'm of the opinion that HBase should provide a mechanism
> >> > > > > > > > > > for serializing common java types such that the serialized
> >> > > > > > > > > > format sorts according to the natural ordering of the
> >> > > > > > > > > > type. I think many application efforts end up building a
> >> > > > > > > > > > custom, partial implementation of this kind of
> >> > > > > > > > > > functionality on their own. I think HBase should provide a
> >> > > > > > > > > > canonical implementation of such a serialization format so
> >> > > > > > > > > > that third-parties can reliably build on top of HBase. Not
> >> > > > > > > > > > just user applications, but other tools like Pig and Hive
> >> > > > > > > > > > are also enabled. Implementations for
> >> > > > > > > > > > HIVE-3634 <https://issues.apache.org/jira/browse/HIVE-3634>,
> >> > > > > > > > > > HIVE-2599 <https://issues.apache.org/jira/browse/HIVE-2599>, or
> >> > > > > > > > > > HIVE-2903 <https://issues.apache.org/jira/browse/HIVE-2903>
> >> > > > > > > > > > could be compatible with similar features in Pig.
> >> > > > > > > > > >
> >> > > > > > > > > > After implementing something similar on multiple occasions,
> >> > > > > > > > > > I stumbled across the Orderly library.
> >> > > > > > > > > > It also appears to have been adopted by other large
> >> > > > > > > > > > projects, including Lily. I've engaged the library's author
> >> > > > > > > > > > for some improvements only to find out he's now at Google
> >> > > > > > > > > > and will no longer be maintaining it. Thus, I propose we
> >> > > > > > > > > > take it into HBase.
> >> > > > > > > > > >
> >> > > > > > > > > > HBASE-7692 <https://issues.apache.org/jira/browse/HBASE-7692>
> >> > > > > > > > > > includes a patch that introduces Orderly into hbase-common
> >> > > > > > > > > > under the orderly namespace. I have an associated branch on
> >> > > > > > > > > > github
> >> > > > > > > > > > <https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization>
> >> > > > > > > > > > wherein I've broken the patch out into multiple commits to
> >> > > > > > > > > > ease review. Please take a few minutes to give it a look.
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks,
> >> > > > > > > > > > Nick
> >> > > > > > > > >
> >> > > > > > > > > --
> >> > > > > > > > > // Jonathan Hsieh (shay)
> >> > > > > > > > > // Software Engineer, Cloudera
> >> > > > > > > > > // jon@cloudera.com
> >> > > > > > >
> >> > > > > > > --
> >> > > > > > > // Jonathan Hsieh (shay)
> >> > > > > > > // Software Engineer, Cloudera
> >> > > > > > > // jon@cloudera.com
> >> > > > >
> >> > > > > --
> >> > > > > // Jonathan Hsieh (shay)
> >> > > > > // Software Engineer, Cloudera
> >> > > > > // jon@cloudera.com
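
The remark in the thread that a plain byte[] conversion "does not work for signed numbers" can be illustrated with a small, self-contained Java sketch. It assumes a plain encoder that writes big-endian two's-complement bytes (java.nio.ByteBuffer stands in for it here) and assumes keys are compared lexicographically as unsigned bytes. The sign-bit flip is one common order-preserving trick, not necessarily the format Orderly uses, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical demo class; not part of HBase or Orderly. */
final class OrderedIntSketch {

    /** Plain big-endian two's-complement encoding (ByteBuffer is big-endian by default). */
    static byte[] plainEncode(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    /** Assumed order-preserving variant: flip the sign bit, then encode. */
    static byte[] orderedEncode(int v) {
        return plainEncode(v ^ 0x80000000);
    }

    /** Inverse of orderedEncode: decode, then flip the sign bit back. */
    static int orderedDecode(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ 0x80000000;
    }

    /** Lexicographic comparison that treats every byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Plain encoding: -1 is FF FF FF FF and sorts after 1 (00 00 00 01).
        System.out.println(compareUnsigned(plainEncode(-1), plainEncode(1)) > 0);
        // Sign-flipped encoding: -1 becomes 7F FF FF FF and sorts before 1 (80 00 00 01).
        System.out.println(compareUnsigned(orderedEncode(-1), orderedEncode(1)) < 0);
        // The transformation is reversible.
        System.out.println(orderedDecode(orderedEncode(-42)) == -42);
    }
}
```

With the plain encoding, negative values sort after positive ones under unsigned byte comparison; flipping the sign bit restores the natural integer order while staying reversible. Other types (floating point, variable-length strings, nullable values) need their own treatment, which is the kind of work the thread proposes delegating to a shared library.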