Date: Wed, 27 Aug 2014 10:20:19 -0700
Subject: Re: Writing Custom - KeyComparator !!!
From: Ted Yu
To: sanjiv singh
Cc: "user@hbase.apache.org"

A brief search for KeyComparator using http://search-hadoop.com/ didn't turn up any previous discussion on using a custom KeyComparator.

I would suggest conforming to best practices of row key design and leaving a custom KeyComparator as a last resort.

Cheers

On Wed, Aug 27, 2014 at 9:24 AM, @Sanjiv Singh wrote:

> Hi Ted,
>
> Yes, definitely, I can make it a fixed-width country code.
>
> The example I chose is just one use case of a specific ordering need. I am wondering whether we can use any user object as the row key and have the ordering of rows within HBase defined explicitly by a custom KeyComparator.
>
> Regards
> Sanjiv Singh
> Mob : +091 9990-447-339
>
> On Wed, Aug 27, 2014 at 9:20 PM, Ted Yu wrote:
>
>> Sanjiv:
>> Is there a reason for you to choose the full country name?
>> The row key is stored with every KeyValue in the row, so choosing an abbreviation would reduce storage cost.
>>
>> Cheers
>>
>> On Wed, Aug 27, 2014 at 8:38 AM, @Sanjiv Singh wrote:
>>
>>> Hi Ted,
>>>
>>> Yes, it would work for a country code like IND for India or AUS for Australia.
>>>
>>> But in my use case, it's the full country name (not just a three-letter country code).
>>>
>>> Regards
>>> Sanjiv Singh
>>>
>>> On Wed, Aug 27, 2014 at 8:34 PM, Ted Yu wrote:
>>>
>>>> Sanjiv:
>>>> Is the country code of fixed width?
>>>>
>>>> If so, as long as the country is the prefix, rows will be sorted by country first.
>>>>
>>>> Cheers
>>>>
>>>> On Wed, Aug 27, 2014 at 8:00 AM, @Sanjiv Singh wrote:
>>>>
>>>>> Hi JM,
>>>>>
>>>>> Thanks for the link... I agree with you that it can be done when the key is an integer.
>>>>>
>>>>> The reason I am asking for a custom KeyComparator is that sometimes the key is not just an integer or a single value. It can be a composition of multiple values, for example a key made up of two values, one being COUNTRY and the other CITY.
>>>>>
>>>>> I want to order the rows first by COUNTRY, then by CITY.
>>>>>
>>>>> How can we do that?
>>>>>
>>>>> I hope the example I have chosen illustrates my use case.
>>>>>
>>>>> Regards
>>>>> Sanjiv Singh
>>>>>
>>>>> On Wed, Aug 27, 2014 at 5:42 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
>>>>>
>>>>> > Hi Sanjiv!!!! ;)
>>>>> >
>>>>> > If you want your keys to be ordered as integers, why not simply store them as integers rather than as strings? HBase orders rows lexicographically (byte by byte), and you cannot change that. Yes, you can implement a key comparator if you want, but I don't think it's going to change anything in this situation.
>>>>> >
>>>>> > You might want to take a look at this:
>>>>> > http://hbase.apache.org/book/rowkey.design.html
>>>>> >
>>>>> > Just put your values that way:
>>>>> >
>>>>> > import org.apache.hadoop.hbase.client.Put;
>>>>> > import org.apache.hadoop.hbase.util.Bytes;
>>>>> >
>>>>> > int myKey = 22000;
>>>>> > Put put = new Put(Bytes.toBytes(myKey)); // 4-byte big-endian row key
>>>>> >
>>>>> > And that will solve your ordering problem.
>>>>> >
>>>>> > JM
>>>>> >
>>>>> > 2014-08-27 6:09 GMT-04:00 @Sanjiv Singh:
>>>>> >
>>>>> >> Hi All,
>>>>> >>
>>>>> >> As we know, all rows are always sorted lexicographically by their row key. In lexicographical order, keys are compared at the binary level, byte by byte, from left to right.
>>>>> >>
>>>>> >> See the example below, where the row key is an integer value and the scan output shows the lexicographical order of rows in the table.
>>>>> >>
>>>>> >> hbase(main):001:0> scan 'table1'
>>>>> >> ROW      COLUMN+CELL
>>>>> >> 1        column=cf1:, timestamp=1297073325971 ...
>>>>> >> 11       column=cf1:, timestamp=1297073337383 ...
>>>>> >> 11000    column=cf1:, timestamp=1297073340493 ...
>>>>> >> 2        column=cf1:, timestamp=1297073329851 ...
>>>>> >> 22       column=cf1:, timestamp=1297073344482 ...
>>>>> >> 22000    column=cf1:, timestamp=1297073333504 ...
>>>>> >> 23       column=cf1:, timestamp=1297073349875 ...
>>>>> >>
>>>>> >> I want to see these rows ordered as integers, not the default way. I could pad the keys with '0' to get the proper sort order (I don't like that).
>>>>> >>
>>>>> >> I want these rows sorted as integers not just in the output of a scan or get, but also so that rows with consecutive integer row keys are stored in the same block.
>>>>> >>
>>>>> >> Now the questions are:
>>>>> >>
>>>>> >> - Can we define our own custom KeyComparator?
>>>>> >> - If yes, can we enforce it for the PUT method, so that rows are stored according to the new KeyComparator?
>>>>> >> - Can we plug this comparator into the SCAN method to change the order of the result rows?
>>>>> >>
>>>>> >> I hope I have explained the problem well. I am seeking your valuable response on it.
>>>>> >>
>>>>> >> Regards
>>>>> >> Sanjiv Singh
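The scan order above, and Jean-Marc's suggestion to write the key with Bytes.toBytes(int), can both be checked without a cluster. The sketch below is a stand-in under stated assumptions, not HBase code. It encodes ints as 4 big-endian bytes with java.nio.ByteBuffer, which is assumed to match what Bytes.toBytes(int) produces. It sorts with Arrays.compareUnsigned (Java 9+) as a stand-in for HBase's unsigned, byte-by-byte row key ordering. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RowKeyOrder {
    // 4-byte big-endian encoding; assumed to match HBase's Bytes.toBytes(int).
    static byte[] intKey(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // The integer written as text, which is what produced the scan order in the thread.
    static byte[] stringKey(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned, byte-by-byte, left-to-right comparison (Java 9+), standing in
    // for the way HBase orders row keys. A shorter key that is a prefix sorts first.
    static List<byte[]> sortLikeHBase(List<byte[]> keys) {
        List<byte[]> sorted = new ArrayList<>(keys);
        sorted.sort((a, b) -> Arrays.compareUnsigned(a, b));
        return sorted;
    }

    public static void main(String[] args) {
        int[] values = {23, 1, 22000, 11, 2, 11000, 22};
        List<byte[]> asStrings = new ArrayList<>();
        List<byte[]> asInts = new ArrayList<>();
        for (int v : values) {
            asStrings.add(stringKey(v));
            asInts.add(intKey(v));
        }

        StringBuilder strOrder = new StringBuilder("string keys:");
        for (byte[] key : sortLikeHBase(asStrings)) {
            strOrder.append(' ').append(new String(key, StandardCharsets.UTF_8));
        }
        StringBuilder intOrder = new StringBuilder("int keys:");
        for (byte[] key : sortLikeHBase(asInts)) {
            intOrder.append(' ').append(ByteBuffer.wrap(key).getInt());
        }
        System.out.println(strOrder);
        System.out.println(intOrder);
    }
}
```

When run, it prints the string keys in the same order as the scan (1 11 11000 2 22 22000 23) and the int keys in numeric order (1 2 11 22 23 11000 22000). One caveat applies to this plain two's-complement encoding. A negative int starts with a byte of 0x80 or higher, so negative keys would sort after all non-negative ones.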
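Ted's point about a fixed-width country code can be sketched the same way. The sketch assumes a three-letter uppercase country code (IND, AUS, as in the thread) followed directly by the city name. The hypothetical helper rowKey builds that composite key. Plain unsigned byte ordering then groups rows by country first and city second, with no custom comparator. Java 9+ is assumed for Arrays.compareUnsigned and List.of.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CompositeRowKey {
    // Hypothetical helper: fixed-width (3-letter) country code followed by the city name.
    static byte[] rowKey(String countryCode, String city) {
        if (!countryCode.matches("[A-Z]{3}")) {
            throw new IllegalArgumentException("expected a 3-letter uppercase code: " + countryCode);
        }
        return (countryCode + city).getBytes(StandardCharsets.UTF_8);
    }

    // Orders keys by unsigned byte-by-byte comparison, as HBase orders row keys,
    // then decodes them back to strings for display.
    static List<String> sortedKeys(List<byte[]> keys) {
        List<byte[]> sorted = new ArrayList<>(keys);
        sorted.sort((a, b) -> Arrays.compareUnsigned(a, b));
        List<String> out = new ArrayList<>();
        for (byte[] key : sorted) {
            out.add(new String(key, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> keys = List.of(
                rowKey("IND", "Pune"),
                rowKey("AUS", "Sydney"),
                rowKey("IND", "Delhi"),
                rowKey("AUS", "Melbourne"));
        System.out.println(sortedKeys(keys));
    }
}
```

With the sample data this prints [AUSMelbourne, AUSSydney, INDDelhi, INDPune]. The fixed width matters. With full, variable-length country names, the country/city boundary is no longer at a fixed offset. For example, "Guinea" + "Conakry" sorts after "Guinea-Bissau" + "Bissau" because '-' (0x2D) is lower than 'C' (0x43), even though "Guinea" alone sorts before "Guinea-Bissau".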