hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Writing Custom - KeyComparator !!!
Date Wed, 27 Aug 2014 17:20:19 GMT
A brief search for KeyComparator on http://search-hadoop.com/ didn't
turn up any previous discussion of using a custom KeyComparator.
I would suggest conforming to the best practices of row key design and
leaving a custom KeyComparator as a last resort.

Cheers


On Wed, Aug 27, 2014 at 9:24 AM, @Sanjiv Singh <sanjiv.is.on@gmail.com>
wrote:

> Hi Ted,
>
> Yes, definitely, I can make it a fixed-width country code.
>
> The example I chose is just one use case of a specific ordering
> need. I am wondering whether we can use any user object as the row key,
> with the ordering of rows within HBase defined explicitly by a custom
> KeyComparator.
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>
>
> On Wed, Aug 27, 2014 at 9:20 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Sanjiv:
>> Is there a reason you chose the full country name?
>> The row key is stored with every KeyValue in the same row, so choosing
>> an abbreviation would reduce storage cost.
>>
>> Cheers
>>
>>
>> On Wed, Aug 27, 2014 at 8:38 AM, @Sanjiv Singh <sanjiv.is.on@gmail.com>
>> wrote:
>>
>>> Hi Ted,
>>>
>>> Yes, it would work for a country code like IND for India or AUS for
>>> Australia.
>>>
>>> But in my use case, it's the full country name (not just a three-letter
>>> country code).
>>>
>>> Regards
>>> Sanjiv Singh
>>> Mob :  +091 9990-447-339
>>>
>>>
>>> On Wed, Aug 27, 2014 at 8:34 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>
>>>> Sanjiv:
>>>> Is the country code of fixed width?
>>>>
>>>> If so, as long as the country is the prefix, rows would be sorted by it
>>>> first.
>>>>
>>>> Cheers
>>>>
>>>>
>>>> On Wed, Aug 27, 2014 at 8:00 AM, @Sanjiv Singh <sanjiv.is.on@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi JM,
>>>>>
>>>>> Thanks for the link... I agree with you that it can be done when the
>>>>> key is an integer.
>>>>>
>>>>> The reason I am asking about a custom KeyComparator is that sometimes
>>>>> the key is not just an integer or a single value; it can be a
>>>>> composition of multiple values, like <COUNTRY><CITY>, where the key is
>>>>> made up of two values: one is COUNTRY and the other is CITY.
>>>>>
>>>>> I want to order the rows first by COUNTRY, then by CITY.
>>>>>
>>>>> How can we do that?
>>>>>
>>>>>
>>>>> I hope I have picked an example that emphasizes my use case.
>>>>>
>>>>>
>>>>> Regards
>>>>> Sanjiv Singh
>>>>> Mob :  +091 9990-447-339
>>>>>
>>>>>
>>>>> On Wed, Aug 27, 2014 at 5:42 PM, Jean-Marc Spaggiari <
>>>>> jean-marc@spaggiari.org> wrote:
>>>>>
>>>>> > Hi Sanjiv!!!! ;)
>>>>> >
>>>>> > If you want your keys to be ordered as Integers, why do you not
>>>>> > simply store them as Integers and not as Strings? HBase orders the
>>>>> > rows lexicographically, and you cannot change that. Yes, you can
>>>>> > implement a key comparator if you want, but I don't think it's going
>>>>> > to change anything in this situation.
>>>>> >
>>>>> > You might want to take a look at this:
>>>>> > http://hbase.apache.org/book/rowkey.design.html
>>>>> >
>>>>> > Just store your values this way:
>>>>> >
>>>>> >       int myKey = 22000;
>>>>> >       Put put = new Put(Bytes.toBytes(myKey));
>>>>> >
>>>>> > And that will solve your ordering problem.
>>>>> >
>>>>> > JM
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > 2014-08-27 6:09 GMT-04:00 @Sanjiv Singh <sanjiv.is.on@gmail.com>:
>>>>> >
>>>>> >> Hi All,
>>>>> >>
>>>>> >> As we know, all rows are always sorted lexicographically by their
>>>>> >> row key.
>>>>> >> In lexicographical order, keys are compared at the binary level,
>>>>> >> byte by byte, from left to right.
>>>>> >>
>>>>> >> See the example below, where the row key is an integer value and
>>>>> >> the output of scan shows the lexicographical order of rows in the
>>>>> >> table.
>>>>> >>
>>>>> >> hbase(main):001:0> scan 'table1'
>>>>> >> ROW          COLUMN+CELL
>>>>> >> 1            column=cf1:, timestamp=1297073325971 ...
>>>>> >> 11           column=cf1:, timestamp=1297073337383 ...
>>>>> >> 11000        column=cf1:, timestamp=1297073340493 ...
>>>>> >> 2            column=cf1:, timestamp=1297073329851 ...
>>>>> >> 22           column=cf1:, timestamp=1297073344482 ...
>>>>> >> 22000        column=cf1:, timestamp=1297073333504 ...
>>>>> >> 23           column=cf1:, timestamp=1297073349875 ...
>>>>> >>
>>>>> >> I want to see these rows ordered as integers, not in the default
>>>>> >> way. I can pad keys with '0' to get a proper sort order (I don't
>>>>> >> like it).
>>>>> >>
>>>>> >> I want these rows sorted as integers not just in the output of
>>>>> >> scan or the get method, but also so that rows with consecutive
>>>>> >> integer row keys are stored in the same block.
>>>>> >>
>>>>> >> Now the questions are:
>>>>> >>
>>>>> >>    - Can we define our own custom KeyComparator?
>>>>> >>    - If yes, can we enforce it for the PUT method, so that rows
>>>>> >>    would be stored according to the new KeyComparator?
>>>>> >>    - Can we plug in this comparator during the SCAN method to
>>>>> >>    change the order of the result rows?
>>>>> >>
>>>>> >> I hope I have explained the problem well. I look forward to your
>>>>> >> valuable response.
>>>>> >>
>>>>> >>
>>>>> >> Regards
>>>>> >> Sanjiv Singh
>>>>> >> Mob :  +091 9990-447-339
>>>>> >>
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>
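
The two row-key approaches discussed in this thread (storing integers as
binary rather than as strings, and putting a fixed-width country prefix in
front of the city) can be sketched in plain Java without an HBase
dependency. This is only an illustration under stated assumptions, not code
from any poster: the class RowKeySketch, its method names, the 16-byte
country width, and the Niger/Nigeria sample data are all hypothetical.
compareKeys mimics the unsigned byte-by-byte ordering described above, and
encodeInt produces the same 4-byte big-endian layout as Bytes.toBytes(int).
The sketch assumes non-negative keys; negative ints have the high bit set
and would sort after positive ones under an unsigned comparison.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowKeySketch {

    // 4-byte big-endian encoding of an int (same layout as Bytes.toBytes(int)).
    public static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Unsigned, left-to-right, byte-by-byte comparison of two keys.
    public static int compareKeys(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // Hypothetical composite key: country padded with 0x00 bytes to a fixed
    // width, followed by the city. The width is an assumption of this sketch.
    public static byte[] compositeKey(String country, String city, int countryWidth) {
        byte[] c = country.getBytes(StandardCharsets.UTF_8);
        byte[] t = city.getBytes(StandardCharsets.UTF_8);
        if (c.length > countryWidth) {
            throw new IllegalArgumentException("country longer than " + countryWidth + " bytes");
        }
        byte[] key = new byte[countryWidth + t.length]; // zero-filled by default
        System.arraycopy(c, 0, key, 0, c.length);
        System.arraycopy(t, 0, key, countryWidth, t.length);
        return key;
    }

    // Encodes the values, sorts the encoded keys, and decodes them again.
    public static List<Integer> sortByEncodedKey(int... values) {
        List<byte[]> keys = new ArrayList<>();
        for (int v : values) {
            keys.add(encodeInt(v));
        }
        keys.sort(RowKeySketch::compareKeys);
        List<Integer> sorted = new ArrayList<>();
        for (byte[] k : keys) {
            sorted.add(ByteBuffer.wrap(k).getInt());
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Same row keys as in the scan output above, stored as binary ints.
        // Prints [1, 2, 11, 22, 23, 11000, 22000]
        System.out.println(sortByEncodedKey(1, 11, 11000, 2, 22, 22000, 23));

        // With a fixed-width country prefix, every "Niger" row sorts before
        // every "Nigeria" row regardless of the city; a negative result
        // means the first key sorts first.
        byte[] niger = compositeKey("Niger", "niamey", 16);
        byte[] nigeria = compositeKey("Nigeria", "abuja", 16);
        System.out.println(compareKeys(niger, nigeria) < 0);
    }
}
```

Plain concatenation without padding would not give the same guarantee:
"Nigerniamey" sorts after "Nigeriaabuja" because 'n' is greater than 'i' at
the sixth byte, so rows of the two countries could interleave. A fixed-width
code such as IND or AUS avoids both the padding and the extra storage.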
