hbase-user mailing list archives

From Lars George <lars.geo...@gmail.com>
Subject Re: Hash keys
Date Wed, 16 Mar 2011 10:38:58 GMT
Hi Eric,

Mozilla Socorro uses an approach where they bucket ranges using
leading hashes to distribute them across servers. To scan, you need
to create N scanners, where N is the number of hash buckets, call
next() on each one, and put the returned KVs into one sorted list
(using the KeyComparator, for example), stripping the hash prefix
first. You can then read the rows in sorted order: the first element
in the list is always the one with the lowest key. Once you have
taken off the first element, you call next() on the scanner it came
from and insert the result into the list, which re-sorts it. You keep
taking from the top and therefore always see the entire range in
order, even when the same scanner supplies several consecutive rows.
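
The merge described above can be sketched roughly as follows. This is
only an illustration in Python, not Socorro's code and not the HBase
client API: plain lists of (row key, value) pairs stand in for the N
scanners, a heap stands in for the sorted list, and the two-character
prefix width (PREFIX_LEN) and the name merged_scan are assumptions
made up for the example.

```python
import heapq

# Assumption for this sketch: every row key starts with a fixed-width
# bucket prefix such as "03". The real prefix format may differ.
PREFIX_LEN = 2


def strip_prefix(row_key):
    """Drop the leading bucket prefix to recover the logical row key."""
    return row_key[PREFIX_LEN:]


def merged_scan(scanners):
    """Yield (logical_key, value) pairs in key order across all buckets.

    `scanners` is a list of iterables, one per bucket, each already
    sorted by row key (as a scanner over one bucket would be).
    """
    iterators = [iter(s) for s in scanners]
    heap = []

    # Prime the heap with the first row of every scanner.
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            key, value = first
            heapq.heappush(heap, (strip_prefix(key), idx, value))

    while heap:
        # The smallest logical key is always at the top of the heap.
        key, idx, value = heapq.heappop(heap)
        yield key, value

        # Advance only the scanner that supplied this row, then reinsert.
        nxt = next(iterators[idx], None)
        if nxt is not None:
            nkey, nvalue = nxt
            heapq.heappush(heap, (strip_prefix(nkey), idx, nvalue))


if __name__ == "__main__":
    bucket_0 = [("00row-b", 2), ("00row-d", 4)]
    bucket_1 = [("01row-a", 1), ("01row-e", 5)]
    bucket_2 = [("02row-c", 3)]
    for key, value in merged_scan([bucket_0, bucket_1, bucket_2]):
        print(key, value)
```

Each scanner has at most one entry in the heap at any time, so memory
stays proportional to the number of buckets rather than to the size
of the range.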

The shell is written in JRuby, so any hash function you can call from
there is a reasonable choice for the prefix, since you can then
compute it on the fly. This will not help with merging the bucketed
key ranges; you need to do that in code with the approach above. That
said, since this is JRuby, you could write that code in Ruby and add
it to your local shell, giving you what you need.
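
To make the prefix idea concrete, here is a minimal sketch of
computing a bucket prefix from a readable key. It is written in
Python for illustration only; in the shell the same logic would be
written in Ruby. The bucket count (NUM_BUCKETS), the two-digit prefix
format, the choice of MD5, and the function names are all assumptions
for the example, not something prescribed by HBase or Socorro.

```python
import hashlib

# Assumption for this sketch: 16 buckets, written as a two-digit prefix.
NUM_BUCKETS = 16


def bucket_prefix(row_key, num_buckets=NUM_BUCKETS):
    """Derive a stable two-digit bucket prefix from the readable key."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return "%02d" % bucket


def salted_key(row_key):
    """Key as it would be stored: bucket prefix followed by the key."""
    return bucket_prefix(row_key) + row_key


def scan_ranges(start, stop, num_buckets=NUM_BUCKETS):
    """One (start, stop) pair per bucket for a logical range scan."""
    return [("%02d" % b + start, "%02d" % b + stop)
            for b in range(num_buckets)]


if __name__ == "__main__":
    print(salted_key("r1"))
    for rng in scan_ranges("r1", "r9")[:3]:
        print(rng)
```

A single-row lookup only needs salted_key, because the prefix can be
recomputed from the readable key. A range query needs one scan per
pair returned by scan_ranges, and the results then have to be merged.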

Lars

On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles
<eric.charles@u-mangate.com> wrote:
> Oops, forget my first question about range queries (if keys are hashed, they
> cannot be queried based on a range...)
> Still curious to have info on a hash function in the shell (2.) and advice
> on md5/jenkins/sha1 (3.)
> Tks,
> Eric
>
> On 16/03/2011 09:52, Eric Charles wrote:
>>
>> Hi,
>>
>> To help avoid hotspots, I'm planning to use hashed keys in some tables.
>>
>> 1. I wonder if this strategy is advisable for the range query (from/to key)
>> use case, because the rows will be randomly distributed across different
>> regions. Will it cause some performance loss?
>> 2. Is it possible to query from hbase shell with something like "get 't1',
>> @hash('r1')", to let the shell compute the hash for you from the readable
>> key.
>> 3. There are MD5 and Jenkins classes in the hbase.util package. What would
>> you advise? What about SHA1?
>>
>> Tks,
>> - Eric
>>
>> PS: I searched the archive but didn't find the answers.
>>
>
>
