hbase-issues mailing list archives

From "Nicolas Spiegelberg (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter
Date Tue, 11 Oct 2011 23:17:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125474#comment-13125474 ]

Nicolas Spiegelberg commented on HBASE-4489:
--------------------------------------------

1. If you want to save space, you should probably switch to a UINT32 for the key range instead
of the current INT64.  This should scale for up to 2 million regions.

2. We (FB) have this specified for MD5StringSplit elsewhere, but I think a mandatory requirement,
if your goal is "better key splitting", is to provide the hash function that, given a user-specified
key, hashes it and returns what you need to insert into HBase.  Maybe add a generateKey() function
to the SplitAlgorithm interface.  Worst case is we make it an identity operation.
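
As a rough illustration of the generateKey() idea above, here is a minimal Java sketch. The KeyGenerator interface, the constant names, and the choice to prepend the raw MD5 digest to the user key are all assumptions made for this example; none of it is the actual SplitAlgorithm interface or the key layout MD5StringSplit expects.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class GenerateKeySketch {
    /** Hypothetical hook; stands in for a generateKey() method on SplitAlgorithm. */
    interface KeyGenerator {
        byte[] generateKey(byte[] userKey);
    }

    /** The "worst case" from the comment: return the user key unchanged. */
    static final KeyGenerator IDENTITY = userKey -> userKey;

    /** Assumed layout: 16-byte MD5 digest of the user key, followed by the user key itself. */
    static final KeyGenerator MD5_PREFIXED = userKey -> {
        byte[] digest = md5(userKey);
        byte[] rowKey = new byte[digest.length + userKey.length];
        System.arraycopy(digest, 0, rowKey, 0, digest.length);
        System.arraycopy(userKey, 0, rowKey, digest.length, userKey.length);
        return rowKey;
    };

    static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }
}
```

A caller would pass its natural key through the generator before building the Put, so the hashed prefix spreads writes across regions while the trailing bytes keep the original key recoverable.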

Talked with StumbleUpon & Salesforce guys last week about making this something that users
can enable and the HBase Client would transparently do for them.  However, step one is giving
users the APIs they need to manually do proper key distribution.
                
> Better key splitting in RegionSplitter
> --------------------------------------
>
>                 Key: HBASE-4489
>                 URL: https://issues.apache.org/jira/browse/HBASE-4489
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.4
>            Reporter: Dave Revell
>            Assignee: Dave Revell
>         Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-branch0.90-v2.patch,
HBASE-4489-branch0.90-v3.patch, HBASE-4489-trunk-v1.patch, HBASE-4489-trunk-v2.patch, HBASE-4489-trunk-v3.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the command
line or do a rolling split on an existing table. It supports pluggable split algorithms that
implement the SplitAlgorithm interface. The only/default SplitAlgorithm is one that assumes
keys fall in the range from ASCII string "00000000" to ASCII string "7FFFFFFF". This is not
a sane default, and seems useless to most users. Users are likely to be surprised by the fact
that all the region splits occur in the byte range of ASCII characters.
> A better default split algorithm would be one that evenly divides the space of all bytes,
which is what this patch does. Making a table with five regions would split at \x33\x33...,
\x66\x66..., \x99\x99..., \xCC\xCC..., and \xFF\xFF.
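
The even division described above can be sketched in Java as follows. This is an illustrative approximation, not the code from the attached patches: the class name, method names, and the fixed key width are assumptions. It computes only the interior boundaries (numRegions - 1 of them) by dividing the space of all keyWidth-byte values into equal parts.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class UniformSplitSketch {
    /** Returns the numRegions - 1 interior split points over all keyWidth-byte keys. */
    static List<byte[]> splitPoints(int numRegions, int keyWidth) {
        if (numRegions < 1 || keyWidth < 1) {
            throw new IllegalArgumentException("numRegions and keyWidth must be positive");
        }
        // Size of the key space: 256^keyWidth.
        BigInteger space = BigInteger.ONE.shiftLeft(8 * keyWidth);
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = space.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(numRegions));
            splits.add(toFixedWidth(point, keyWidth));
        }
        return splits;
    }

    /** Converts a non-negative value below 256^width to exactly width big-endian bytes. */
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter than width
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    /** Formats bytes in the \xNN notation used in the issue description. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xff));
        }
        return sb.toString();
    }
}
```

With numRegions = 5 and keyWidth = 2, the sketch yields \x33\x33, \x66\x66, \x99\x99, and \xCC\xCC, matching the first four boundaries listed in the description.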

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
