hbase-issues mailing list archives

From "Dave Revell (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter
Date Tue, 11 Oct 2011 23:41:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125481#comment-13125481
] 

Dave Revell commented on HBASE-4489:
------------------------------------

@Nicolas,

In response to your #1:
> "If you want to save space, you should probably switch to a UINT32 for the key range
instead of the current INT64. This should scale for up to 2 million regions."

Users can use whatever length of keys they want. RegionSplitter just chooses the region boundaries,
which has no effect on space consumption.

In response to your #2:
A cryptographic hash function is designed to distribute its output values approximately
uniformly across the space of byte strings of length N, where N is the digest length. That
makes UniformSplit a sensible default when using MD5 hashes or SHA1 hashes or whatever else,
as long as the digests are stored as raw bytes and not converted to ASCII (for example, hex-encoded).
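
To illustrate the point about ASCII conversion, here is a minimal sketch in Java. The class name HashKeyRange, its helper methods, and the "row-" ids are hypothetical and not part of HBase or the patch. It hashes some made-up row ids with MD5 and counts which quarter of the byte range (0x00-0x3F, 0x40-0x7F, 0x80-0xBF, 0xC0-0xFF) the first key byte lands in, once for raw digests and once for hex-encoded digests.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical illustration only; not part of HBase or of the HBASE-4489 patch.
class HashKeyRange {

    // Raw 16-byte MD5 digest used directly as the row key.
    static byte[] rawKey(String id) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(id.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // The same digest converted to 32 lowercase hex characters (ASCII bytes).
    static byte[] hexKey(String id) {
        StringBuilder sb = new StringBuilder();
        for (byte b : rawKey(id)) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Counts how many keys have their first byte in each quarter of 0x00..0xFF.
    static int[] firstByteQuartiles(boolean hex, int n) {
        int[] counts = new int[4];
        for (int i = 0; i < n; i++) {
            byte[] key = hex ? hexKey("row-" + i) : rawKey("row-" + i);
            counts[(key[0] & 0xFF) / 64]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Raw digests populate all four quarters of the byte range.
        System.out.println("raw: " + Arrays.toString(firstByteQuartiles(false, 10000)));
        // Hex digests start with '0'-'9' or 'a'-'f', so the last two counts are 0.
        System.out.println("hex: " + Arrays.toString(firstByteQuartiles(true, 10000)));
    }
}
```

With hex-encoded keys, every first byte is below \x80, so a split that divides the whole byte space evenly would leave the upper regions empty. That is the reason for the "not converted to ASCII" caveat.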

The goal here is sane default behavior that makes sense for typical use cases. Evenly dividing
the key space accomplishes that goal. At least that's *my* goal.
                
> Better key splitting in RegionSplitter
> --------------------------------------
>
>                 Key: HBASE-4489
>                 URL: https://issues.apache.org/jira/browse/HBASE-4489
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.4
>            Reporter: Dave Revell
>            Assignee: Dave Revell
>         Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-branch0.90-v2.patch,
HBASE-4489-branch0.90-v3.patch, HBASE-4489-trunk-v1.patch, HBASE-4489-trunk-v2.patch, HBASE-4489-trunk-v3.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the command
line or do a rolling split on an existing table. It supports pluggable split algorithms that
implement the SplitAlgorithm interface. The only/default SplitAlgorithm is one that assumes
keys fall in the range from ASCII string "00000000" to ASCII string "7FFFFFFF". This is not
a sane default, and seems useless to most users. Users are likely to be surprised by the fact
that all the region splits occur in the byte range of ASCII characters.
> A better default split algorithm would be one that evenly divides the space of all bytes,
which is what this patch does. Making a table with five regions would split at \x33\x33...,
\x66\x66..., \x99\x99..., \xCC\xCC..., and \xFF\xFF.
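
The even division described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in the attached patches: the class UniformSplitSketch, its methods, and the fixed key width are hypothetical. It treats keys as unsigned big-endian integers of keyWidth bytes and places boundary i at floor(i * 256^keyWidth / numRegions). Note that five regions need only four interior split points; \xFF\xFF... in the list above corresponds to the upper end of the key space.

```java
import java.math.BigInteger;

// Hypothetical sketch of evenly dividing the byte key space; not the patch's code.
class UniformSplitSketch {

    // Returns numRegions - 1 split points, each keyWidth bytes long.
    static byte[][] splitPoints(int numRegions, int keyWidth) {
        if (numRegions < 2 || keyWidth < 1) {
            throw new IllegalArgumentException("need numRegions >= 2 and keyWidth >= 1");
        }
        BigInteger space = BigInteger.ONE.shiftLeft(8 * keyWidth); // 256^keyWidth
        BigInteger n = BigInteger.valueOf(numRegions);
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = space.multiply(BigInteger.valueOf(i)).divide(n);
            splits[i - 1] = toFixedWidth(point, keyWidth);
        }
        return splits;
    }

    // Converts a non-negative value below 256^width to exactly width big-endian bytes.
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading 0x00 sign byte or be shorter
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    // Renders a key in the \xNN notation used in this issue.
    static String toEscaped(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints \x33\x33, \x66\x66, \x99\x99, \xCC\xCC (one per line).
        for (byte[] split : splitPoints(5, 2)) {
            System.out.println(toEscaped(split));
        }
    }
}
```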

