hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2531) 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
Date Tue, 18 May 2010 23:20:54 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868891#action_12868891 ]

stack commented on HBASE-2531:


Some good stuff related to this issue came up on IRC this afternoon. In particular, we probably
want region names to sort such that the children of a split are inserted after their parent. One
of the children of a split will have the same start key as the parent: the one carrying the
lower range of the split. The only differentiator is the third component of the region name, where
region names are formatted as

<tablename> ',' <startkey> ',' <timestamp>

...and since that component is currently a timestamp, the child will go into META after the parent. (Things
should still work even if the child goes in before the parent, but I'm not certain of that.)
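A minimal sketch of why the timestamp keeps a daughter after its parent, assuming META ordering can be approximated by a plain unsigned byte-wise comparison of the full region name (HBase's real META comparator is more involved). The class, helper names, table, start key and timestamps are hypothetical example values.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: orders region names "<tablename>,<startkey>,<timestamp>"
 * by unsigned byte-wise comparison, standing in for META ordering.
 */
public class RegionNameOrder {

  /** Builds "<tablename>,<startkey>,<timestamp>" as UTF-8 bytes. */
  static byte[] regionName(String table, String startKey, long timestamp) {
    return (table + "," + startKey + "," + timestamp).getBytes(StandardCharsets.UTF_8);
  }

  /** Lexicographic comparison treating each byte as unsigned (0..255). */
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    // Equal prefix: the shorter name sorts first.
    return a.length - b.length;
  }

  public static void main(String[] args) {
    // Parent region, and the daughter carrying the lower range of the split:
    // same table, same start key, but created later (larger timestamp).
    byte[] parent = regionName("test1", "6838000000", 1273000000000L);
    byte[] lowerDaughter = regionName("test1", "6838000000", 1273500000000L);

    // Only the timestamp differs; equal-length decimal timestamps compare the
    // same way as text as they do as numbers, so the daughter sorts after the parent.
    System.out.println(compare(parent, lowerDaughter) < 0);
  }
}
```

The property only holds while the timestamps have the same number of digits, which is why a time-ordered third component (rather than an arbitrary one) matters here.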

So we probably want to keep this property of region names.

This would seem to rule out a UUID as the third component of region names, since UUIDs are effectively
random. (Version 1 UUIDs are time-based, but as Kannan points out, they embed the MAC address,
and anyway, Java doesn't generate version 1 UUIDs.)
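A small sketch of the Java side of the UUID point: `java.util.UUID.randomUUID()` produces version 4 (random) UUIDs, and `UUID.timestamp()` throws `UnsupportedOperationException` for anything that is not a version 1 UUID. The class and helper names below are hypothetical, for illustration only.

```java
import java.util.UUID;

/**
 * Hypothetical sketch: randomUUID() yields version 4 UUIDs, which carry no
 * creation time and so cannot keep a split's children sorting after the parent.
 */
public class UuidVersionDemo {

  /** Returns true if the UUID exposes a timestamp (only version 1 UUIDs do). */
  static boolean hasTimestamp(UUID uuid) {
    try {
      uuid.timestamp(); // throws UnsupportedOperationException unless version 1
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    UUID first = UUID.randomUUID();
    UUID second = UUID.randomUUID();
    System.out.println("version: " + first.version());
    System.out.println("has timestamp: " + hasTimestamp(first));
    // Creation order says nothing about sort order: this varies from run to run.
    System.out.println("first sorts before second: " + (first.compareTo(second) < 0));
  }
}
```

The standard library can hold version 1 values built from raw bits, but offers no factory that generates them.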

> 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
> ----------------------------------------------------------------------------
>                 Key: HBASE-2531
>                 URL: https://issues.apache.org/jira/browse/HBASE-2531
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.21.0
> Kannan tripped over two region names that hash to the same value.
> Here is code demonstrating that his two names hash the same:
> {code}
> package org;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.hbase.util.JenkinsHash;
> public class Testing {
>   public static void main(final String [] args) {
>     System.out.println(encodeRegionName(Bytes.toBytes("test1,6838000000,1273541236167")));
>     System.out.println(encodeRegionName(Bytes.toBytes("test1,0520100000,1273541610201")));
>   }
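>   /**
>    * Encodes a region name as a 32-bit Jenkins hash of its full bytes.
>    * Note that Math.abs maps h and -h to the same value, roughly halving the output space.
>    * @param regionName full region name bytes
>    * @return the encoded name
>    */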
>   public static int encodeRegionName(final byte [] regionName) {
>     return Math.abs(JenkinsHash.getInstance().hash(regionName, regionName.length, 0));
>   }
> }
> {code}
> Need a new encoding mechanism. Will need to migrate old regions to the new schema.
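To put "waaaaaaayyyyy too susceptible" in numbers, here is a rough birthday-bound estimate, assuming the hash spreads names uniformly. Because of the `Math.abs` in the demo above, the encoded value spans roughly 2^31 distinct results; a 128-bit space is shown only for comparison. The class and method names are hypothetical.

```java
/**
 * Hypothetical sketch: birthday-bound estimate of the chance that at least two
 * of n region names collide when hashed uniformly into 2^bits values.
 */
public class RegionNameCollisionOdds {

  /** P(collision) ~= 1 - exp(-n(n-1) / (2N)), with N = 2^bits. */
  static double collisionProbability(long n, int bits) {
    double space = Math.pow(2, bits);
    double pairs = n * (double) (n - 1) / 2.0;
    // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -Math.expm1(-pairs / space);
  }

  public static void main(String[] args) {
    long[] regionCounts = {1_000L, 10_000L, 100_000L};
    for (long n : regionCounts) {
      // 31 bits: Math.abs folds h and -h together in the int encoding.
      System.out.printf("n=%,d  31-bit: %.4f  128-bit: %.3e%n",
          n, collisionProbability(n, 31), collisionProbability(n, 128));
    }
  }
}
```

Under these assumptions the clash probability is about 2.3% at 10,000 regions and about 90% at 100,000, while a 128-bit space keeps it negligible at both sizes.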

This message is automatically generated by JIRA.
