hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2531) 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
Date Wed, 19 May 2010 04:08:54 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868980#action_12868980 ]

stack commented on HBASE-2531:
------------------------------

Yes to Todd's suggestion.

Kannan, I'm down w/ your suggestion except for the bit where ',' is also the delimiter between
timestamp and dirname.  Use a '.' or something instead.  The special meta region comparator
code looks for the ',' characters dividing up the parts of a meta key when sorting.  The
extra ',' will throw it off and you'll get a headache trying to sort out how this comparator
works.  It gets really interesting when meta splits (though currently this is disabled),
for then you have meta regionnames that look like this:  meta,TestTable,SOMESTARTKEY,TS,TS...
Then throw in the fact that startkeys can be binary and my sense is that about now you feel a migraine
coming on.
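
To make the delimiter worry concrete, here is a rough sketch. It is NOT the actual meta comparator; it just assumes a parser that takes the first ',' as the end of the table name and the last ',' as the start of the trailing id, which is one plausible way to cope with start keys that may themselves contain ','. The class name DelimiterSketch and the value "abcdef" (standing in for an encoded dir name) are made up for illustration.

```java
public class DelimiterSketch {
  /**
   * Splits a region name into {table, startKey, tail} using the FIRST and
   * LAST ',' as delimiters. Hypothetical helper, not HBase code.
   */
  public static String[] split(String regionName) {
    int first = regionName.indexOf(',');
    int last = regionName.lastIndexOf(',');
    if (first < 0 || first == last) {
      throw new IllegalArgumentException("need at least two ',' delimiters");
    }
    return new String[] {
      regionName.substring(0, first),         // table name
      regionName.substring(first + 1, last),  // start key (may contain ',')
      regionName.substring(last + 1)          // whatever follows the last ','
    };
  }

  public static void main(String[] args) {
    // Same name, differing only in the delimiter before the made-up suffix.
    String withDot = "TestTable,SOMESTARTKEY,1273541236167.abcdef";
    String withComma = "TestTable,SOMESTARTKEY,1273541236167,abcdef";
    System.out.println(String.join(" | ", split(withDot)));
    System.out.println(String.join(" | ", split(withComma)));
  }
}
```

With the '.', the tail still begins with the timestamp.  With the extra ',', the timestamp gets swallowed into the start key and the tail is only the suffix, so anything that sorts on those parts would compare the wrong fields.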

I'm good w/ md5.  128 bits vs 160 bits for sha-1 (which seems overkill).  Or we could keep
jenkins hash -- 32 bits -- and use timestamp+jenkins_hash to name the dir.  A collision
is unlikely?
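
For a feel of the numbers, a minimal sketch using only the JDK.  It hashes a regionname with MD5 via java.security.MessageDigest and estimates the chance of at least one clash with the usual birthday approximation p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).  The class and method names are hypothetical, the region count in main is an arbitrary example rather than a measured figure, and this is not a proposal for how the encoded name should finally be built.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class EncodingSketch {
  /** Returns the MD5 digest of the given bytes as 32 lowercase hex chars. */
  public static String md5Hex(byte[] regionName) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(regionName);
      StringBuilder sb = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        sb.append(String.format("%02x", b & 0xff));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // MD5 is required of every Java platform implementation.
      throw new IllegalStateException(e);
    }
  }

  /**
   * Birthday approximation of the probability that n uniformly distributed
   * hashes of the given width contain at least one collision.
   */
  public static double collisionProbability(long n, int bits) {
    double pairs = (double) n * (n - 1) / 2.0;
    // -expm1(-x) computes 1 - exp(-x) without losing tiny values.
    return -Math.expm1(-pairs / Math.pow(2, bits));
  }

  public static void main(String[] args) {
    byte[] name = "test1,6838000000,1273541236167".getBytes(StandardCharsets.UTF_8);
    System.out.println(md5Hex(name));
    long regions = 100_000L; // arbitrary example size
    // Math.abs folds negative ints onto positive ones, so the current
    // encoding has roughly 31 usable bits rather than 32.
    System.out.println("31 bits:  " + collisionProbability(regions, 31));
    System.out.println("32 bits:  " + collisionProbability(regions, 32));
    System.out.println("128 bits: " + collisionProbability(regions, 128));
  }
}
```

For the timestamp+jenkins_hash option, only regions that share a timestamp compete for the same 32-bit space, so n in the estimate would be the number of regions created in the same millisecond rather than the total.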

> 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-2531
>                 URL: https://issues.apache.org/jira/browse/HBASE-2531
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.21.0
>
>
> Kannan tripped over two regionnames that hashed the same:
> Here is code demo'ing that his two names hash the same:
> {code}
> package org;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.hbase.util.JenkinsHash;
> public class Testing {
>   public static void main(final String [] args) {
>     System.out.println(encodeRegionName(Bytes.toBytes("test1,6838000000,1273541236167")));
>     System.out.println(encodeRegionName(Bytes.toBytes("test1,0520100000,1273541610201")));
>   }
>   /**
>    * @param regionName
>    * @return the encodedName
>    */
>   public static int encodeRegionName(final byte [] regionName) {
>     return Math.abs(JenkinsHash.getInstance().hash(regionName, regionName.length, 0));
>   }
> }
> {code}
> Need new encoding mechanism.  Will need to migrate old regions to new schema.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

