hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2531) 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
Date Sat, 29 May 2010 01:52:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873220#action_12873220 ]

HBase Review Board commented on HBASE-2531:
-------------------------------------------

Message from: "Kannan Muthukkaruppan" <kannan@facebook.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/104/
-----------------------------------------------------------

(Updated 2010-05-28 18:50:19.120776)


Review request for hbase.


Changes
-------

Forgot to include one new file (hbase/util/MD5Hash.java).

BTW, I also manually ran a test with a cluster created using an older HBase, with regions
in the old format, then restarted the cluster with the new HBase. Newly created regions were
in the new format, as expected. And over time, as old regions split, they migrated over to
the new format.


Summary
-------

The new format for a region name contains its encodedName. The encoded name also serves as
the directory name for the region in the filesystem.

New region name format:

      <tablename>,<startkey>,<regionIdTimestamp>/<encodedName>/

where <encodedName> is the hex representation of the MD5 hash of <tablename>,<startkey>,<regionIdTimestamp>
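
As a rough illustration of the format above, here is a minimal sketch that builds a new-format
region name with the standard java.security.MessageDigest API. The class and method names
(RegionNameSketch, md5Hex, newFormatName) are hypothetical and are not the MD5Hash.java added
by this patch; the use of UTF-8 bytes and lowercase hex is also an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not the MD5Hash.java from this patch.
class RegionNameSketch {

  /** Returns the MD5 digest of the input as 32 lowercase hex characters. */
  static String md5Hex(byte[] input) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(input);
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // MD5 ships with standard JDKs, so this is not expected in practice.
      throw new IllegalStateException(e);
    }
  }

  /** Builds <tablename>,<startkey>,<regionIdTimestamp>/<encodedName>/ */
  static String newFormatName(String table, String startKey, long regionId) {
    String base = table + "," + startKey + "," + regionId;
    // Assumption: the hash input is the UTF-8 bytes of the base name.
    return base + "/" + md5Hex(base.getBytes(StandardCharsets.UTF_8)) + "/";
  }

  public static void main(String[] args) {
    // Uses one of the region names quoted in the issue description.
    System.out.println(newFormatName("test1", "6838000000", 1273541236167L));
  }
}
```

The encoded name is always 32 hex characters, so it can double as a fixed-length directory name.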
 
The old region name format remains:
     <tablename>,<startkey>,<regionIdTimestamp>

For region names in the old format, the encoded name is a 32-bit JenkinsHash integer value,
rendered as a decimal string.
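
To give a feel for why a 32-bit encoded name clashes so easily, here is a small sketch using
the standard birthday-bound approximation p = 1 - exp(-n(n-1)/(2N)). It assumes hash values
are uniformly distributed, and it uses N = 2^31 because the demo code in the issue applies
Math.abs to the hash, which folds it onto the non-negative ints. The class and method names
are hypothetical.

```java
// Hypothetical illustration; not part of the patch.
class CollisionEstimate {

  /**
   * Birthday-bound approximation: probability that at least two of n
   * uniformly distributed names share a value in a space of the given size.
   */
  static double collisionProbability(long n, double spaceSize) {
    double expected = (double) n * (n - 1) / (2.0 * spaceSize);
    return -Math.expm1(-expected); // equals 1 - e^(-expected)
  }

  public static void main(String[] args) {
    double space31 = Math.pow(2, 31);   // non-negative 32-bit ints
    double space128 = Math.pow(2, 128); // 128-bit MD5 digests
    for (long n : new long[] {1_000, 10_000, 100_000}) {
      System.out.printf("%d regions: 31-bit=%.6f, 128-bit=%.3e%n",
          n, collisionProbability(n, space31), collisionProbability(n, space128));
    }
  }
}
```

Under these assumptions the 31-bit estimate already exceeds one half at around 65,000 region
names, while the 128-bit estimate stays negligible at any realistic region count.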

**NOTE**
  
ROOT, the first META region, and regions created by an older version of HBase (0.20 or prior)
will continue to use the old region name format.


In the logs and web UI, old format region names will show up as:
   <tablename>,<startkey>,<regionIdTimestamp>(<jenkinshashEncodedName>)
New format region names will show up as:
    <tablename>,<startkey>,<regionIdTimestamp>/<md5hashEncodedName>/
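
When reading logs, the two formats can be told apart by their suffix: a new format name ends
in /<32 hex characters>/, while an old format name ends in the regionId timestamp digits. A
minimal sketch of that check follows. The class and method names are hypothetical, lowercase
hex is an assumption, and the encoded name in main is a made-up placeholder rather than a
real hash.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not code from the patch.
class RegionNameFormat {

  // New format: anything, then "/", 32 hex characters, and a trailing "/".
  private static final Pattern NEW_FORMAT =
      Pattern.compile("(.*)/([0-9a-f]{32})/", Pattern.DOTALL);

  /** Returns the embedded MD5 encoded name, or empty for an old-format name. */
  static Optional<String> embeddedEncodedName(String regionName) {
    Matcher m = NEW_FORMAT.matcher(regionName);
    return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
  }

  public static void main(String[] args) {
    String oldName = "test1,6838000000,1273541236167";
    // Placeholder encoded name, not an actual MD5 of oldName.
    String newName = oldName + "/0123456789abcdef0123456789abcdef/";
    System.out.println(embeddedEncodedName(oldName).isPresent());
    System.out.println(embeddedEncodedName(newName).orElse("none"));
  }
}
```

For an old format name the encoded name is not embedded, so it would have to be recomputed
with JenkinsHash, which this sketch does not attempt.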


This addresses bug HBASE-2531.


Diffs (updated)
-----

  trunk/bin/add_table.rb 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/util/MD5Hash.java PRE-CREATION 
  trunk/src/main/resources/hbase-webapps/master/table.jsp 949322 
  trunk/src/main/resources/hbase-webapps/regionserver/regionserver.jsp 949322 
  trunk/src/test/java/org/apache/hadoop/hbase/TestEmptyMetaInfo.java 949322 
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 949322 
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestGetClosestAtOrBefore.java 949322
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java 949322 
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 949322 

Diff: http://review.hbase.org/r/104/diff


Testing
-------

Unit tests pass. Ran some cluster tests, and things seemed to work OK. Have yet to try a
migration test (upgrading from an older server).


Thanks,

Kannan




> 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-2531
>                 URL: https://issues.apache.org/jira/browse/HBASE-2531
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Kannan Muthukkaruppan
>            Priority: Blocker
>             Fix For: 0.21.0
>
>
> Kannan tripped over two regionnames that hashed the same:
> Here is code demo'ing that his two names hash the same:
> {code}
> package org;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.hbase.util.JenkinsHash;
> public class Testing {
>   public static void main(final String [] args) {
>     System.out.println(encodeRegionName(Bytes.toBytes("test1,6838000000,1273541236167")));
>     System.out.println(encodeRegionName(Bytes.toBytes("test1,0520100000,1273541610201")));
>   }
>   /**
>    * @param regionName
>    * @return the encodedName
>    */
>   public static int encodeRegionName(final byte [] regionName) {
>     return Math.abs(JenkinsHash.getInstance().hash(regionName, regionName.length, 0));
>   }
> }
> {code}
> Need new encoding mechanism.  Will need to migrate old regions to new schema.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

