Date: Tue, 18 May 2010 01:10:44 -0400 (EDT)
From: "stack (JIRA)"
To: hbase-issues@hadoop.apache.org
Subject: [jira] Commented: (HBASE-2531) 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes

[ https://issues.apache.org/jira/browse/HBASE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868534#action_12868534 ]

stack commented on HBASE-2531:
------------------------------

@Kannan As I see it, we just
need to make sure that the system can work with both styles of naming: it should read in the old names without issue and, at the same time, use the UUID form when writing the directory name for any newly created region. Somehow, we also need to drop this notion of encoding the region name. Going forward it will not be needed, since the UUID will actually be part of the region name. I agree that getting ZooKeeper (or even HDFS, for that matter) involved in making region directory names complicates something that could be really simple if we use UUIDs.

> 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-2531
>                 URL: https://issues.apache.org/jira/browse/HBASE-2531
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.21.0
>
>
> Kannan tripped over two region names that hash to the same value.
> Here is code demonstrating that his two names hash the same:
> {code}
> package org;
>
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.hbase.util.JenkinsHash;
>
> public class Testing {
>   public static void main(final String [] args) {
>     System.out.println(encodeRegionName(Bytes.toBytes("test1,6838000000,1273541236167")));
>     System.out.println(encodeRegionName(Bytes.toBytes("test1,0520100000,1273541610201")));
>   }
>
>   /**
>    * @param regionName
>    * @return the encodedName
>    */
>   public static int encodeRegionName(final byte [] regionName) {
>     return Math.abs(JenkinsHash.getInstance().hash(regionName, regionName.length, 0));
>   }
> }
> {code}
>
> Need a new encoding mechanism. Will need to migrate old regions to the new schema.
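The plan in the comment (keep reading legacy hashed directory names, write UUID-based names for new regions, and carry the UUID in the region name itself so no separate encoding step is needed) can be sketched in plain Java. This is a minimal, hypothetical sketch, not HBase code: the class `RegionDirNames`, its methods, the 32-hex-character directory format, and the `.`-separated UUID suffix on the region name are all assumptions made for illustration.

```java
import java.util.UUID;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not HBase's API): region directory naming that reads
 * both the legacy 32-bit hash form and an assumed UUID form.
 */
public class RegionDirNames {
  // Legacy form: Math.abs(JenkinsHash) printed as a decimal int. A leading minus
  // is allowed because Math.abs(Integer.MIN_VALUE) is still negative.
  private static final Pattern LEGACY_PATTERN = Pattern.compile("-?\\d{1,10}");
  // Assumed new form: a random UUID rendered as 32 lowercase hex chars, dashes removed.
  private static final Pattern UUID_PATTERN = Pattern.compile("[0-9a-f]{32}");

  public enum Style { LEGACY_HASH, UUID_HEX }

  /** Mints the directory name for a newly created region. */
  public static String newRegionDirName() {
    return UUID.randomUUID().toString().replace("-", "");
  }

  /** Classifies an existing directory name so old regions keep loading. */
  public static Style classify(String dirName) {
    if (UUID_PATTERN.matcher(dirName).matches()) {
      return Style.UUID_HEX;
    }
    if (LEGACY_PATTERN.matcher(dirName).matches()) {
      return Style.LEGACY_HASH;
    }
    throw new IllegalArgumentException("not a region directory name: " + dirName);
  }

  /** Hypothetical layout: the UUID rides along as a suffix of the region name. */
  public static String regionNameWithId(String table, String startKey, long regionId, String uuidHex) {
    return table + "," + startKey + "," + regionId + "." + uuidHex;
  }

  /**
   * Reads the directory name straight off a region name that carries a UUID
   * suffix. Returns null for old-style names, where a caller would fall back
   * to the legacy 32-bit hash.
   */
  public static String dirNameFromRegionName(String regionName) {
    int dot = regionName.lastIndexOf('.');
    if (dot < 0) {
      return null;
    }
    String suffix = regionName.substring(dot + 1);
    return UUID_PATTERN.matcher(suffix).matches() ? suffix : null;
  }

  public static void main(String[] args) {
    String dir = newRegionDirName();
    String name = regionNameWithId("test1", "6838000000", 1273541236167L, dir);
    System.out.println(name);
    System.out.println(classify(dir));                                           // UUID_HEX
    System.out.println(classify("1028785192"));                                  // LEGACY_HASH
    System.out.println(dir.equals(dirNameFromRegionName(name)));                 // true
    System.out.println(dirNameFromRegionName("test1,6838000000,1273541236167")); // null
  }
}
```

Carrying the UUID inside the region name means the directory name is read off the name rather than recomputed by hashing, so two regions can only share a directory if their UUIDs match. For scale: a 32-bit hash reaches a 50% chance of at least one clash at roughly 77,000 names (about 1.18 * sqrt(2^32)), while a random (version 4) UUID has 122 random bits.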