From: Fernando Padilla
Date: Mon, 27 Jul 2009 12:01:46 -0700
To: "hbase-user@hadoop.apache.org"
Subject: key hashing?
Message-ID: <4A6DF99A.1020107@alum.mit.edu>

So I will be generating lots of rows into the db, keyed by userId, in userId order. I have already learned through this mailing list that this use case is not ideal, since it means most row inserts will land on one region server. I know that some people suggest adding some randomization to the keys so that the inserts get spread around, but I can't do that, since I need to be able to do random-access lookups on the rows via userId.

But I'm wondering if I could map/hash the real userId into another number that will spread the inserts around. I could still do random-access lookups given a real userId, by calculating the hash.

1) I think I like this idea. Does anyone have any experience with this?

2) Assume userId is an 8-byte long. What would be some good hashing functions? I would be lazy and use little-endian byte order, but I bet one of you could come up with something better. :)
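
To make question 2 concrete, here is a minimal sketch in Python of the two candidate mappings: the "lazy" little-endian key, and a key prefixed with a few bytes of a hash. This is only an illustration, not anything taken from HBase itself. The function names, the choice of MD5, and the 4-byte prefix length are all assumptions made for the example. Both mappings are deterministic, so a row can still be looked up directly from the real userId.

```python
import hashlib
import struct


def reversed_key(user_id: int) -> bytes:
    """Little-endian encoding of a signed 8-byte id.

    The fastest-changing (low) byte comes first, so consecutive ids
    differ in the leading byte of the key.
    """
    return struct.pack("<q", user_id)


def hashed_key(user_id: int, prefix_len: int = 4) -> bytes:
    """Hash-prefixed key (hypothetical layout, assumed for this sketch).

    The key is the first prefix_len bytes of MD5(big-endian id),
    followed by the big-endian id itself, so the original id can be
    read back out of the key.
    """
    raw = struct.pack(">q", user_id)
    return hashlib.md5(raw).digest()[:prefix_len] + raw


def user_id_from_hashed_key(key: bytes, prefix_len: int = 4) -> int:
    """Recover the original id from a key built by hashed_key."""
    return struct.unpack(">q", key[prefix_len:])[0]


if __name__ == "__main__":
    # Print both keys for a few consecutive ids to compare leading bytes.
    for uid in range(1000, 1004):
        print(uid, reversed_key(uid).hex(), hashed_key(uid).hex())
```

Either way, the rows are no longer stored in userId order, so a range scan over consecutive userIds stops being possible. Whether the little-endian variant spreads writes well enough depends on how the ids are actually allocated, so it is worth checking against real data.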