From: Pal Konyves
Date: Sat, 20 Apr 2013 22:11:24 +0200
Subject: Re: default region splitting on which value?
To: user

Hi Ted,

Only one family. My data is very simple key-value, but I want to run
sequential scans, so hashing the key is not an option.

On Sat, Apr 20, 2013 at 10:07 PM, Ted Yu wrote:

> How many column families do you have?
>
> For #3, pre-splitting the table at the row keys corresponding to the peaks
> makes sense.
>
> On Apr 20, 2013, at 10:52 AM, Pal Konyves wrote:
>
> > Hi,
> >
> > I am just reading about region splitting. By default, as I understand,
> > HBase handles splitting the regions. I just don't know how to picture
> > which key it splits the regions on.
> >
> > 1) For example, when I write the MD5 hash of the rowkeys, they are most
> > probably evenly distributed from 000000... to FFFFF..., right? When HBase
> > starts with one region, all the writes go into that region, and when the
> > HFile gets too big, does it take, for example, the median value of the
> > stored keys and split the region there?
> >
> > 2) I want to bulk load tons of data with the HBase Java client API put
> > operations. I want it to perform well. My keys are numeric sequential
> > values (which, as I know from this post, I cannot load into HBase
> > sequentially, because the HBase tables are going to be sad:
> > http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
> > ). So I thought I would pre-split the table into regions and load the
> > data in randomized order. This way I will get good distribution among
> > region servers in terms of network IO from the beginning. Is that a good
> > idea?
> >
> > 3) If my rowkeys are not evenly distributed in the keyspace but show some
> > peaks or bursts, e.g. 000-999 with most of the keys gathering around the
> > 020 and 060 values, is it a good idea to put the pre-split points at
> > those peaks?
> >
> > Thanks in advance,
> > Pal
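
For question 3, one way to turn "split at the peaks" into concrete split keys is to take quantiles of a sample of the real rowkeys, so that dense ranges such as 020 and 060 end up with more split points. The following is only a minimal sketch under assumptions: the class and method names (SplitPoints, quantileSplits) are hypothetical, the sample is assumed to be representative of the data, and the keys are assumed to be fixed-width, zero-padded ASCII strings so that Java string order matches HBase's byte order. It does not call HBase itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of HBase: derives pre-split keys from a key sample.
final class SplitPoints {

    // Returns up to (regions - 1) split keys taken at evenly spaced ranks of the
    // sorted sample, so each region should receive a similar number of rows.
    static List<String> quantileSplits(List<String> sampleKeys, int regions) {
        List<String> splits = new ArrayList<>();
        if (regions < 2 || sampleKeys.isEmpty()) {
            return splits; // nothing to split
        }
        List<String> sorted = new ArrayList<>(sampleKeys);
        Collections.sort(sorted); // lexicographic; assumes fixed-width ASCII keys
        for (int i = 1; i < regions; i++) {
            int rank = (int) ((long) i * sorted.size() / regions);
            String candidate = sorted.get(rank);
            // Skip repeats: duplicate split keys would describe an empty region.
            if (splits.isEmpty() || !splits.get(splits.size() - 1).equals(candidate)) {
                splits.add(candidate);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up sample with bursts around 020 and 060, as in the question.
        List<String> sample = Arrays.asList(
                "010", "020", "020", "021", "060", "060", "061", "900");
        System.out.println(quantileSplits(sample, 4));
    }
}
```

The resulting keys would still need to be converted to byte arrays (for example with Bytes.toBytes) before being handed to a table-creation call that accepts split keys. Loading the data in randomized order, as in question 2, then spreads the puts across all pre-created regions from the start.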