Date: Tue, 22 Mar 2011 12:12:48 -0700
Subject: Re: Manual Region Splitting Question.
From: Jean-Daniel Cryans
To: user@hbase.apache.org

bb will fall into the first region, since the next start key is ca and
bb is smaller than that.

J-D

On Tue, Mar 22, 2011 at 12:07 PM, Vivek Krishna wrote:
> For example, let's assume I have keys in the range aa, ab, ac..zz
>
> Using the sample data I create regions like this
>
> aa-ba region 1
> ca-da region 2 etc.,
>
> The reason I did not create region bb-bz is that I did not encounter it
> in the sample.
> But when I encounter a key like bb, it does not fall in a region I created
> and hence follows the normal procedure, I guess?
>
> Also, my incoming data is distributed almost evenly.
> Viv
>
>
>
> On Tue, Mar 22, 2011 at 2:55 PM, Jean-Daniel Cryans wrote:
>
>> It depends on whether you are also inserting in an ordered fashion, right?
>> Even if you have regions a through z, if you start by inserting only keys
>> starting with "a", then you'll only hit the first regions.
>>
>> J-D
>>
>> On Tue, Mar 22, 2011 at 11:46 AM, Vivek Krishna wrote:
>> > I have GBs of data to be dumped to HBase. After lots of trials and
>> > reading through the mailing list, I figured out that creating regions
>> > manually is a good option because all data was hitting one node
>> > initially...
>> >
>> > My approach to creating regions is as follows.
>> >    - I sampled about 1% of the actual data and created, say, 'n'
>> >      regions based on this sample.
>> >
>> > Now while doing the insertions, it still hits one node first and then
>> > spreads out.
>> >
>> > Our theory is that the key it encounters while inserting doesn't fall in
>> > a region that we created (using the sample) and hence it inserts as it
>> > would do normally.
>> >
>> > So, has anyone approached this problem in a smarter way?
>> >
>> > Viv
>> >
>> >
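
To make the point about start keys concrete, here is a minimal sketch in plain Python. It is only a model of the lookup rule described in the thread, not HBase code: the function names `region_index` and `split_keys_from_sample` are hypothetical, and it assumes ASCII row keys so that Python string ordering matches byte-wise lexicographic ordering. Regions are treated as contiguous ranges, each one running from its start key (inclusive) up to the next region's start key, so every key lands in some pre-created region even if it never appeared in the sample.

```python
from bisect import bisect_right


def region_index(split_keys, row_key):
    """Return the index of the region that holds row_key.

    split_keys is the sorted list of start keys of every region after
    the first. Region 0 covers everything below split_keys[0]; the last
    region is unbounded above. A start key is inclusive, so a key equal
    to a split key belongs to the region that starts there.
    """
    return bisect_right(split_keys, row_key)


def split_keys_from_sample(sample_keys, num_regions):
    """Pick num_regions - 1 split keys at evenly spaced positions
    in the sorted, de-duplicated sample (hypothetical helper)."""
    ordered = sorted(set(sample_keys))
    if num_regions < 1 or num_regions > len(ordered):
        raise ValueError("num_regions must be between 1 and the number of distinct sample keys")
    return [ordered[i * len(ordered) // num_regions] for i in range(1, num_regions)]


# Sample that happens to contain no "b..." keys, as in the thread.
sample = ["aa", "ab", "ca", "cb", "ea", "eb"]
splits = split_keys_from_sample(sample, 3)

# "bb" was never sampled, but it is smaller than the next start key,
# so it still falls into the first region.
print(splits, region_index(splits, "bb"))

# Inserting in sorted order sends every early write to region 0,
# which is the ordered-insert effect mentioned in the thread.
print([region_index(splits, k) for k in sorted(sample)])
```

Under these assumptions, a load that still hits one node first is more likely explained by the insertion order than by keys missing from the sample.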