From: Vivek Krishna
Date: Tue, 22 Mar 2011 14:46:03 -0400
Subject: Manual Region Splitting Question.
To: user@hbase.apache.org

I have GBs of data to load into HBase. After many trials and reading
through the mailing list, I concluded that creating regions manually is a
good option, because all the data was initially hitting one node.

My approach to creating regions is as follows:

- I sampled about 1% of the actual data and created 'n' regions based on
  this sample.

Now, during the insertions, the load still hits one node first and then
spreads out. Our theory is that the keys encountered while inserting don't
fall in the regions that we created (using the sample), and hence the
inserts proceed as they normally would.

So, has anyone approached this problem in a smarter way?

Viv
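To make the sampling step concrete, here is a minimal sketch in Python of how split keys could be picked from a sample, and how the theory above could be checked against the full key set. This is not the code actually used for the load. The function names (split_points, region_for, region_counts) are made up for illustration and are not part of HBase. The sketch assumes row keys compare as raw bytes and that each region covers the range from its start key (inclusive) to its end key (exclusive).

```python
import bisect


def split_points(sample_keys, num_regions):
    """Pick up to num_regions - 1 split keys at evenly spaced positions
    in the sorted sample (hypothetical helper, not an HBase API)."""
    keys = sorted(sample_keys)
    if num_regions < 2 or not keys:
        return []
    splits = []
    for i in range(1, num_regions):
        key = keys[i * len(keys) // num_regions]
        # Skip repeated keys so the split list stays strictly increasing.
        if not splits or key > splits[-1]:
            splits.append(key)
    return splits


def region_for(key, splits):
    """Index of the region that would hold key, assuming region i covers
    [splits[i-1], splits[i]) with open-ended first and last regions."""
    return bisect.bisect_right(splits, key)


def region_counts(keys, splits):
    """Count how many of the given keys would land in each region."""
    counts = [0] * (len(splits) + 1)
    for key in keys:
        counts[region_for(key, splits)] += 1
    return counts
```

Running region_counts over the full key set (or a second, independent sample) with the splits taken from the 1% sample would show whether the keys really miss the pre-created regions: a heavily skewed count list would support the theory, while roughly even counts would suggest looking elsewhere.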