Subject: Re: Pre-split Region Boundaries
From: Amit Sela
To: user@hbase.apache.org
Date: Mon, 28 Jan 2013 13:51:33 +0200

We also pre-split our tables before bulk loading, but we don't use
RegionSplitter. We split manually (we did some testing and found the
optimal split points): for each region we put a new HRegionInfo into the
.META. table and assign that region with HBaseAdmin.assign("region name").
After you finish assigning all the regions, don't forget to clear the
region cache.

I know it's a little bit "intrusive", but it works for us.

On Fri, Jan 25, 2013 at 4:45 PM, Rob Styles wrote:

> Hi,
>
> I'm tuning HBase for storage of a few billion rows and, more or less,
> bulk loading.
>
> I'm using MD5 strings as row ids to create an evenly distributed range
> and non-sequential values during loading, and this is working relatively
> well for us.
>
> I've pre-split my tables using
> org.apache.hadoop.hbase.util.RegionSplitter from the command line and
> had expected it to create regions covering 00000 - fffff as per the
> docs. My regions come out different though, before loading any data.
>
> With 200 regions the first region ends with 00a3d70a and the regions go
> up from there. The last region has a start key of 7f5c28c6, which is
> only half-way through the address space. This means my last region gets
> hot during loading.
>
> I know I must have missed something but I'm not sure what. Any help
> greatly appreciated.
>
> thanks
>
> rob
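
For what it's worth, the two boundaries reported in the quoted message (00a3d70a and 7f5c28c6) are exactly what you get by cutting the range 0 to 0x7FFFFFFF into 200 equal steps. The sketch below only checks that arithmetic; it is not RegionSplitter's code, the helper name even_split_keys is made up for illustration, and the 0x7FFFFFFF upper bound is an inference from the reported keys rather than something confirmed in this thread.

```python
def even_split_keys(num_regions, max_key, width=8):
    """Return the num_regions - 1 boundary keys that cut [0, max_key]
    into equal integer steps, as zero-padded lowercase hex strings."""
    step = max_key // num_regions  # integer division, remainder is dropped
    return [f"{step * i:0{width}x}" for i in range(1, num_regions)]


# Assumed upper bound that reproduces the keys reported in the thread.
half = even_split_keys(200, 0x7FFFFFFF)
print(half[0], half[-1])   # 00a3d70a 7f5c28c6

# Same calculation over the full 32-bit hex prefix space.
full = even_split_keys(200, 0xFFFFFFFF)
print(full[0], full[-1])   # 0147ae14 feb8518c
```

If that inference holds, one way around the hot last region might be to compute the boundaries over the full range yourself and pass them as explicit split keys when creating the table, for example via HBaseAdmin.createTable(desc, splitKeys), instead of relying on the default range.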