Date: Thu, 2 Dec 2010 19:44:53 -0800
Subject: Re: region, regionserver questions
From: Ted Yu
To: user@hbase.apache.org

When a table is created with N regions, is it possible to distribute them
(almost) equally among the region servers?

Thanks

On Thu, Dec 2, 2010 at 3:10 PM, Jonathan Gray wrote:

> Yeah, I'd recommend just using the normal TableInputFormat (TIF). It creates
> one map task per region and attempts to schedule each task on the node
> hosting that region, so each task talks to only one (hopefully local)
> server.
>
> As for assignment, the story has changed significantly between previous
> versions and the upcoming 0.90 release.
>
> In 0.90, there are two modes of startup assignment. The new default is
> 'retain assignment', where the master attempts to reuse whatever set of
> assignments was in place on the previous run of the cluster. The other
> option, if you turn off retain assignment, is round-robin. Round-robin
> assignment would give you what you want (an approximately equal number of
> regions of each table on each server).
>
> What I've done to get a good distribution of the tables is to start up with
> round-robin, and from then on use retain assignment.
>
> JG
>
> > -----Original Message-----
> > From: Sean Sechrist [mailto:ssechrist@gmail.com]
> > Sent: Thursday, December 02, 2010 2:50 PM
> > To: user@hbase.apache.org
> > Subject: Re: region, regionserver questions
> >
> > Hey Albert,
> >
> > If you use TableInputFormat, it will create one map task per region in
> > that table. So, each mapper should just talk to one regionserver.
> >
> > -Sean
> >
> > On Thu, Dec 2, 2010 at 5:26 PM, Albert Shau wrote:
> >
> > > Hi,
> > >
> > > I'm doing a distributed scan of an HBase table using MapReduce by taking
> > > all the regions belonging to a regionserver and assigning those regions
> > > to a single mapper (so there's one mapper per regionserver, and each
> > > mapper only talks to one regionserver). However, doing it this way I'm
> > > getting some data skew. For example, I have two tables, U and T. Each
> > > regionserver may have 30 regions, but one regionserver might have 10
> > > regions from table U while another might have 25 regions from table U.
> > > Is there a way to balance regions per table per regionserver (so that
> > > each regionserver has 15 regions from table U, for example)? Or should I
> > > just not worry about trying to have each individual mapper talk to only
> > > one regionserver?
> > >
> > > Also, how do regions get assigned to regionservers? Is it based on data
> > > locality? Region start/end keys? Randomly?
> > >
> > > Thanks,
> > > Albert
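Jonathan's two 0.90 startup modes can be illustrated with a small, self-contained Java model. This is not the HBase master's code. The class `AssignmentSketch`, its `Region` record, and the `roundRobin`, `retain` and `perServer` methods are all hypothetical. Sending regions whose previous server has disappeared to the least-loaded live server is also an assumption made for the sketch, not documented 0.90 behavior. Because regions are listed table by table, dealing them out round-robin spreads each table's regions evenly (to within one) across servers. That is the property Ted and Albert are asking about.

```java
import java.util.*;

public class AssignmentSketch {
    // Hypothetical model of a region: the table it belongs to and its index in that table.
    record Region(String table, int index) {}

    // Round-robin: deal regions out to servers in order. Regions are listed
    // table by table, so each table's regions land on consecutive servers and
    // per-table counts on any two servers differ by at most one.
    static Map<Region, String> roundRobin(List<Region> regions, List<String> servers) {
        Map<Region, String> plan = new LinkedHashMap<>();
        for (int i = 0; i < regions.size(); i++) {
            plan.put(regions.get(i), servers.get(i % servers.size()));
        }
        return plan;
    }

    // Retain: keep each region on its previous server if that server is live.
    // Assumption for this sketch: orphaned regions go to the least-loaded live server.
    static Map<Region, String> retain(Map<Region, String> previous, List<String> live) {
        Map<Region, String> plan = new LinkedHashMap<>();
        Map<String, Integer> load = new HashMap<>();
        for (String s : live) load.put(s, 0);
        List<Region> orphans = new ArrayList<>();
        for (Map.Entry<Region, String> e : previous.entrySet()) {
            String server = e.getValue();
            if (load.containsKey(server)) {
                plan.put(e.getKey(), server);
                load.merge(server, 1, Integer::sum);
            } else {
                orphans.add(e.getKey());
            }
        }
        for (Region r : orphans) {
            String target = live.get(0);
            for (String s : live) {
                if (load.get(s) < load.get(target)) target = s;
            }
            plan.put(r, target);
            load.merge(target, 1, Integer::sum);
        }
        return plan;
    }

    // Number of regions of one table on each server.
    static Map<String, Integer> perServer(Map<Region, String> plan, String table) {
        Map<String, Integer> counts = new TreeMap<>();
        plan.forEach((region, server) -> {
            if (region.table().equals(table)) counts.merge(server, 1, Integer::sum);
        });
        return counts;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        for (int i = 0; i < 30; i++) regions.add(new Region("U", i));
        for (int i = 0; i < 30; i++) regions.add(new Region("T", i));

        Map<Region, String> first = roundRobin(regions, List.of("rs1", "rs2"));
        System.out.println("U after round-robin: " + perServer(first, "U"));

        // Restart without rs2 but with a new, empty rs3.
        Map<Region, String> second = retain(first, List.of("rs1", "rs3"));
        System.out.println("U after retain: " + perServer(second, "U"));
    }
}
```

Running `main` prints `U after round-robin: {rs1=15, rs2=15}` and then `U after retain: {rs1=15, rs3=15}`. The regions that were on rs1 stay put, and those orphaned by rs2 all land on the new, empty rs3. In this model, retaining assignments only preserves whatever balance an earlier round-robin start produced. That is why starting once with round-robin and then switching to retain assignment, as Jonathan suggests, keeps tables spread out.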
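Sean's and Jonathan's point about TableInputFormat can be made concrete with Albert's numbers: 10 regions of table U on one server and 25 on another. The sketch below does not call HBase. The names `SplitSketch`, `Split`, `perServer`, `perRegion` and `largestSplit` are hypothetical, and they model two ways of carving a scan into map tasks.

- Grouping by regionserver yields one task per server, so task size follows the uneven region count.
- One split per region, each tagged with its hosting server as a locality hint, keeps every task to a single region on a single server.

```java
import java.util.*;

public class SplitSketch {
    // Hypothetical split: the regions one map task scans and the server that hosts them.
    record Split(List<String> regions, String host) {}

    // Albert's approach: one split per regionserver covering all of its regions.
    static List<Split> perServer(Map<String, String> regionToServer) {
        Map<String, List<String>> byServer = new TreeMap<>();
        regionToServer.forEach((region, server) ->
                byServer.computeIfAbsent(server, k -> new ArrayList<>()).add(region));
        List<Split> splits = new ArrayList<>();
        byServer.forEach((server, regions) -> splits.add(new Split(regions, server)));
        return splits;
    }

    // TableInputFormat-style: one split per region, tagged with its hosting
    // server so the scheduler can try to run the task on that node.
    static List<Split> perRegion(Map<String, String> regionToServer) {
        List<Split> splits = new ArrayList<>();
        regionToServer.forEach((region, server) ->
                splits.add(new Split(List.of(region), server)));
        return splits;
    }

    // Size of the biggest task, measured in regions.
    static int largestSplit(List<Split> splits) {
        return splits.stream().mapToInt(s -> s.regions().size()).max().orElse(0);
    }

    public static void main(String[] args) {
        // Albert's example: table U has 10 regions on rs1 and 25 on rs2.
        Map<String, String> u = new LinkedHashMap<>();
        for (int i = 0; i < 10; i++) u.put("U-" + i, "rs1");
        for (int i = 10; i < 35; i++) u.put("U-" + i, "rs2");

        List<Split> byServer = perServer(u);
        List<Split> byRegion = perRegion(u);
        System.out.println("per-server splits: " + byServer.size() + ", largest = " + largestSplit(byServer));
        System.out.println("per-region splits: " + byRegion.size() + ", largest = " + largestSplit(byRegion));
    }
}
```

Here `main` prints `per-server splits: 2, largest = 25` and `per-region splits: 35, largest = 1`. Every per-region task still talks to only one regionserver. Because the tasks are small, the MapReduce scheduler can absorb the imbalance across task slots, rather than leaving one mapper with 2.5 times the work of the other.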