hbase-user mailing list archives

From Jonathan Gray <jg...@fb.com>
Subject RE: region, regionserver questions
Date Thu, 02 Dec 2010 23:10:09 GMT
Yeah, I'd recommend just using the normal TableInputFormat (TIF). It creates one map task per
region and attempts to schedule each task on the node hosting that region, so each task talks
to only one (hopefully local) server.
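
To make the difference concrete, here is a small Python sketch comparing one task per regionserver with one task per region. It is not HBase code: the server names (rs1, rs2), the region names, and both helper functions are made up for illustration, and the 10 vs. 25 region counts come from the example in the original question.

```python
from collections import Counter

# Hypothetical layout of table U: region name -> hosting regionserver.
# The counts (10 on rs1, 25 on rs2) mirror the example in the question.
region_locations = {}
for i in range(10):
    region_locations[f"U-region-{i:02d}"] = "rs1"
for i in range(10, 35):
    region_locations[f"U-region-{i:02d}"] = "rs2"


def tasks_per_regionserver(locations):
    """One task per server: task size is the number of regions it hosts."""
    return dict(Counter(locations.values()))


def tasks_per_region(locations):
    """One task per region, each carrying its host as a locality hint."""
    return [(region, host) for region, host in sorted(locations.items())]


per_server = tasks_per_regionserver(region_locations)
per_region = tasks_per_region(region_locations)

print(per_server)       # uneven work: one task scans 10 regions, the other 25
print(len(per_region))  # 35 tasks, each scanning exactly one region
```

With one task per region the unit of work is the same size for every task (one region), so the skew between servers no longer turns into skew between mappers.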

As for assignment, the story has changed significantly between previous versions and the upcoming
0.90 release.

In 0.90, there are two modes of startup assignment.  The new default is 'retain assignment'
where the master will attempt to reuse whatever the last set of assignments were on the previous
run of the cluster.  The other option, if you turn off retain assignment, is round-robin.
This round-robin assignment would give you what you want (an approximately equal number of
regions of each table on each server).
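
As a rough illustration of why round-robin evens things out per table, here is a Python sketch. It is not the master's actual code; it assumes the regions are dealt out in table order, and the server names, region names, and helper functions are hypothetical.

```python
from collections import defaultdict


def round_robin_assign(regions, servers):
    """Deal regions out to servers in turn.

    `regions` is a list of (table, region_name) pairs, assumed to be
    grouped by table. Returns {server: [(table, region_name), ...]}.
    """
    assignment = defaultdict(list)
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append(region)
    return dict(assignment)


def per_table_counts(assignment):
    """Count how many regions of each table every server received."""
    counts = {}
    for server, regions in assignment.items():
        table_counts = defaultdict(int)
        for table, _name in regions:
            table_counts[table] += 1
        counts[server] = dict(table_counts)
    return counts


# Two servers, two tables with 30 regions each (numbers are illustrative).
servers = ["rs1", "rs2"]
regions = [("U", f"U-{i}") for i in range(30)] + [("T", f"T-{i}") for i in range(30)]

counts = per_table_counts(round_robin_assign(regions, servers))
print(counts)  # each server ends up with 15 regions of U and 15 of T
```

When the region count of a table is not a multiple of the server count, the per-server counts for that table differ by at most one, which is the "approximately equal" part.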

What I've done to get a good distribution of the tables is start up once with round-robin,
then use retain assignment from then on.


> -----Original Message-----
> From: Sean Sechrist [mailto:ssechrist@gmail.com]
> Sent: Thursday, December 02, 2010 2:50 PM
> To: user@hbase.apache.org
> Subject: Re: region, regionserver questions
> Hey Albert,
> If you use TableInputFormat, it will create one map task per region in that
> table. So, each mapper should just talk to one regionserver.
> -Sean
> On Thu, Dec 2, 2010 at 5:26 PM, Albert Shau <ashau@yahoo-inc.com> wrote:
> > Hi,
> >
> > I'm doing a distributed scan of an hbase table using map-reduce by taking
> > all the regions belonging to a regionserver, and then assigning those
> > regions to a mapper (so there's 1 mapper per regionserver, and each mapper
> > only talks to one regionserver).  However, doing it this way I'm getting
> > some data skew.  For example, I have 2 tables U and T.  Each regionserver
> > may have 30 regions, but one regionserver might have 10 regions from table U
> > while another regionserver might have 25 regions from table U.  Is there a
> > way to balance regions per table per regionserver (so that each regionserver
> > has 15 regions from table U for example)?  Or should I just not worry about
> > trying to have each individual mapper only talk to one regionserver?
> >
> > Also, how do regions get assigned to regionservers?  Is it based on data
> > locality?  Region start/end keys?  Randomly?
> >
> > Thanks,
> > Albert
> >
