hbase-user mailing list archives

From Jianshi Huang <jianshi.hu...@gmail.com>
Subject Re: How are split files distributed across Region servers?
Date Tue, 19 Aug 2014 09:43:47 GMT
OK, I found some references. I was actually asking about HBase's default
load balancer. From googling, it seems it only evens out the number of
regions across region servers, while the placement of individual regions
is random.
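
To make that behaviour concrete, here is a minimal sketch of "even counts,
arbitrary placement". It is only an illustration of the idea, not HBase's
actual balancer code; the function name assign_evenly and the region/server
names are made up for the example.

```python
import random


def assign_evenly(regions, servers, seed=None):
    """Deal regions to servers so counts differ by at most one.

    Which regions land on which server is arbitrary: the regions are
    shuffled first, so neighbouring rowkey ranges are not kept together.
    """
    rng = random.Random(seed)
    shuffled = list(regions)
    rng.shuffle(shuffled)
    assignment = {server: [] for server in servers}
    for i, region in enumerate(shuffled):
        # Round-robin over the shuffled list keeps the counts balanced.
        assignment[servers[i % len(servers)]].append(region)
    return assignment


# Hypothetical cluster: 100 regions on 10 region servers.
regions = [f"region-{i:03d}" for i in range(100)]
servers = [f"rs{i}" for i in range(10)]
assignment = assign_evenly(regions, servers, seed=42)
counts = {server: len(hosted) for server, hosted in assignment.items()}
```

Every server ends up with the same number of regions, but the set of regions
on any one server is not a contiguous slice of the key space.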

I also found a good load balancer implementation:


https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.html

Thanks for the help JM! :)

Jianshi


On Tue, Aug 19, 2014 at 2:31 PM, lars hofhansl <larsh@apache.org> wrote:

> I'd change the max file size to 20GB. That'd give you 5000 regions for
> 100TB.
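
The region counts in this thread can be checked with a quick calculation.
This is a rough estimate only: it assumes decimal units (1TB = 1000GB),
that every region grows to the configured max file size, and that regions
are spread evenly over 100 servers. The helper name region_count is
hypothetical.

```python
def region_count(total_gb: int, max_file_size_gb: int) -> int:
    """Number of regions needed if each region holds max_file_size_gb."""
    # Ceiling division, so a partially filled last region still counts.
    return -(-total_gb // max_file_size_gb)


TOTAL_GB = 100 * 1000   # 100TB of data, decimal units
SERVERS = 100           # machines mentioned in the thread

regions_at_10gb = region_count(TOTAL_GB, 10)   # 10GB max file size
regions_at_20gb = region_count(TOTAL_GB, 20)   # suggested 20GB max file size

per_server_at_10gb = regions_at_10gb // SERVERS
per_server_at_20gb = regions_at_20gb // SERVERS
```

Under these assumptions, 10GB gives 10000 regions (100 per server) and 20GB
gives 5000 regions (50 per server).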
>
>
>
> ________________________________
>  From: Jianshi Huang <jianshi.huang@gmail.com>
> To: user@hbase.apache.org
> Sent: Monday, August 18, 2014 12:22 PM
> Subject: Re: How are split files distributed across Region servers?
>
>
> Hi JM,
>
> By "make the range bigger", you mean making it multiple regions/splits,
> right?
>
> I will probably have >100TB of data, and I think the default split file
> size is 10GB. So can I assume each of my 100 machines will be assigned
> 100 *random* regions?
>
> Where can I find the implementation details or settings for region
> assignment?
>
> Jianshi
>
>
>
> On Mon, Aug 18, 2014 at 8:48 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > Hi Jianshi,
> >
> > A region server can host more than one region. So if you pre-split your
> > table correctly based on your access usage, at the end all the servers
> > should be used evenly.
> >
> > If about 30% of your range is not used, just make sure that this range
> > is bigger, so that in the end it has the same load as the others.
> >
> > JM
> >
> >
> > 2014-08-18 2:08 GMT-04:00 Jianshi Huang <jianshi.huang@gmail.com>:
> >
> > > Hi JM,
> > >
> > > If the region boundaries will not change, does that mean,
> > >
> > > If my data access pattern is skewed (say a certain part (30%) of my
> > > data will almost never be used), then a proportion (30%) of my servers
> > > will always be idle?
> > >
> > > Does a region server have to host a contiguous rowkey range?
> > >
> > > Jianshi
> > >
> > >
> > >
> > >
> > > On Sat, Aug 16, 2014 at 2:46 AM, Jean-Marc Spaggiari <
> > > jean-marc@spaggiari.org> wrote:
> > >
> > > > Hi Jianshi,
> > > >
> > > > I'm not sure I understand your question.
> > > >
> > > > Can I rephrase it?
> > > >
> > > > So you have 10 regions, and each of those regions has 10 HFiles.
> > > > Then you run a major compaction on the table. Correct?
> > > >
> > > > Then you will end up with:
> > > >
> > > > reg1:[files:1]
> > > > reg2:[files:2]
> > > > reg3:[files:3]
> > > > ...
> > > >
> > > > Region boundaries will not change, but each region will now have a
> > > > single underlying file.
> > > >
> > > > HTH,
> > > >
> > > > JM
> > > >
> > > >
> > > > 2014-08-15 1:53 GMT-04:00 Jianshi Huang <jianshi.huang@gmail.com>:
> > > >
> > > > > Say I have 100 split files on 10 region servers, and I run a major
> > > > > compaction.
> > > > >
> > > > > Will these split files be distributed like this:
> > > > > reg1: [splits 1,2,..,10]
> > > > > reg2: [splits 11,12,...,20]
> > > > > ...
> > > > >
> > > > > Or like this:
> > > > > reg1: [splits: 1, 11, 21, ... , 91]
> > > > > reg2: [splits: 2, 12, 22, ... , 92]
> > > > > ...
> > > > >
> > > > > And what if I want to specify the locality and the stride of split
> > > > > files? How can I do that in HBase?
> > > > >
> > > > >
> > > > > --
> > > > > Jianshi Huang
> > > > >
> > > > > LinkedIn: jianshi
> > > > > Twitter: @jshuang
> > > > > Github & Blog: http://huangjs.github.com/



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
