hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: How are split files distributed across Region servers?
Date Tue, 19 Aug 2014 06:31:42 GMT
I'd change the max file size to 20GB. That'd give you 5000 regions for 100TB.
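
As a rough check on the numbers above, here is a minimal sketch of the arithmetic. It assumes decimal units (1 TB = 1000 GB), the 100 servers mentioned later in the thread, and that "max file size" refers to the `hbase.hregion.max.filesize` setting. The helper name `estimate_regions` is made up for illustration. Because a region splits into two smaller daughters once it reaches the limit, the real region count is likely to be higher than this lower-bound estimate.

```python
def estimate_regions(total_data_gb: int, max_region_size_gb: int) -> int:
    """Lower-bound region count, assuming every region is filled to the limit."""
    # Ceiling division: a partially filled region still counts as a region.
    return -(-total_data_gb // max_region_size_gb)


TOTAL_DATA_GB = 100 * 1000  # 100 TB, decimal units (assumption)
SERVERS = 100               # cluster size taken from the thread

for max_gb in (10, 20):
    regions = estimate_regions(TOTAL_DATA_GB, max_gb)
    per_server = regions / SERVERS
    print(f"{max_gb} GB max file size: {regions} regions, ~{per_server:.0f} per server")
```

Under these assumptions, 10GB gives 10000 regions (about 100 per server) and 20GB gives 5000 regions (about 50 per server).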



________________________________
 From: Jianshi Huang <jianshi.huang@gmail.com>
To: user@hbase.apache.org 
Sent: Monday, August 18, 2014 12:22 PM
Subject: Re: How are split files distributed across Region servers?
 

Hi JM,

By making the range bigger, you mean making it span multiple regions/splits, right?
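
One possible reading of JM's suggestion, sketched under stated assumptions: pick the pre-split points so that every region carries the same expected load, which makes the rarely used range one wide region and the busy range many narrow ones. The integer key space, the `(start, end, load)` segments, and the function name `load_balanced_splits` are all hypothetical stand-ins. Real HBase row keys are byte arrays, and the resulting split keys would still have to be passed to table creation.

```python
def load_balanced_splits(segments, num_regions):
    """Return num_regions - 1 split points that give each region equal load.

    segments: contiguous (start_key, end_key, load) tuples in key order,
    with load assumed to be spread evenly inside each segment.
    """
    total_load = sum(load for _, _, load in segments)
    target = total_load / num_regions  # load each region should carry
    splits = []
    consumed = 0.0     # load of all segments already walked past
    next_cut = target  # cumulative load at which the next split falls
    for start, end, load in segments:
        if load <= 0:
            continue  # a segment with no load never triggers a split
        while len(splits) < num_regions - 1 and consumed + load >= next_cut - 1e-12:
            fraction = (next_cut - consumed) / load
            splits.append(start + fraction * (end - start))
            next_cut += target
        consumed += load
    return splits


# Hypothetical example: keys 0..1000, where the first 30% of the key
# space receives only 10% of the traffic.
segments = [(0, 300, 1), (300, 1000, 9)]
for split in load_balanced_splits(segments, 10):
    print(round(split, 1))
```

With these example weights the cold range [0, 300) ends up as a single region, while the hot range is cut into nine regions of roughly 78 keys each, so each of the ten regions sees about the same share of requests.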

I will probably have >100TB of data, and I think the default split file
size is 10GB. So can I assume each of my 100 machines will be assigned
100 *random* regions?

Where can I find the implementation details or settings for region
assignment?

Jianshi



On Mon, Aug 18, 2014 at 8:48 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Jianshi,
>
> A region server can host more than one region. So if you pre-split your
> table correctly based on your access usage, at the end all the servers
> should be used evenly.
>
> If you have about 30% of your range which is not used, just make sure that
> this range is bigger so that in the end it will have the same load as the
> others.
>
> JM
>
>
> 2014-08-18 2:08 GMT-04:00 Jianshi Huang <jianshi.huang@gmail.com>:
>
> > Hi JM,
> >
> > If the region boundaries will not change, does that mean the following?
> >
> > If my data access pattern is skewed (say a certain part (30%) of my data
> > will almost never be used), then a proportion (30%) of my servers will
> > always be idle?
> >
> > Does a region server have to hold a contiguous rowkey range?
> >
> > Jianshi
> >
> >
> >
> >
> > On Sat, Aug 16, 2014 at 2:46 AM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> > > Hi Jianshi,
> > >
> > > I'm not sure I understand your question.
> > >
> > > Can I rephrase it?
> > >
> > > So you have 10 regions, and each of those regions has 10 HFiles. Then
> > > you run a major compaction on the table. Correct?
> > >
> > > Then you will end up with:
> > >
> > > reg1:[files:1]
> > > reg2:[files:2]
> > > reg3:[files:3]
> > > ...
> > >
> > > Region boundaries will not change. But each region will now have a
> > > single underlying file.
> > >
> > > HTH,
> > >
> > > JM
> > >
> > >
> > > 2014-08-15 1:53 GMT-04:00 Jianshi Huang <jianshi.huang@gmail.com>:
> > >
> > > > Say I have 100 split files on 10 region servers, and I did a major
> > > > compact.
> > > >
> > > > Will these split files be distributed like this:
> > > > reg1: [splits 1,2,..,10]
> > > > reg2: [splits 11,12,...,20]
> > > > ...
> > > >
> > > > Or like this:
> > > > reg1: [splits: 1, 11, 21, ... , 91]
> > > > reg2: [splits: 2, 12, 22, ... , 92]
> > > > ...
> > > >
> > > > And what if I want to specify the locality and the stride of split
> > > > files? How can I do that in HBase?
> > > >
> > > >
> > > > --
> > > > Jianshi Huang
> > > >
> > > > LinkedIn: jianshi
> > > > Twitter: @jshuang
> > > > Github & Blog: http://huangjs.github.com/

-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/