hbase-user mailing list archives

From Oleg Ruchovets <oruchov...@gmail.com>
Subject Re: bulk loading regions number
Date Mon, 10 Sep 2012 08:45:49 GMT
Great, that is actually what I was thinking about too.
What is the best practice for choosing the HFile size?
What is the penalty for setting it very large?

Thanks
Oleg.

On Mon, Sep 10, 2012 at 4:24 AM, Harsh J <harsh@cloudera.com> wrote:

> Hi Oleg,
>
> If the root issue is a growing number of regions, why not control that
> instead of looking for a way to control the reducer count? You could, for
> example, raise the split-point size for HFiles so that regions split less
> often, and hence have larger but fewer regions.
>
> Given that you have 10 machines, I'd go this way rather than ending up
> with a lot of regions causing issues with load.
>
> On Mon, Sep 10, 2012 at 1:49 PM, Oleg Ruchovets <oruchovets@gmail.com>
> wrote:
> > Hi,
> >   I am using bulk loading to write my data to HBase.
> >
> > It works fine, but the number of regions is growing very rapidly.
> > After loading ONE WEEK of data I got 200 regions (I am going to store
> > years of data).
> > As a result, the job that writes data to HBase has a number of REDUCERS
> > equal to the number of REGIONS.
> > So after loading only one WEEK of data I have 200 reducers.
> >
> > Question:
> >    How can I resolve the problem of a constantly growing number of
> > reducers when using bulk loading and TotalOrderPartitioner?
> >  I have a 10-machine cluster and I think I should have ~30 reducers.
> >
> > Thanks in advance.
> > Oleg.
>
>
>
> --
> Harsh J
>
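
A rough way to reason about the suggestion above is to treat the region count as inversely proportional to the region split size (the `hbase.hregion.max.filesize` setting). The sketch below uses only the figures from this thread (200 regions per week, a target of about 30). The function names are hypothetical, and the model assumes a steady weekly data volume and evenly sized regions, neither of which the thread confirms.

```python
def split_size_factor(current_regions: int, target_regions: int) -> float:
    """Factor by which the split size would have to grow so that the same
    data lands in target_regions instead of current_regions.
    Assumption: region count is inversely proportional to split size."""
    if current_regions <= 0 or target_regions <= 0:
        raise ValueError("region counts must be positive")
    return current_regions / target_regions


def projected_regions(weeks: int, regions_per_week: int,
                      current_regions: int, target_regions: int) -> int:
    """Projected region count after `weeks` of loading, once the split size
    has been raised by split_size_factor(current_regions, target_regions).
    Assumption: every week adds the same amount of data."""
    if weeks < 0 or regions_per_week < 0:
        raise ValueError("weeks and regions_per_week must be non-negative")
    if current_regions <= 0 or target_regions <= 0:
        raise ValueError("region counts must be positive")
    numerator = weeks * regions_per_week * target_regions
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-numerator // current_regions)


if __name__ == "__main__":
    factor = split_size_factor(200, 30)
    print(f"split size would need to grow by about {factor:.2f}x")
    print("regions after 52 weeks:", projected_regions(52, 200, 200, 30))
```

Under these assumptions, a larger split size slows region growth by a constant factor but does not stop it, so the reducer count of a bulk-load job would still rise as more weeks of data are loaded.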
