hbase-user mailing list archives

From Praveen Sripati <praveensrip...@gmail.com>
Subject Re: HBase and Data Locality
Date Mon, 20 Feb 2012 13:03:01 GMT
Looking at DefaultLoadBalancer.balance(), the balancing is based purely on
the number of regions hosted per region server, not on resource usage.
HBASE-57 suggests taking data locality into consideration when regions are
assigned to region servers. It would be nice to consider both the resource
usage of each region and its data locality, rather than only the number of
regions per region server as currently implemented.
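
To illustrate the difference, here is a toy sketch in Python. It is not HBase
code: the Region fields, the weights and the request/locality figures are all
made up for the example, and the combined cost is just one possible way to
weigh load and locality alongside region count.

```python
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    requests_per_sec: float  # hypothetical load metric
    locality: float          # assumed fraction of HFile blocks local to the hosting server (0.0 to 1.0)

def count_cost(servers):
    """Imbalance by region count only: max minus min regions per server."""
    counts = [len(regions) for regions in servers.values()]
    return max(counts) - min(counts)

def combined_cost(servers, w_load=1.0, w_locality=1.0):
    """Hypothetical cost: relative spread of request load plus average non-local fraction."""
    loads = [sum(r.requests_per_sec for r in regions) for regions in servers.values()]
    total = sum(loads)
    load_spread = (max(loads) - min(loads)) / total if total else 0.0
    all_regions = [r for regions in servers.values() for r in regions]
    non_local = (
        sum(1.0 - r.locality for r in all_regions) / len(all_regions)
        if all_regions else 0.0
    )
    return w_load * load_spread + w_locality * non_local

# Two servers with the same number of regions but very different load.
servers = {
    "rs1": [Region("a", 900.0, 1.0), Region("b", 50.0, 0.5)],
    "rs2": [Region("c", 25.0, 1.0), Region("d", 25.0, 0.5)],
}

print(count_cost(servers))     # count-based view: the cluster looks balanced
print(combined_cost(servers))  # load/locality view: clearly not balanced
```

With these made-up numbers the count-based cost is 0, so a balancer that only
counts regions would do nothing, while the combined cost is well above zero.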

The file-to-block mapping can be obtained from the HDFS NameNode, but how
can I find out which regions are heavily loaded (in terms of number of
requests, CPU and memory) and which are not? I could not see any resource
utilization on the region server pages.
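
For the locality half of the question, a rough sketch of how a per-region
locality figure could be derived once the block locations are known. This is
plain Python with hypothetical host names; it assumes the list of hosts per
block has already been fetched from the NameNode, and it does not address the
load (requests, CPU, memory) part.

```python
def locality_index(block_hosts, server):
    """Fraction of a region's HFile blocks that have a replica on `server`.

    block_hosts: one list of host names per HDFS block (assumed input,
    e.g. collected beforehand from the NameNode's block locations).
    """
    if not block_hosts:
        return 0.0
    local = sum(1 for hosts in block_hosts if server in hosts)
    return local / len(block_hosts)

# Hypothetical region with four blocks, three replicas each.
blocks = [
    ["rs1", "rs2", "rs3"],
    ["rs2", "rs3", "rs4"],
    ["rs1", "rs4", "rs5"],
    ["rs1", "rs2", "rs5"],
]

print(locality_index(blocks, "rs1"))  # 3 of 4 blocks are local
print(locality_index(blocks, "rs4"))  # 2 of 4 blocks are local
```

A locality-aware assignment would prefer the server with the highest index
for each region.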

Also, I am curious whether HBASE-57 makes sense, since a major compaction
runs every 24 hours and the HFiles are all local to the region's server
after a major compaction. I think the HDFS balancer has to be run manually,
so there is at most a 24-hour window between an HDFS balancer run and the
next major compaction during which data locality might be lost.

I am interested in working on this JIRA, but need some help from the HBase
community.

Regards,
Praveen

On Tue, Feb 14, 2012 at 7:34 PM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:

> Region allocation is kept across the next restart (
> https://issues.apache.org/jira/browse/HBASE-2896 ). This is also present
> in the CDH3 code.
> Nevertheless, if a server did not start correctly, its regions will move
> away from it and locality will not be preserved (even after you start the
> problematic node, since it will get random regions).
> The best solution would effectively be
> https://issues.apache.org/jira/browse/HBASE-57
>
>
> Mikael.S
>
> On Tue, Feb 14, 2012 at 3:19 PM, Brock Noland <brock@cloudera.com> wrote:
>
> > Hi,
> >
> > On Tue, Feb 14, 2012 at 7:13 AM, Praveen Sripati
> > <praveensripati@gmail.com> wrote:
> > > Lars' blog (1) mentions that data locality for the region servers is
> > > lost when the HBase cluster is restarted. It also mentions at the end
> > > that work is under way in HBase to assign regions to RS taking data
> > > locality into consideration. The blog entry is 18 months old, so I
> > > would like to know if this has been incorporated into the latest HBase
> > > release, or whether data locality is lost until a compaction completes.
> >
> > JIRA is down for me, but here is the JIRA:
> >
> > https://issues.apache.org/jira/browse/HBASE-2896
> >
> > I am pretty sure it's been included in the latest HBase release as it's
> > in CDH3.
> >
> > Brock
> >
> > --
> > Apache MRUnit - Unit testing MapReduce -
> > http://incubator.apache.org/mrunit/
> >
>
>
>
> --
> Mikael.S
>
