From: Praveen Sripati <praveensripati@gmail.com>
To: user@hbase.apache.org
Subject: Re: HBase and Data Locality
Date: Tue, 21 Feb 2012 15:02:19 +0530

Stack,

> Its recommended that you run major compactions yourself at down times.

Can we change the `hbase.hregion.majorcompaction` value from 86400000 to -1,
along with the required code changes, and make a note of it in
hbase-default.xml? Also, hbase.master.loadbalancer.class is not specified in
hbase-default.xml. Should I open a JIRA and make those two changes?

> In 0.92 there is hits per region and this gets reported to the master
> as part of ClusterStatus as does memory usage. This could be factored
> into a new balance algorithm.

I just looked at the code for ClusterStatus, HServerLoad and RegionLoad. For
a first cut I was thinking we could use only the resource usage (memory, cpu
and # of hits, not block location): sort the region servers in decreasing
order of resource usage, shed regions from the top until a region server with
average usage is reached, and then assign the shed regions to the region
servers at the bottom of the list. This is more or less similar to the
DefaultLoadBalancer, but taking the resource usage into consideration rather
than the # of regions.
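The shed-from-the-top, assign-to-the-bottom idea above could be sketched roughly as below. This is a hypothetical illustration, not HBase 0.92 code: the `ResourceUsageBalancer` and `ServerLoad` names are made up, and the "equal load share per region" assumption is a deliberate simplification.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only -- class, field and method names are not real
// HBase 0.92 APIs.
class ResourceUsageBalancer {

    static class ServerLoad {
        final String name;
        double usage;                // combined memory/cpu/hits load figure
        final Deque<String> regions; // regions hosted on this server

        ServerLoad(String name, double usage, List<String> regions) {
            this.name = name;
            this.usage = usage;
            this.regions = new ArrayDeque<>(regions);
        }
    }

    /** Returns a region -> destination-server move plan. */
    static Map<String, String> balance(List<ServerLoad> servers) {
        Map<String, String> moves = new LinkedHashMap<>();
        if (servers.size() < 2) return moves;

        // Sort region servers in decreasing order of resource usage.
        servers.sort((a, b) -> Double.compare(b.usage, a.usage));
        double avg = servers.stream().mapToDouble(s -> s.usage).average().orElse(0);

        int i = 0;                  // most loaded server still above average
        int j = servers.size() - 1; // least loaded server still below average
        while (i < j) {
            ServerLoad heavy = servers.get(i);
            if (heavy.usage <= avg) break;
            if (heavy.regions.isEmpty()) { i++; continue; }
            ServerLoad light = servers.get(j);
            // Crude assumption: each region carries an equal share of its
            // server's load.
            double share = heavy.usage / heavy.regions.size();
            moves.put(heavy.regions.poll(), light.name);
            heavy.usage -= share;
            light.usage += share;
            if (light.usage >= avg) j--; // this server has taken enough
            if (heavy.usage <= avg) i++; // this server has shed enough
        }
        return moves;
    }
}
```

A real implementation would of course move region state through the master rather than just compute a plan, but the two-pointer shape above captures the proposal.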
The question is: should memory, cpu and # of hits be given equal weightage,
or should the weights be configurable?

> On locality, the fb lads are working on a primitive that makes it so
> the hbase dfsclient will tell hdfs where to place blocks.

Is there any JIRA for it? Using this functionality, HBase could specify the
location of the 2nd and 3rd replicas (the 1st is local) and later use those
locations during a region server crash or a region movement. But what happens
if the HDFS balancer is run and the blocks are moved again?

> This primitive that the lads are working on needs to be done I believe
> before hbase-57 can be done (properly).

Again the question is: should we mix memory, cpu and # of hits with block
location and give them configurable weightage, or keep data locality out of
the first cut, since major compaction (although manual, per the
recommendation) will pull all the blocks together anyway?

Thanks,
Praveen

On Tue, Feb 21, 2012 at 2:39 AM, Stack wrote:
> On Mon, Feb 20, 2012 at 5:03 AM, Praveen Sripati wrote:
> > It would be nice to consider both the resource usage of the region
> > and the data locality into consideration, not just purely based on
> > the number of regions in the region server as implemented currently.
>
> Yes.
>
> > The file to block mapping can be found from the HDFS NameNode, but
> > how to find out which regions are loaded (# of requests, cpu and
> > memory perspective) and which are not? I could not see any resource
> > utilization in the region server pages.
>
> In 0.92 there is hits per region and this gets reported to the master
> as part of ClusterStatus as does memory usage. This could be factored
> into a new balance algorithm.
> Could also send over cpu and hardware profile for factoring (though
> much of this is available via JMX -- either we get these into
> clusterstatus or the master does a poll on jmx after it sees a new
> server, to get the server profile).
>
> > Also, curious if HBASE-57 makes sense, since the major compaction
> > runs every 24 hrs.
>
> Its recommended that you run major compactions yourself at down times.
>
> > I think that the balancer has to be run manually in HDFS and there
> > will be a maximum of 24 hrs window between a HDFS balancer execution
> > and a major compaction during which data locality might be lost.
>
> Yes, the hdfs balancer needs to be run manually, and yes, it knows
> nothing of how hbase has ordered the blocks and will not respect
> region locality when it goes about its business.
>
> I'm not sure though I follow the rest of what you are saying above.
>
> On locality, the fb lads are working on a primitive that makes it so
> the hbase dfsclient will tell hdfs where to place blocks. The favored
> replica locations will be kept up in .META. in a new column. When a
> regionserver crashes, or if we want to move a region, we'll move it or
> reopen it on one of the locations that has had region blocks
> replicated to it. This should help improve the locality story on
> failover/move.
>
> Without this functionality, we're left with the current behavior where
> blocks for regions are scattered, and it's only by chance you'd have
> good locality opening a region in any location other than the current
> deploy, where the gentle waves of compaction have been nudging data
> local.
>
> I don't believe there is an issue for the above yet. Let me chase the
> lads to file one.
>
> This primitive that the lads are working on needs to be done, I
> believe, before hbase-57 can be done (properly). What do you reckon,
> Praveen?
>
> St.Ack
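The configurable-weightage question raised in the thread could be sketched as a simple weighted score, with each metric normalized against its cluster-wide maximum so the weights stay comparable. This is an illustration only; the weight values and names are not HBase configuration keys.

```java
// Hypothetical weighted load score combining memory, cpu and request
// hits; all names and weights here are illustrative, not HBase APIs.
class LoadScore {
    final double memWeight, cpuWeight, hitsWeight;

    LoadScore(double memWeight, double cpuWeight, double hitsWeight) {
        this.memWeight = memWeight;
        this.cpuWeight = cpuWeight;
        this.hitsWeight = hitsWeight;
    }

    /** Each metric is scaled to [0, 1] before weighting, so no single
     *  unit (bytes vs. request counts) dominates the score. */
    double score(double memUsed, double memMax,
                 double cpuUsed, double cpuMax,
                 long hits, long hitsMax) {
        return memWeight * ratio(memUsed, memMax)
             + cpuWeight * ratio(cpuUsed, cpuMax)
             + hitsWeight * ratio(hits, hitsMax);
    }

    private static double ratio(double value, double max) {
        return max <= 0 ? 0 : value / max;
    }
}
```

With weights (1, 1, 1) the metrics are equally weighted; exposing the three weights as configuration would answer the "equal weightage or configurable" question by making equal weightage merely the default.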