hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: HBase and Data Locality
Date Mon, 20 Feb 2012 21:09:16 GMT
On Mon, Feb 20, 2012 at 5:03 AM, Praveen Sripati
<praveensripati@gmail.com> wrote:
> It would be nice to consider
> both the resource usage of the region and the data locality into
> consideration, not just purely based on the number of regions in the region
> server as implemented currently.
>

Yes.

> The file to block mapping can be found from the HDFS NameNode, but how to
> find out which regions are loaded (# of requests, cpu and memory
> perspective) and which are not? I could not see any resource utilization in
> the region server pages.
>

In 0.92 there are hits per region, and these get reported to the master
as part of ClusterStatus, as does memory usage.  This could be factored
into a new balance algorithm.  We could also send over cpu and hardware
profile for factoring (though much of this is available via JMX --
either we get these into ClusterStatus or the master polls JMX
after it sees a new server to get the server profile).
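
As a rough illustration of how hits and memory usage might be factored
into a balance decision, here is a minimal Python sketch.  It is not the
HBase balancer and uses no HBase API; the RegionLoad fields, the weights
and the function names are all assumptions made for the example.  Each
server's cost is a weighted sum of its share of the cluster's region
count, requests and memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionLoad:
    name: str
    requests: int     # hits counted for the region (hypothetical field name)
    memstore_mb: int  # memory used by the region (hypothetical field name)


def _share(part, whole):
    """Fraction of the cluster total held by one server; 0.0 if the total is 0."""
    return part / whole if whole else 0.0


def server_costs(assignment, weights=(1.0, 1.0, 1.0)):
    """Return {server: cost} for an assignment of {server: [RegionLoad, ...]}.

    The cost is a weighted sum of the server's share of region count,
    requests and memory.  The weights are illustrative, not tuned values.
    """
    w_count, w_req, w_mem = weights
    all_regions = [r for regions in assignment.values() for r in regions]
    total_count = len(all_regions)
    total_req = sum(r.requests for r in all_regions)
    total_mem = sum(r.memstore_mb for r in all_regions)
    costs = {}
    for server, regions in assignment.items():
        costs[server] = (
            w_count * _share(len(regions), total_count)
            + w_req * _share(sum(r.requests for r in regions), total_req)
            + w_mem * _share(sum(r.memstore_mb for r in regions), total_mem)
        )
    return costs


def most_and_least_loaded(assignment, weights=(1.0, 1.0, 1.0)):
    """Return (most loaded server, least loaded server) by cost."""
    costs = server_costs(assignment, weights)
    return max(costs, key=costs.get), min(costs, key=costs.get)
```

With weights of (1, 0, 0) this reduces to counting regions per server,
which is the current behavior; the other two weights are where hits and
memory would come in.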

> Also, curious if HBASE-57 makes sense, since the major compaction runs
> every 24 hrs

It's recommended that you run major compactions yourself at down times.

> I think that the balancer has to be run manually in HDFS and
> there will be a maximum of 24 hrs window between a HDFS balancer execution
> and a major compaction during which data locality might be lost.
>

Yes the hdfs balancer needs to be run manually and yes it knows
nothing of how hbase has ordered the blocks and will not respect
region locality when it goes about its business.

I'm not sure though that I follow the rest of what you are saying above.

On locality, the fb lads are working on a primitive that makes it so
the hbase dfsclient will tell hdfs where to place blocks.  The favored
replica locations will be kept up in .META. in a new column.  When a
regionserver crashes, or if we want to move a region, we'll move it or
reopen it on one of the locations that has had region blocks
replicated to it.  This should help improve the locality story on
failover/move.
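
A minimal Python sketch of the reopen decision described above, assuming
the favored locations for a region are available as an ordered list of
hostnames.  The function name and its arguments are hypothetical; this
is neither the primitive itself nor an HBase API.

```python
def choose_reopen_server(favored_nodes, live_servers, current_server=None):
    """Pick where to reopen a region.

    favored_nodes:  ordered hostnames holding replicas of the region's
                    blocks (assumed to be read from the new .META. column).
    live_servers:   set of hostnames currently alive.
    current_server: server being vacated or that crashed, if any.

    Returns the first favored node that is alive and is not the current
    server, or None when no favored node qualifies.
    """
    for node in favored_nodes:
        if node in live_servers and node != current_server:
            return node
    return None
```

When this returns None the caller would have to fall back to whatever
placement it uses today, and locality would again be left to chance.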

Without this functionality, we're left with the current behavior where
blocks for regions are scattered and it's only by chance you'd have
good locality opening a region in any location other than the current
deploy, where the gentle waves of compaction have been nudging data
local.
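
To make "good locality" concrete, here is a small Python sketch that
scores a candidate server by the fraction of a region's blocks that have
a replica on it.  The input shape (one set of replica hostnames per
block) is an assumption for the example; in practice the block
locations would come from the NameNode.

```python
def locality(block_hosts, server):
    """Fraction of a region's blocks with a replica on `server`.

    block_hosts: list with one set of hostnames per HDFS block of the
                 region's store files (assumed input shape).
    Returns 0.0 for a region with no blocks.
    """
    if not block_hosts:
        return 0.0
    local = sum(1 for hosts in block_hosts if server in hosts)
    return local / len(block_hosts)
```

A region whose files were last rewritten by a compaction on its current
server would score close to 1.0 there, and lower on most other servers.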

I don't believe there is an issue for the above yet.  Let me chase the
lads to file one.

This primitive that the lads are working on needs to be done, I believe,
before HBASE-57 can be done (properly).  What do you reckon, Praveen?

St.Ack
