hbase-user mailing list archives

From Adrien Mogenet <adrien.moge...@gmail.com>
Subject Re: Poor data locality of MR job
Date Thu, 02 Aug 2012 06:39:35 GMT
Did you pre-split your table, or did you let the balancer assign regions to
regionservers for you?

Did your regionserver(s) fail?

On Thu, Aug 2, 2012 at 8:31 AM, Bryan Keller <bryanck@gmail.com> wrote:

> I have an 8 node cluster and a table that is pretty well balanced with on
> average 36 regions/node. When I run a mapreduce job on the cluster against
> this table, the data locality of the mappers is poor, e.g. 100 rack local
> mappers and only 188 data local mappers. I would expect nearly all of the
> mappers to be data local. DNS appears to be fine, i.e. the hostname in the
> splits is the same as the hostnames in the task attempts.
> The performance of the rack local mappers is poor and causes overall scan
> performance to suffer.
> The table isn't new and from what I understand, HDFS replication will
> eventually keep region data blocks local to the regionserver. Are there
> other reasons for data locality to be poor and any way to fix it?
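
For reference, the figures quoted above can be checked with a little arithmetic. This is only a sketch: the function name and its parameters are made up for illustration, and it assumes one map task per region (the usual behaviour when scanning an HBase table from MapReduce) and that the two counters quoted cover every mapper in the job.

```python
def locality_summary(nodes, regions_per_node, data_local, rack_local):
    """Summarise mapper locality from the counters quoted in the thread.

    Hypothetical helper: assumes one mapper per region and that
    data_local + rack_local accounts for every launched mapper.
    """
    expected_mappers = nodes * regions_per_node
    launched_mappers = data_local + rack_local
    data_local_pct = round(100.0 * data_local / launched_mappers, 1)
    return {
        "expected_mappers": expected_mappers,
        "launched_mappers": launched_mappers,
        "data_local_pct": data_local_pct,
    }


# Numbers from the quoted message: 8 nodes, about 36 regions per node,
# 188 data local mappers and 100 rack local mappers.
summary = locality_summary(8, 36, 188, 100)
print(summary)
```

Under those assumptions the mapper count matches the region count (8 x 36 = 288 = 188 + 100), and roughly 65% of the mappers ran data local.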

Adrien Mogenet
