hbase-user mailing list archives

From java8964 <java8...@hotmail.com>
Subject Why my MR job running on HBase all are Rack-local map tasks
Date Mon, 24 Feb 2014 18:59:45 GMT
I have a 10-node cluster; 8 of the nodes each run a DataNode, a TaskTracker, and an HBase RegionServer.
I have an HBase table with one column family and 1.5 TB of data, spread across 55 regions on these
8 region servers. When I run a test scan MR job, it generates 55 mapper tasks (matching
the 55 regions), but all of them are rack-local map tasks (not a single data-local map task).
The cluster has been running for weeks, and I ran a major compaction before the MR job. I have run the
MR job several times, and every time I get 55 rack-local map tasks and not a single data-local
map task. I think something is wrong with my cluster/HBase settings, but I am not sure what.
All 8 of these boxes run a DataNode, a TaskTracker, and an HBase RegionServer. All 10 boxes are
in one rack.
Here is one difference I observed.
For the MR job that reads the HBase table, here is one example task attempt:

  Task attempt:           attempt_201402131137_0469_m_000000_0
  Machine:                /default-rack/10.xx.xx.xx
  Status:                 SUCCEEDED (100.00%)
  Start time:             24-Feb-2014 09:58:23
  Finish time:            24-Feb-2014 10:31:41 (33mins, 18sec)
  Counters:               13
  Input split locations:  /default-rack/real_hostname

As you can see, the input split location shows the real HOSTNAME of the box, while the task
attempt's Machine column shows the real IP of the machine running the task, which is
NOT the same string as the InputSplit location.
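To make the suspected mismatch concrete, here is a minimal sketch, assuming the scheduler decides data-locality by comparing host strings literally. This is a simplification for illustration, not Hadoop's actual JobTracker code, and the class, host names, and IPs are hypothetical. With no rack mapping configured, every node falls back to /default-rack (as in the task output above), so a hostname-vs-IP mismatch for the same machine can at best produce a rack-local task.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of how a locality-aware scheduler labels a map task.
 * Not Hadoop's actual code; it only illustrates literal host-string matching.
 */
public class LocalityModel {

    enum Locality { DATA_LOCAL, RACK_LOCAL, OFF_RACK }

    /**
     * Labels a task by comparing the tracker's reported host string with the split's
     * location strings. Hosts missing from rackOf are treated as being in /default-rack.
     */
    static Locality classify(String trackerHost, String trackerRack,
                             List<String> splitHosts, Map<String, String> rackOf) {
        // Data-local only if a location is exactly the same string as the tracker's name.
        for (String host : splitHosts) {
            if (host.equals(trackerHost)) {
                return Locality.DATA_LOCAL;
            }
        }
        // Otherwise rack-local if any location sits in the tracker's rack.
        for (String host : splitHosts) {
            if (rackOf.getOrDefault(host, "/default-rack").equals(trackerRack)) {
                return Locality.RACK_LOCAL;
            }
        }
        return Locality.OFF_RACK;
    }

    public static void main(String[] args) {
        Map<String, String> noRackMapping = Map.of();  // every host falls back to /default-rack
        // Same machine, but the split names it by hostname while the tracker reports its IP.
        System.out.println(classify("10.0.0.133", "/default-rack",
                List.of("node133.example.com"), noRackMapping));
        // Same machine, named identically on both sides.
        System.out.println(classify("10.0.0.133", "/default-rack",
                List.of("10.0.0.133", "10.0.0.135"), noRackMapping));
    }
}
```

Running `main` prints `RACK_LOCAL` and then `DATA_LOCAL`: the only difference between the two calls is whether the split location is written the same way the tracker names itself.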
On the other hand, if I run an MR job over HDFS files in this cluster, 30 of the 32 mappers
are data-local tasks. Here is the output:

  Task attempt:           attempt_201402131137_0467_m_000000_0
  Machine:                /default-rack/10.xx.xx.133
  Status:                 SUCCEEDED (100.00%)
  Start time:             24-Feb-2014 09:49:58
  Finish time:            24-Feb-2014 09:50:29 (30sec)
  Counters:               20
  Input split locations:  /default-rack/10.xx.xx.133
                          /default-rack/10.xx.xx.135
                          /default-rack/10.xx.xx.140

The difference I see is that in the MR job over HDFS files, the InputSplit locations are shown
as real IP addresses instead of hostnames as in the HBase job. Could this be the reason I get 0
data-local map tasks in the HBase MR job? If not, what could it be?
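One way to test the hostname-vs-IP hypothesis on the cluster itself is to check whether a node's hostname and IP resolve to each other consistently. The sketch below is an illustration, not something from the original report: it uses only the standard java.net.InetAddress API, and the class and method names (ResolveCheck, toIp, sameMachine) are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

/** Hypothetical diagnostic: do two location strings (hostname or IP literal) name the same address? */
public class ResolveCheck {

    /** Resolves a hostname or IP literal to its textual IP, or empty if it cannot be resolved. */
    static Optional<String> toIp(String location) {
        try {
            // For an IP literal, getByName only checks the format; no name lookup is made.
            return Optional.of(InetAddress.getByName(location).getHostAddress());
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    /** True if both strings resolve, and resolve to the same IP address. */
    static boolean sameMachine(String a, String b) {
        Optional<String> ipA = toIp(a);
        return ipA.isPresent() && ipA.equals(toIp(b));
    }

    public static void main(String[] args) throws UnknownHostException {
        // Usage: java ResolveCheck <split-location> <task-attempt-machine>
        String split = args.length > 0 ? args[0] : "127.0.0.1";
        String tracker = args.length > 1 ? args[1] : "127.0.0.1";
        InetAddress addr = InetAddress.getByName(split);
        System.out.println("forward: " + split + " -> " + addr.getHostAddress());
        // Reverse lookup; falls back to the IP text if no name can be found.
        System.out.println("reverse: " + addr.getHostAddress() + " -> " + addr.getCanonicalHostName());
        System.out.println("same machine: " + sameMachine(split, tracker)
                + ", same string: " + split.equals(tracker));
    }
}
```

Run it on a worker node with the real split location and the real Machine value from the task page (without the /default-rack/ prefix). If it reports "same machine: true, same string: false", the two sides name the same box differently, which is consistent with the situation described above.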
