hbase-dev mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1672) Map tasks not local to RS
Date Tue, 21 Jul 2009 14:57:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733651#action_12733651 ]

Jean-Daniel Cryans commented on HBASE-1672:

We already do this inside TableInputFormatBase:

String regionLocation = table.getRegionLocation(startKeys[startPos]).
splits[i] = new TableSplit(this.table.getTableName(),
  startKeys[startPos], ((i + 1) < realNumSplits) ? startKeys[lastPos]:
  HConstants.EMPTY_START_ROW, regionLocation);
LOG.info("split: " + i + "->" + splits[i]);
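A minimal, self-contained sketch of the split computation around the quoted lines, for readers without the HBase source at hand. Assumptions: row keys are modeled as Strings rather than byte[], the Split class and the locate function are hypothetical stand-ins for TableSplit and the region lookup, and the grouping of consecutive regions into splits approximates the surrounding loop, which the message does not show. What it illustrates is that each split already carries the hostname of the region server holding its first region as its location hint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class SplitSketch {

  /** Hypothetical stand-in for TableSplit: rows [startRow, endRow) plus a location hint. */
  static final class Split {
    final String startRow;
    final String endRow;   // "" plays the role of HConstants.EMPTY_START_ROW: "to the end of the table"
    final String location; // host of the RS serving the region that starts at startRow

    Split(String startRow, String endRow, String location) {
      this.startRow = startRow;
      this.endRow = endRow;
      this.location = location;
    }

    @Override
    public String toString() {
      return "[" + startRow + ", " + (endRow.isEmpty() ? "END" : endRow) + ") @ " + location;
    }
  }

  /**
   * Groups consecutive regions (given by their sorted start keys) into at most
   * numSplits splits. Only the first region of each split determines its location.
   */
  static List<Split> computeSplits(String[] startKeys, Function<String, String> locate, int numSplits) {
    if (startKeys.length == 0 || numSplits <= 0) {
      throw new IllegalArgumentException("need at least one region and one split");
    }
    int realNumSplits = Math.min(numSplits, startKeys.length);
    int middle = startKeys.length / realNumSplits;    // regions per split, rounded down
    int remainder = startKeys.length % realNumSplits; // the first `remainder` splits take one extra region
    List<Split> splits = new ArrayList<>(realNumSplits);
    int startPos = 0;
    for (int i = 0; i < realNumSplits; i++) {
      int lastPos = startPos + middle + (i < remainder ? 1 : 0);
      // The last split runs to the end of the table.
      String endRow = (i + 1) < realNumSplits ? startKeys[lastPos] : "";
      splits.add(new Split(startKeys[startPos], endRow, locate.apply(startKeys[startPos])));
      startPos = lastPos;
    }
    return splits;
  }

  public static void main(String[] args) {
    // Hypothetical table of 5 regions, each served by a different RS.
    String[] startKeys = {"", "b", "d", "f", "h"};
    List<Split> splits = computeSplits(startKeys, k -> "rs-" + (k.isEmpty() ? "first" : k), 2);
    splits.forEach(System.out::println);
  }
}
```

When the number of requested splits is at least the number of regions, as in the reported job (280 regions, 280 map tasks), every split covers exactly one region, so its hint names the single RS serving that region.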

I don't know if we can do anything more than that. One difference between HBase and MapReduce
reading directly from HDFS is that a region is served by only one region server, whereas an
HDFS block is stored on 3 nodes by default (the default replication factor). So getting the
right map task onto the right RS at the right moment may be difficult for the JobTracker.
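To make the replication argument concrete, here is a deliberately crude model, which is an assumption and not how the JobTracker actually schedules: if the TaskTracker that frees a slot is a uniformly random node among N, it holds a given task's data with probability r/N, where r is the number of nodes holding that data (3 for a default-replicated HDFS block, effectively 1 for an HBase region). The cluster size in main is hypothetical; the issue does not state it.

```java
public class LocalityModel {

  /**
   * Crude model (assumption): probability that a uniformly random node out of
   * clusterSize holds one of the `holders` copies of a task's input.
   */
  static double localProbability(int holders, int clusterSize) {
    if (clusterSize <= 0 || holders < 0) {
      throw new IllegalArgumentException("clusterSize must be positive, holders non-negative");
    }
    return Math.min(1.0, (double) holders / clusterSize);
  }

  public static void main(String[] args) {
    int clusterSize = 10; // hypothetical cluster size, for illustration only
    System.out.printf("HDFS block, 3 replicas:      %.0f%%%n", 100 * localProbability(3, clusterSize));
    System.out.printf("HBase region, 1 RS serving:  %.0f%%%n", 100 * localProbability(1, clusterSize));
  }
}
```

The real scheduler looks for a local pending task before falling back to a remote one, so observed locality is usually better than r/N; the model only shows that a region offers a third as many candidate nodes as a default-replicated block.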

> Map tasks not local to RS
> -------------------------
>                 Key: HBASE-1672
>                 URL: https://issues.apache.org/jira/browse/HBASE-1672
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: mapred, master, regionserver
>    Affects Versions: 0.20.0, 0.19.3
>         Environment: DN, TT and RS running on the same nodes.
>            Reporter: Amandeep Khurana
>             Fix For: 0.20.0, 0.19.4
> The number of data-local map tasks while scanning a table is only about 10% of the total
> map tasks...
> My table had 280 regions and 13M records... The number of map tasks in the scan job was
> equal to the number of regions (280). Only 25 of them were data-local tasks.

This message is automatically generated by JIRA.
