hbase-issues mailing list archives

From "Xiang Li (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-15482) Provide an option to skip calculating block locations for SnapshotInputFormat
Date Fri, 08 Dec 2017 04:28:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283035#comment-16283035 ]

Xiang Li edited comment on HBASE-15482 at 12/8/17 4:27 AM:
-----------------------------------------------------------

[~tedyu], thanks very much for your comments!
Patch 001 is uploaded to address your comments as well as the errors reported by checkstyle.
* "hbase.TableSnapshotInputFormat.locality" is renamed to "hbase.TableSnapshotInputFormat.locality.enable".
* The truncation of locations is moved into getBestLocations().
* The errors reported by checkstyle are corrected.
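
For illustration, here is a minimal sketch of how a job could turn the renamed option off and how split generation might honor it. java.util.Properties stands in for the Hadoop Configuration so the snippet runs on its own; the class name, the method names, and the default of true are assumptions, not taken from the patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.function.Supplier;

public class LocalityOptionSketch {
  // Key name as given in patch 001.
  public static final String LOCALITY_ENABLE_KEY =
      "hbase.TableSnapshotInputFormat.locality.enable";
  // Assumption: locality stays on unless it is explicitly disabled.
  public static final boolean LOCALITY_ENABLE_DEFAULT = true;

  /** Reads the flag; a missing key falls back to the assumed default. */
  public static boolean isLocalityEnabled(Properties conf) {
    String value = conf.getProperty(LOCALITY_ENABLE_KEY);
    return value == null ? LOCALITY_ENABLE_DEFAULT : Boolean.parseBoolean(value.trim());
  }

  /**
   * Hypothetical helper: runs the (possibly expensive) location computation
   * only when locality is enabled, otherwise returns null.
   */
  public static List<String> locationsForSplit(Properties conf,
      Supplier<List<String>> computeLocations) {
    if (!isLocalityEnabled(conf)) {
      return null; // skip the block location calculation entirely
    }
    return computeLocations.get();
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(LOCALITY_ENABLE_KEY, "false");
    List<String> locations = locationsForSplit(conf, () -> Arrays.asList("host1", "host2"));
    System.out.println("locations = " + locations);
  }
}
```

With the flag off, the supplier is never invoked, which is the point of the option for engines running outside the HBase cluster.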

Regarding {{moving the truncation of locations into getBestLocations()}}:
The code handles each combination of hostAndWeights.length and numTopsAtMost differently.
There is also a small behavior change in getBestLocations() when hostAndWeights.length is 0:
* Originally, it returned an empty list.
* After the change, it returns null. I think we do not need to allocate an empty list here,
because the locations are only used to construct TableSnapshotInputFormatImpl.InputSplit, whose
constructor already checks for null, as follows:
{code:title=hbase/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java|borderStyle=solid}
public InputSplit(TableDescriptor htd, HRegionInfo regionInfo, List<String> locations,
    Scan scan, Path restoreDir) {
  this.htd = htd;
  this.regionInfo = regionInfo;
  if (locations == null || locations.isEmpty()) { // <--- here
    this.locations = new String[0];
  } else {
    this.locations = locations.toArray(new String[locations.size()]);
  }
  try {
    this.scan = scan != null ? TableMapReduceUtil.convertScanToString(scan) : "";
  } catch (IOException e) {
    LOG.warn("Failed to convert Scan to String", e);
  }

  this.restoreDir = restoreDir.toString();
}
{code}
Also, TableSnapshotInputFormatImpl is @InterfaceAudience.Private, and getBestLocations() has no
other callers in the whole HBase project except unit tests. One unit test is updated to match the
change above.
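
The behavior change described above can be sketched roughly as follows. This is a simplified illustration, not the code in the patch: HostAndWeight here is a hypothetical stand-in class, the input is assumed to be sorted by descending weight, any weight-based filtering the real method may do is left out, and returning null for numTopsAtMost < 1 is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class BestLocationsSketch {
  /** Hypothetical stand-in for a (host, weight) pair. */
  public static final class HostAndWeight {
    final String host;
    final long weight;

    public HostAndWeight(String host, long weight) {
      this.host = host;
      this.weight = weight;
    }
  }

  /**
   * Returns at most numTopsAtMost host names, or null when there is nothing
   * to return. The array is assumed to be sorted by descending weight.
   */
  public static List<String> getBestLocations(HostAndWeight[] hostAndWeights,
      int numTopsAtMost) {
    if (hostAndWeights.length == 0 || numTopsAtMost < 1) {
      return null; // no empty list is allocated
    }
    // Truncation happens here rather than in the caller.
    int n = Math.min(hostAndWeights.length, numTopsAtMost);
    List<String> locations = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      locations.add(hostAndWeights[i].host);
    }
    return locations;
  }

  /** Mirrors the null/empty check done in the InputSplit constructor. */
  public static String[] toLocationArray(List<String> locations) {
    if (locations == null || locations.isEmpty()) {
      return new String[0];
    }
    return locations.toArray(new String[0]);
  }

  public static void main(String[] args) {
    HostAndWeight[] hosts = {
      new HostAndWeight("h1", 30), new HostAndWeight("h2", 20), new HostAndWeight("h3", 10)
    };
    System.out.println(getBestLocations(hosts, 2));
    System.out.println(toLocationArray(getBestLocations(new HostAndWeight[0], 2)).length);
  }
}
```

Because the consumer normalizes both null and an empty list to an empty array, the null return is invisible to anything downstream of the split.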



> Provide an option to skip calculating block locations for SnapshotInputFormat
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-15482
>                 URL: https://issues.apache.org/jira/browse/HBASE-15482
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Liyin Tang
>            Assignee: Xiang Li
>            Priority: Minor
>             Fix For: 2.1.0
>
>         Attachments: HBASE-15482.master.000.patch, HBASE-15482.master.001.patch
>
>
> When an MR job reads from SnapshotInputFormat, it needs to calculate the splits based
> on the block locations in order to get the best locality. However, this process may take a
> long time for large snapshots.
> In some setups, the computing layer (Spark, Hive, or Presto) could run outside of the HBase
> cluster. In these scenarios, block locality doesn't matter. Therefore, it would be great
> to have an option to skip calculating the block locations for every job. That would be super
> useful for the Hive/Presto/Spark connectors.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
