hbase-issues mailing list archives

From "Xiang Li (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-15482) Provide an option to skip calculating block locations for SnapshotInputFormat
Date Tue, 19 Dec 2017 16:56:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297060#comment-16297060 ]

Xiang Li edited comment on HBASE-15482 at 12/19/17 4:55 PM:
------------------------------------------------------------

Hi [~tedyu], [~jerryhe], thanks for your comments and guidance!
Patch 003 is uploaded. It mainly makes the following changes:
* Simplify the logic in light of 15482.v3.txt. In addition, add logic to
** Check if numTopsAtMost < 1 (which is invalid)
** Check if top is 1. When it is 1, return the top host directly.
* Adjust testGetBestLocations()
* Change the conf key string from {{hbase.TableSnapshotInputFormat.locality.enable}} to
{{hbase.TableSnapshotInputFormat.locality.enabled}}, using "enabled" instead of "enable",
as most of the conf key strings use "enabled"
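
The two checks listed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in the patch: {{HostWeight}} and {{BestLocationsSketch}} are hypothetical stand-ins, the input is assumed to be sorted by descending weight already, and correcting an invalid numTopsAtMost to 1 is an assumption about how the invalid case is handled.

```java
import java.util.ArrayList;
import java.util.List;

final class BestLocationsSketch {
    /** Hypothetical stand-in for a host and the weight of the blocks it holds. */
    static final class HostWeight {
        final String host;
        final long weight;

        HostWeight(String host, long weight) {
            this.host = host;
            this.weight = weight;
        }
    }

    /**
     * Returns at most numTopsAtMost host names.
     * Assumes hostsByWeightDesc is already sorted by descending weight.
     */
    static List<String> getBestLocations(List<HostWeight> hostsByWeightDesc, int numTopsAtMost) {
        List<String> locations = new ArrayList<>();
        if (hostsByWeightDesc.isEmpty()) {
            return locations; // no hosts known, nothing to recommend
        }
        if (numTopsAtMost < 1) {
            numTopsAtMost = 1; // assumption: an invalid value is corrected to 1
        }
        int top = Math.min(numTopsAtMost, hostsByWeightDesc.size());
        locations.add(hostsByWeightDesc.get(0).host);
        if (top == 1) {
            return locations; // only the top host is wanted, return it directly
        }
        for (int i = 1; i < top; i++) {
            locations.add(hostsByWeightDesc.get(i).host);
        }
        return locations;
    }

    public static void main(String[] args) {
        List<HostWeight> hosts = new ArrayList<>();
        hosts.add(new HostWeight("h1", 10));
        hosts.add(new HostWeight("h2", 7));
        hosts.add(new HostWeight("h3", 5));
        System.out.println(getBestLocations(hosts, 0)); // prints [h1]
        System.out.println(getBestLocations(hosts, 2)); // prints [h1, h2]
    }
}
```

The early return for top == 1 skips the loop over the remaining hosts, which is the common case when only a single preferred location is requested.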


was (Author: water):
Hi [~tedyu], [~jerryhe], thanks for your comments and guidance!
Patch 003 is uploaded. It mainly makes the following changes:
* Simplify the logic in light of 15482.v3.txt. In addition, add logic to
** Check if numTopsAtMost < 1 (which is invalid)
** Check if top is 1. When it is 1, return the top host directly.
* Change the conf key string from {{hbase.TableSnapshotInputFormat.locality.enable}} to
{{hbase.TableSnapshotInputFormat.locality.enabled}}, using "enabled" instead of "enable",
as most of the conf key strings use "enabled"

> Provide an option to skip calculating block locations for SnapshotInputFormat
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-15482
>                 URL: https://issues.apache.org/jira/browse/HBASE-15482
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Liyin Tang
>            Assignee: Xiang Li
>            Priority: Minor
>             Fix For: 2.1.0
>
>         Attachments: 15482.v3.txt, HBASE-15482.master.000.patch, HBASE-15482.master.001.patch,
HBASE-15482.master.002.patch, HBASE-15482.master.003.patch
>
>
> When an MR job reads from SnapshotInputFormat, it needs to calculate the splits based
on the block locations in order to get the best locality. However, this process may take a long
time for large snapshots.
> In some setups, the computing layer (Spark, Hive, or Presto) could run outside of the HBase
cluster. In these scenarios, block locality doesn't matter. Therefore, it would be great
to have an option to skip calculating the block locations for every job. That would be super useful
for the Hive/Presto/Spark connectors.
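
To illustrate the requested option, here is a minimal sketch of how a boolean switch could bypass the expensive location lookup. It is not the HBase implementation: a plain Map stands in for the job configuration, {{LocalitySwitchSketch}} and {{locationsForSplit}} are hypothetical names, and the default of true is an assumption. The key string is the one discussed in this issue.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

final class LocalitySwitchSketch {
    // Key string as named in this issue.
    static final String LOCALITY_ENABLED_KEY =
        "hbase.TableSnapshotInputFormat.locality.enabled";
    // Assumption: locality calculation stays on unless explicitly disabled.
    static final boolean LOCALITY_ENABLED_DEFAULT = true;

    /**
     * Returns preferred hosts for a split, or an empty list when locality is
     * disabled. The costly lookup is only invoked when the switch is on.
     */
    static List<String> locationsForSplit(Map<String, String> conf,
                                          Supplier<List<String>> expensiveLookup) {
        String raw = conf.get(LOCALITY_ENABLED_KEY);
        boolean enabled = (raw == null)
            ? LOCALITY_ENABLED_DEFAULT
            : Boolean.parseBoolean(raw);
        if (!enabled) {
            return Collections.emptyList(); // skip block location calculation
        }
        return expensiveLookup.get();
    }
}
```

A job that runs outside the HBase cluster would set the key to false, so no time is spent computing locations that the scheduler cannot use anyway.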



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
