Date: Sun, 10 Dec 2017 03:15:00 +0000 (UTC)
From: "Jerry He (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-15482) Provide an option to skip calculating block locations for SnapshotInputFormat

    [ https://issues.apache.org/jira/browse/HBASE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285062#comment-16285062 ]

Jerry He commented on HBASE-15482:
----------------------------------

The patch looks good!
I just think the first patch, 000, is cleaner. But, as Ted suggested, change hbase.TableSnapshotInputFormat.locality to hbase.TableSnapshotInputFormat.locality.enable (and change the name SNAPSHOT_INPUTFORMAT_CARE_BLOCK_LOCALITY_KEY too).
The other changes look unnecessary and only make it more complicated.

{code}
if (careBlockLocality) {
  Assert.assertTrue(split.getLocations() != null && split.getLocations().length != 0);
} else {
  Assert.assertTrue(split.getLocations() != null && split.getLocations().length == 0);
}
{code}
This is ok too. The first test is an existing test, and it has not failed previously.

> Provide an option to skip calculating block locations for SnapshotInputFormat
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-15482
>                 URL: https://issues.apache.org/jira/browse/HBASE-15482
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Liyin Tang
>            Assignee: Xiang Li
>            Priority: Minor
>             Fix For: 2.1.0
>
>         Attachments: HBASE-15482.master.000.patch, HBASE-15482.master.001.patch, HBASE-15482.master.002.patch
>
>
> When an MR job reads from SnapshotInputFormat, it needs to calculate the splits based on the block locations in order to get the best locality. However, this process may take a long time for large snapshots.
> In some setups, the computing layer (Spark, Hive, or Presto) could run outside of the HBase cluster. In these scenarios, the block locality doesn't matter. Therefore, it would be great to have an option to skip calculating the block locations for every job. That would be super useful for the Hive/Presto/Spark connectors.
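
For readers who want to see the shape of the option discussed in this thread, here is a minimal, self-contained sketch in plain Java. It is not the patch and does not use the HBase or Hadoop APIs: the class LocalityOptionSketch, the LocationLookup interface, and the computeSplits method are hypothetical stand-ins, a plain Map plays the role of the job configuration, and the default value of true is an assumption. The key name is the one proposed in the comment above and may differ from what was eventually committed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocalityOptionSketch {
    // Key name as proposed in this thread; not verified against any HBase release.
    public static final String LOCALITY_KEY =
        "hbase.TableSnapshotInputFormat.locality.enable";

    // Assumed default: keep the existing behaviour and compute locations.
    public static final boolean LOCALITY_DEFAULT = true;

    /** Hypothetical stand-in for the expensive per-region block-location lookup. */
    public interface LocationLookup {
        List<String> hostsFor(String region);
    }

    /**
     * Builds one "split" per region, represented here as region name -> host array.
     * When the option is disabled the lookup is never called and every split
     * gets an empty, non-null location array.
     */
    public static Map<String, String[]> computeSplits(
            Map<String, String> conf, List<String> regions, LocationLookup lookup) {
        boolean careBlockLocality = Boolean.parseBoolean(
            conf.getOrDefault(LOCALITY_KEY, String.valueOf(LOCALITY_DEFAULT)));

        Map<String, String[]> splits = new LinkedHashMap<>();
        for (String region : regions) {
            String[] locations = careBlockLocality
                ? lookup.hostsFor(region).toArray(new String[0])
                : new String[0];
            splits.put(region, locations);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("region-a", "region-b");
        LocationLookup lookup = region -> Arrays.asList("host1", "host2");

        Map<String, String> conf = new HashMap<>();
        conf.put(LOCALITY_KEY, "false");

        // Print how many locations each split carries with locality disabled.
        computeSplits(conf, regions, lookup).forEach(
            (region, hosts) -> System.out.println(region + ": " + hosts.length));
    }
}
```

The empty-but-non-null array in the disabled branch mirrors the assertion quoted in the comment, where split.getLocations() is expected to be non-null with length 0 when locality is not wanted.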