Date: Fri, 18 Mar 2016 04:19:33 +0000 (UTC)
From: "Liyin Tang (JIRA)"
To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-15482) Provide an option to skip calculating block locations for SnapshotInputFormat

Liyin Tang created HBASE-15482:
----------------------------------

             Summary: Provide an option to skip calculating block locations for SnapshotInputFormat
                 Key: HBASE-15482
                 URL: https://issues.apache.org/jira/browse/HBASE-15482
             Project: HBase
          Issue Type: Improvement
          Components: mapreduce
            Reporter: Liyin Tang
            Priority: Minor

When an MR job reads from SnapshotInputFormat, it needs to calculate the splits based on the block locations in order to get the best locality. However, this process may take a long time for large snapshots.
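
To make the cost concrete, here is a rough sketch of locality-aware split computation with a switch that bypasses it. This is not HBase code: `compute_splits`, `Split`, the `locality_enabled` parameter, and the `block_hosts` callback (standing in for the per-file block location lookup) are all hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Split:
    """One input split per region; an empty host list means no preferred hosts."""
    region: str
    hosts: List[str] = field(default_factory=list)


def compute_splits(
    region_files: Dict[str, List[str]],
    block_hosts: Callable[[str], Dict[str, int]],
    locality_enabled: bool = True,
    top_n: int = 3,
) -> List[Split]:
    """Build one split per region.

    block_hosts(path) is a hypothetical stand-in for the block location
    lookup; it returns a mapping of host -> bytes of that file stored there.
    With locality enabled, it is called once per file, so the total cost
    grows with the number of files in the snapshot.
    """
    splits = []
    for region, files in region_files.items():
        if not locality_enabled:
            # Skip the lookups entirely; the scheduler gets no host hints.
            splits.append(Split(region))
            continue
        weights = Counter()
        for path in files:
            weights.update(block_hosts(path))  # adds byte counts per host
        best = [host for host, _ in weights.most_common(top_n)]
        splits.append(Split(region, best))
    return splits


if __name__ == "__main__":
    lookups = []

    def fake_block_hosts(path: str) -> Dict[str, int]:
        lookups.append(path)  # record each simulated lookup
        return {"host-a": 100, "host-b": 40}

    layout = {"region-1": ["f1", "f2"], "region-2": ["f3"]}
    compute_splits(layout, fake_block_hosts)
    print("lookups with locality:", len(lookups))
    lookups.clear()
    compute_splits(layout, fake_block_hosts, locality_enabled=False)
    print("lookups without locality:", len(lookups))
```

In this sketch the switch is a plain function argument; in a real job it would more likely be read from the job configuration, under a key that this message does not specify.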
In some setups, the computing layer (Spark, Hive, or Presto) could run outside of the HBase cluster. In these scenarios, block locality doesn't matter. Therefore, it would be great to have an option to skip calculating the block locations for every job. That would be super useful for the Hive/Presto/Spark connectors.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)