Date: Thu, 30 Apr 2015 00:05:05 +0000 (UTC)
From: "Andrew Mains (JIRA)"
To: dev@hive.apache.org
Subject: [jira] [Created] (HIVE-10545) Implement predicate pushdown for queries over HBase snapshots

Andrew Mains created HIVE-10545:
-----------------------------------

             Summary: Implement predicate pushdown for queries over HBase snapshots
                 Key: HIVE-10545
                 URL: https://issues.apache.org/jira/browse/HIVE-10545
             Project: Hive
          Issue Type: Improvement
          Components: HBase Handler
            Reporter: Andrew Mains

Hive's HBase integration supports queries over HBase snapshots, and it supports predicate pushdown for queries over HBase tables, but it does not yet support predicate pushdown for queries over HBase snapshots.
This appears to be largely because the HBase handler uses the `mapred` implementation of TableSnapshotInputFormat, which does not support passing a scan to the job, rather than the `mapreduce` implementation, which does (see https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapred/TableMapReduceUtil.html#initTableSnapshotMapJob(java.lang.String,%20java.lang.String,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class,%20org.apache.hadoop.mapred.JobConf,%20boolean,%20org.apache.hadoop.fs.Path vs https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html#initTableSnapshotMapperJob(java.lang.String,%20org.apache.hadoop.hbase.client.Scan,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class,%20org.apache.hadoop.mapreduce.Job,%20boolean,%20org.apache.hadoop.fs.Path)).

Hive should be able to switch to the `mapreduce` implementation (performing the necessary shimming between the mapred and mapreduce APIs) and thereby gain the ability to push predicates down to the input format, in the same way that HiveHBaseTableInputFormat already does. Because a pushed-down scan can be bounded by start and stop row keys instead of reading the whole snapshot, this switch should significantly improve performance for queries that specify range or equality conditions on the row key, which is likely a reasonably common case.
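To illustrate why row-key conditions benefit, here is a minimal sketch in plain Java (no HBase dependency) of how ANDed range and equality predicates on a row key might be folded into a single [start, stop) scan range. All names here (RowKeyRangeSketch, Predicate, KeyRange, toKeyRange) are hypothetical; this is not Hive's HBase handler code. It assumes ASCII string row keys, so that String ordering matches HBase's unsigned byte ordering. In HBase 1.x the resulting bounds would presumably be set on a Scan (e.g. via setStartRow/setStopRow) before that Scan is handed to the `mapreduce` initTableSnapshotMapperJob.

```java
import java.util.List;

// Hypothetical sketch: fold ANDed row-key predicates into one scan range.
// Assumes ASCII row keys, so String.compareTo matches HBase's unsigned byte order.
public class RowKeyRangeSketch {

    enum Op { EQ, GT, GE, LT, LE }

    // One conjunct of the WHERE clause, e.g. key >= 'row-100'.
    record Predicate(Op op, String value) {}

    // start is inclusive, stop is exclusive; null means unbounded on that side.
    record KeyRange(String start, String stop) {
        // True when the conjuncts cannot all hold (e.g. key > 'b' AND key < 'a').
        boolean isEmpty() {
            return start != null && stop != null && start.compareTo(stop) >= 0;
        }
    }

    // Smallest key strictly greater than `key` in lexicographic order.
    static String successor(String key) {
        return key + '\0';
    }

    // Tighter of two inclusive lower bounds (the larger one).
    static String maxStart(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.compareTo(b) >= 0 ? a : b;
    }

    // Tighter of two exclusive upper bounds (the smaller one).
    static String minStop(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    // Intersect the ranges implied by each conjunct.
    static KeyRange toKeyRange(List<Predicate> conjuncts) {
        String start = null;
        String stop = null;
        for (Predicate p : conjuncts) {
            String v = p.value();
            switch (p.op()) {
                case EQ -> {
                    // key = v becomes [v, successor(v))
                    start = maxStart(start, v);
                    stop = minStop(stop, successor(v));
                }
                case GE -> start = maxStart(start, v);
                case GT -> start = maxStart(start, successor(v));
                case LT -> stop = minStop(stop, v);
                case LE -> stop = minStop(stop, successor(v));
            }
        }
        return new KeyRange(start, stop);
    }

    public static void main(String[] args) {
        // WHERE key >= 'row-100' AND key < 'row-200'
        KeyRange range = toKeyRange(List.of(
                new Predicate(Op.GE, "row-100"),
                new Predicate(Op.LT, "row-200")));
        System.out.println(range.start() + " .. " + range.stop()); // prints "row-100 .. row-200"
    }
}
```

Using null for an unbounded side mirrors HBase's convention that an empty start or stop row means "from the first row" or "to the last row".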