Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Thu, 30 Apr 2015 00:05:05 +0000 (UTC)
From: "Andrew Mains (JIRA)" <jira@apache.org>
To: dev@hive.apache.org
Message-ID: <JIRA.12826163.1430352288000.29181.1430352305989@Atlassian.JIRA>
In-Reply-To: <JIRA.12826163.1430352288000@Atlassian.JIRA>
References: <JIRA.12826163.1430352288000@Atlassian.JIRA>
 <JIRA.12826163.1430352288021@arcas>
Subject: [jira] [Created] (HIVE-10545) Implement predicate pushdown for
 queries over HBase snapshots
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Andrew Mains created HIVE-10545:
-----------------------------------

             Summary: Implement predicate pushdown for queries over HBase snapshots
                 Key: HIVE-10545
                 URL: https://issues.apache.org/jira/browse/HIVE-10545
             Project: Hive
          Issue Type: Improvement
          Components: HBase Handler
            Reporter: Andrew Mains


Hive's HBase integration currently supports queries over HBase snapshots,
and predicate pushdown for queries over HBase tables, but not predicate
pushdown for queries over HBase snapshots. This appears to be largely
because the HBase handler uses the `mapred` TableSnapshotInputFormat
implementation, which doesn't support pushing a scan to the job, rather
than the `mapreduce` implementation, which does. The `mapred` setup method
accepts only a column-list string, while the `mapreduce` one accepts an
org.apache.hadoop.hbase.client.Scan. Compare:

https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapred/TableMapReduceUtil.html#initTableSnapshotMapJob(java.lang.String,%20java.lang.String,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class,%20org.apache.hadoop.mapred.JobConf,%20boolean,%20org.apache.hadoop.fs.Path)

vs

https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html#initTableSnapshotMapperJob(java.lang.String,%20org.apache.hadoop.hbase.client.Scan,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class,%20org.apache.hadoop.mapreduce.Job,%20boolean,%20org.apache.hadoop.fs.Path)

Hive should be able to switch to the `mapreduce` implementation (performing
the necessary shimming between the `mapred` and `mapreduce` APIs), and thus
gain the ability to push predicates down to the input format in the same
way as is done with HiveHBaseTableInputFormat for queries over live tables.
This switch should result in significant performance improvements for
queries that specify range or equality conditions on the row key (which
seems likely to be a reasonably common case), since the input format would
then only need to scan the matching row-key range instead of the whole
snapshot.
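To make the expected gain concrete, here is a minimal, hypothetical sketch in plain Java (no HBase or Hive dependency) of how equality and range conditions on the row key could be collapsed into one half-open [startRow, stopRow) range, which is the shape a pushed-down Scan would carry. The class, record, and method names are illustrative assumptions, not Hive or HBase APIs. It assumes HBase's lexicographic unsigned-byte row-key ordering. In a real implementation, the resulting bounds would presumably be set on the Scan passed to the `mapreduce` setup method (e.g. via Scan#setStartRow/setStopRow in HBase 1.x, or withStartRow/withStopRow in later versions).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch (not a Hive or HBase API): intersect row-key
 * predicates into a single half-open [start, stop) scan range. Keys compare
 * as unsigned bytes, matching HBase's lexicographic row-key ordering.
 */
public class RowKeyRangeSketch {

    enum Op { EQ, GT, GE, LT, LE }

    /** One "rowkey <op> value" conjunct from a WHERE clause. */
    record Predicate(Op op, byte[] value) {}

    /** Half-open row-key range; a null bound means "unbounded". */
    record KeyRange(byte[] start, byte[] stop) {
        boolean contains(byte[] key) {
            boolean atOrAfterStart = start == null || Arrays.compareUnsigned(key, start) >= 0;
            boolean beforeStop = stop == null || Arrays.compareUnsigned(key, stop) < 0;
            return atOrAfterStart && beforeStop;
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Smallest key strictly greater than {@code key}: key followed by 0x00. */
    static byte[] successor(byte[] key) {
        return Arrays.copyOf(key, key.length + 1); // the padding byte is 0x00
    }

    /** Intersects all conjuncts into one scan range (may end up empty). */
    static KeyRange toRange(List<Predicate> preds) {
        byte[] start = null;
        byte[] stop = null;
        for (Predicate p : preds) {
            byte[] lo = null; // inclusive lower bound implied by p
            byte[] hi = null; // exclusive upper bound implied by p
            switch (p.op()) {
                case EQ -> { lo = p.value(); hi = successor(p.value()); }
                case GT -> lo = successor(p.value());
                case GE -> lo = p.value();
                case LT -> hi = p.value();
                case LE -> hi = successor(p.value());
            }
            // Tighten the range: keep the largest start and the smallest stop.
            if (lo != null && (start == null || Arrays.compareUnsigned(lo, start) > 0)) {
                start = lo;
            }
            if (hi != null && (stop == null || Arrays.compareUnsigned(hi, stop) < 0)) {
                stop = hi;
            }
        }
        return new KeyRange(start, stop);
    }

    public static void main(String[] args) {
        // Toy "snapshot": 1000 fixed-width row keys row0000 .. row0999.
        int total = 1000;
        // WHERE key >= 'row0100' AND key < 'row0110'
        KeyRange range = toRange(List.of(
                new Predicate(Op.GE, bytes("row0100")),
                new Predicate(Op.LT, bytes("row0110"))));
        int inRange = 0;
        for (int i = 0; i < total; i++) {
            if (range.contains(bytes(String.format(Locale.ROOT, "row%04d", i)))) {
                inRange++;
            }
        }
        // prints "rows in pushed-down range: 10 of 1000"
        System.out.println("rows in pushed-down range: " + inRange + " of " + total);
    }
}
```

In this toy model only 10 of the 1000 rows fall inside the pushed-down range. Per the description above, the current `mapred` path has no Scan to carry such bounds, so every row of the snapshot is read and the condition is applied afterwards in Hive.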


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)