hive-dev mailing list archives

From "Andrew Mains (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-10545) Implement predicate pushdown for queries over HBase snapshots
Date Thu, 30 Apr 2015 00:05:05 GMT
Andrew Mains created HIVE-10545:
-----------------------------------

             Summary: Implement predicate pushdown for queries over HBase snapshots
                 Key: HIVE-10545
                 URL: https://issues.apache.org/jira/browse/HIVE-10545
             Project: Hive
          Issue Type: Improvement
          Components: HBase Handler
            Reporter: Andrew Mains


Hive's HBase integration currently supports queries over HBase snapshots, and it supports
predicate pushdown for queries over HBase tables, but it does not support predicate pushdown
for queries over HBase snapshots. This appears to be largely because the HBase handler uses
the `mapred` TableSnapshotInputFormat implementation, which doesn't support pushing a scan
to the job, rather than the `mapreduce` implementation, which does (see https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapred/TableMapReduceUtil.html#initTableSnapshotMapJob(java.lang.String,%20java.lang.String,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class,%20org.apache.hadoop.mapred.JobConf,%20boolean,%20org.apache.hadoop.fs.Path
vs https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html#initTableSnapshotMapperJob(java.lang.String,%20org.apache.hadoop.hbase.client.Scan,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class,%20org.apache.hadoop.mapreduce.Job,%20boolean,%20org.apache.hadoop.fs.Path)).

Hive should be able to switch to the `mapreduce` implementation (performing the necessary
shimming between `mapred` and `mapreduce`), and thus gain the ability to push predicates down
to the input format in the same way as is done with HiveTableInputFormat. This switch should
result in significant performance improvements for queries that specify range or equality
conditions on the row key, which is likely a reasonably common case.
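
To illustrate why pushing row key conditions down should help, here is a minimal, self-contained Java sketch. It is not Hive or HBase code: `RowKeyPushdownSketch`, `KeyPredicate`, `KeyRange`, `toRange` and `scan` are hypothetical names invented for this example, and a `TreeMap` stands in for the sorted rows of a snapshot. It also compares keys as Java strings, whereas HBase compares raw bytes. The idea is that a conjunction of range/equality conditions on the row key collapses into a single [start, stop) interval, so only the rows in that interval are read instead of every row being read and then filtered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only; none of these types exist in Hive or HBase.
final class RowKeyPushdownSketch {

    // Comparison operators on the row key covered by this sketch.
    enum Op { EQ, GE, LT }

    // One condition on the row key, e.g. key >= "row10".
    static final class KeyPredicate {
        final Op op;
        final String value;
        KeyPredicate(Op op, String value) { this.op = op; this.value = value; }
    }

    // Half-open key interval [start, stop); null means unbounded on that side.
    static final class KeyRange {
        String start;
        String stop;

        void tightenStart(String v) { if (start == null || v.compareTo(start) > 0) start = v; }
        void tightenStop(String v) { if (stop == null || v.compareTo(stop) < 0) stop = v; }
        boolean isEmpty() { return start != null && stop != null && start.compareTo(stop) >= 0; }
    }

    // Collapse a conjunction (AND) of row key predicates into one interval.
    static KeyRange toRange(List<KeyPredicate> conjuncts) {
        KeyRange range = new KeyRange();
        for (KeyPredicate p : conjuncts) {
            switch (p.op) {
                case EQ:
                    range.tightenStart(p.value);
                    // value + "\0" is the smallest string strictly greater than value.
                    range.tightenStop(p.value + "\0");
                    break;
                case GE:
                    range.tightenStart(p.value);
                    break;
                case LT:
                    range.tightenStop(p.value);
                    break;
            }
        }
        return range;
    }

    // Read only the keys inside the interval; the sorted map stands in for sorted rows.
    static List<String> scan(NavigableMap<String, String> table, KeyRange range) {
        if (range.isEmpty()) {
            return new ArrayList<>(); // contradictory predicates: nothing to read
        }
        NavigableMap<String, String> view = table;
        if (range.start != null) view = view.tailMap(range.start, true);
        if (range.stop != null) view = view.headMap(range.stop, false);
        return new ArrayList<>(view.keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 100; i++) {
            table.put(String.format("row%02d", i), "value" + i);
        }

        // key >= 'row10' AND key < 'row20'
        KeyRange range = toRange(Arrays.asList(
                new KeyPredicate(Op.GE, "row10"),
                new KeyPredicate(Op.LT, "row20")));
        System.out.println(scan(table, range).size() + " of " + table.size() + " rows read");
        // prints "10 of 100 rows read"

        // key = 'row42'
        KeyRange point = toRange(Arrays.asList(new KeyPredicate(Op.EQ, "row42")));
        System.out.println(scan(table, point)); // prints "[row42]"
    }
}
```

In the real handler, the equivalent of `KeyRange` would presumably be carried by the `org.apache.hadoop.hbase.client.Scan` argument that the `mapreduce` snapshot job setup accepts, which is the piece the `mapred` variant lacks.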



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
