hive-user mailing list archives

From Yang <teddyyyy...@gmail.com>
Subject Re: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?
Date Thu, 24 Jul 2014 21:10:07 GMT
I kind of found an answer in this post:
http://hortonworks.com/blog/hbase-via-hive-part-1/


"
From a performance perspective, there are things Hive can do today (ie,
not dependent on data types) to take advantage of HBase. There’s also
the possibility of an HBase-aware Hive to make use of HBase tables as
intermediate storage location (HIVE-3565
<https://issues.apache.org/jira/browse/HIVE-3565>), facilitating map-side
joins against dimension tables loaded into HBase. Hive could make use of
HBase’s natural indexed structure (HIVE-3634
<https://issues.apache.org/jira/browse/HIVE-3634>, HIVE-3727
<https://issues.apache.org/jira/browse/HIVE-3727>), potentially saving huge
scans. Currently, the user doesn’t have (any?) control over the scans which
are executed. Configuration on a per-job, or at least per-table basis
should be enabled (HIVE-1233
<https://issues.apache.org/jira/browse/HIVE-1233>). That would enable
an HBase-savvy user to provide Hive with hints regarding how it should
interact with HBase. Support for simple split sampling of HBase tables (
HIVE-3399 <https://issues.apache.org/jira/browse/HIVE-3399>) could also be
easily done because HBase manages table partitions already.
"
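
To make the difference concrete, here is a minimal Python sketch of the two join strategies being compared. It is only an illustration under stated assumptions: plain Python lists and a dict stand in for the text-file table and the HBase table, the names reduce_side_join and lookup_join and the sample data are hypothetical, and none of this is Hive's or HBase's actual code or API.

```python
from collections import defaultdict


def reduce_side_join(small_rows, large_rows):
    """Scan both inputs fully, bucket rows by join key, match inside each bucket."""
    buckets = defaultdict(lambda: ([], []))
    for key, value in small_rows:
        buckets[key][0].append(value)
    for key, value in large_rows:  # full scan of the large table
        buckets[key][1].append(value)
    joined = []
    for key in sorted(buckets):  # stands in for the shuffle/sort before the reducer
        left, right = buckets[key]
        for left_value in left:
            for right_value in right:
                joined.append((key, left_value, right_value))
    return joined


def lookup_join(small_rows, large_index):
    """Scan only the small input and fetch each match by key from the indexed store."""
    joined = []
    for key, value in small_rows:
        match = large_index.get(key)  # stands in for a point lookup on the row key
        if match is not None:
            joined.append((key, value, match))
    return joined


# Hypothetical sample data: a small "text file" table and a large keyed table.
events = [("u1", "click"), ("u3", "view"), ("u9", "click")]
users_by_key = {"u1": "alice", "u2": "bob", "u3": "carol"}

full_scan_result = reduce_side_join(events, list(users_by_key.items()))
lookup_result = lookup_join(events, users_by_key)
```

Both functions produce the same rows for this data. The difference is that the first reads every row of the large table, while the second touches only the keys that appear in the small table, which is the saving the blog post hopes Hive could get from HBase's indexed structure.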


On Thu, Jul 24, 2014 at 2:03 PM, Yang <teddyyyy123@gmail.com> wrote:

> If I join a table backed by a text file with a table backed by HBase,
> and the latter is very large, is Hive smart enough to use the HBase
> table's index to do the join, instead of running a regular MapReduce
> job in which each table is scanned fully, bucketed on the join keys, and
> the matching rows are then found in the reducer?
>
>
> thanks
> Yang
>
