hive-user mailing list archives

From Yang
Subject Re: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?
Date Thu, 24 Jul 2014 21:10:07 GMT
I kind of found this:

From a performance perspective, there are things Hive can do today (i.e.,
not dependent on data types) to take advantage of HBase. There’s also
the possibility of an HBase-aware Hive making use of HBase tables as an
intermediate storage location (HIVE-3565), facilitating map-side
joins against dimension tables loaded into HBase. Hive could make use of
HBase’s natural indexed structure (HIVE-3634, HIVE-3727), potentially
saving huge scans. Currently, the user doesn’t have (any?) control over
the scans which are executed. Configuration on a per-job, or at least
per-table, basis should be enabled (HIVE-1233). That would enable
an HBase-savvy user to provide Hive with hints regarding how it should
interact with HBase. Support for simple split sampling of HBase tables
(HIVE-3399) could also be easily done because HBase manages table
partitions already.
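
To make the "saving huge scans" point concrete, here is a minimal sketch of the two join strategies being contrasted. It is not how Hive or the HBase storage handler is implemented. A plain Python dict stands in for an HBase table keyed by row key, `dict.get` stands in for an HBase point lookup, and the function names `shuffle_join` and `lookup_join` as well as the sample data are made up for illustration.

```python
from collections import defaultdict


def shuffle_join(small_rows, big_table):
    """Reduce-side style join: read every row of both inputs, bucket by key."""
    buckets = defaultdict(lambda: ([], []))
    rows_read = 0
    for key, value in small_rows:
        buckets[key][0].append(value)
        rows_read += 1
    for key, value in big_table.items():  # full scan of the large table
        buckets[key][1].append(value)
        rows_read += 1
    # The "reducer" step: pair up left and right values that share a key.
    joined = [(k, l, r) for k, (ls, rs) in buckets.items() for l in ls for r in rs]
    return sorted(joined), rows_read


def lookup_join(small_rows, big_table):
    """Index-style join: scan only the small side, look up the large side by key."""
    joined = []
    rows_read = 0
    for key, value in small_rows:
        rows_read += 1
        match = big_table.get(key)  # stands in for a point lookup on the row key
        if match is not None:
            rows_read += 1
            joined.append((key, value, match))
    return sorted(joined), rows_read


# Hypothetical data: a tiny text-file table and a larger keyed table.
small = [("u2", "click"), ("u7", "view"), ("u9999", "click")]
big = {f"u{i}": f"profile-{i}" for i in range(1000)}

result_shuffle, read_shuffle = shuffle_join(small, big)
result_lookup, read_lookup = lookup_join(small, big)
print(result_shuffle == result_lookup, read_shuffle, read_lookup)
```

Both functions return the same joined rows, but the lookup variant only touches the rows of the large table that actually match. Whether Hive can do this for a given query depends on the work tracked in the JIRAs above.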

On Thu, Jul 24, 2014 at 2:03 PM, Yang wrote:

> If I join a table backed by a text file with a table backed by HBase,
> and the latter is very large, is Hive smart enough to use the HBase
> table's index to do the join, instead of running it as a regular
> MapReduce job, where each table is scanned fully and bucketed on the join
> keys, and the matching rows are then found in the reducer?
> Thanks,
> Yang
