hive-user mailing list archives

From Lefty Leverenz
Subject Re: Is Hive Index officially not recommended?
Date Wed, 06 Jan 2016 01:43:54 GMT
I'd like to revise the Indexing and IndexDev docs in the wiki to include
this information (as well as information from a previous thread, if I can
find it) so people won't be misled into using indexes inappropriately.

But it might be more efficient for Gopal or another expert to do the
revisions.  Otherwise I would need careful reviews to make sure I don't
garble things.

-- Lefty

On Tue, Jan 5, 2016 at 3:55 PM, Gopal Vijayaraghavan wrote:

> >So in a nutshell in Hive if "external" indexes are not used for improving
> >query response, what value they add and can we forget them for now?
> The builtin indexes - those that write data as smaller tables - are only
> useful in a pre-columnar world, where the indexes offer a huge reduction
> in IO.
> Part #1 of using hive indexes effectively is to write your own
> HiveIndexHandler, with usesIndexTable=false;
> And then write an IndexPredicateAnalyzer, which lets you map arbitrary
> lookups into other range conditions.
> Not coincidentally - we're adding an "ANALYZE TABLE ... CACHE METADATA"
> which consolidates the "internal" index into an external store (HBase).
> Some of the index data now lives in the HBase metastore, so that the
> inclusion/exclusion of whole partitions can be done off the consolidated
> index.
> The experience from BI workloads run by customers is that in general, the
> lookup to the right "slice" of data is more of a problem than the actual
> aggregate.
> And that for a workhorse data warehouse, this has to survive even if
> there's a non-stop stream of updates into it.
> Cheers,
> Gopal
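
A minimal sketch of the idea described above: mapping a point lookup into a
range condition on another column, so that whole partitions can be skipped.
This is not Hive code. RangeRewriteSketch, ExternalIndex, Range and rewrite
are hypothetical names, and the customer_id / ds columns are assumed purely
for illustration. A real implementation would go through Hive's
HiveIndexHandler and IndexPredicateAnalyzer instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only: none of these types are Hive classes.
public class RangeRewriteSketch {

    /** Closed range [low, high] over a partition column, e.g. a date string. */
    static final class Range {
        final String low;
        final String high;

        Range(String low, String high) {
            this.low = low;
            this.high = high;
        }
    }

    /** Stand-in for an external index: lookup key -> range of partitions containing it. */
    static final class ExternalIndex {
        private final Map<String, Range> ranges = new HashMap<>();

        /** Widen the recorded range for a key each time it is seen in a partition. */
        void observe(String key, String partition) {
            Range r = ranges.get(key);
            if (r == null) {
                ranges.put(key, new Range(partition, partition));
            } else {
                String low = partition.compareTo(r.low) < 0 ? partition : r.low;
                String high = partition.compareTo(r.high) > 0 ? partition : r.high;
                ranges.put(key, new Range(low, high));
            }
        }

        Optional<Range> lookup(String key) {
            return Optional.ofNullable(ranges.get(key));
        }
    }

    /**
     * Rewrites "keyColumn = key" by adding a range condition on partitionColumn.
     * If the index has no entry for the key, the predicate is returned unchanged,
     * on the assumption that the index may be incomplete.
     */
    static String rewrite(ExternalIndex index, String keyColumn, String key,
                          String partitionColumn) {
        String original = keyColumn + " = '" + key + "'";
        return index.lookup(key)
                .map(r -> original + " AND " + partitionColumn
                        + " BETWEEN '" + r.low + "' AND '" + r.high + "'")
                .orElse(original);
    }

    public static void main(String[] args) {
        ExternalIndex index = new ExternalIndex();
        index.observe("c42", "2016-01-03");
        index.observe("c42", "2016-01-05");
        index.observe("c7", "2015-12-30");

        // Known key: the predicate gains a range condition on ds.
        System.out.println(rewrite(index, "customer_id", "c42", "ds"));
        // Unknown key: the predicate is left as it was.
        System.out.println(rewrite(index, "customer_id", "c99", "ds"));
    }
}
```

The added range condition is what lets a planner include or exclude whole
partitions before reading any data; the original equality predicate is kept
so the result stays correct even when the range is wider than necessary.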
