hive-issues mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9768) Hive LLAP Metadata pre-load for low latency, + cluster-wide metadata refresh/invalidate command
Date Wed, 25 Feb 2015 20:00:08 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337079#comment-14337079 ]

Alan Gates commented on HIVE-9768:
----------------------------------

I'll leave the question of whether LLAP should cache metadata to the people
building LLAP, though I think it only needs to cache the stats and security info, not catalog
data.  Stats at least have a lower freshness requirement.

As for allowing external entities to cache metadata and find out when it's invalid, I agree
there are uses for that.  The metastore already has a listener interface where it can fire
events anytime a DDL operation happens.  It seems you could hook into this and build a cache
notifier system that allows caching entities to register themselves.  A listener would then
inform the cache notifier every time there was a DDL event, and the cache notifier would
send out notices to the relevant caching entities.
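
The listener-plus-notifier idea above could be sketched roughly as follows. This is a
minimal, self-contained Java illustration and not Hive code: CachingEntity, CacheNotifier,
DdlListener and RecordingCache are hypothetical names, and DdlListener only stands in for a
real metastore listener, which would receive typed DDL events rather than a bare table name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArraySet;

public class CacheNotifierSketch {

    /** Hypothetical callback implemented by anything that caches metadata. */
    interface CachingEntity {
        void invalidate(String tableName);
    }

    /** Hypothetical registry: caching entities register interest per table. */
    static class CacheNotifier {
        private final Map<String, Set<CachingEntity>> subscribers = new ConcurrentHashMap<>();

        void register(String tableName, CachingEntity entity) {
            subscribers.computeIfAbsent(tableName, k -> new CopyOnWriteArraySet<>()).add(entity);
        }

        /** Sends an invalidation notice to every entity registered for the table. */
        void notifyDdl(String tableName) {
            Set<CachingEntity> targets =
                    subscribers.getOrDefault(tableName, Collections.<CachingEntity>emptySet());
            for (CachingEntity entity : targets) {
                entity.invalidate(tableName);
            }
        }
    }

    /** Stand-in for a metastore listener (assumption: one callback per DDL event). */
    static class DdlListener {
        private final CacheNotifier notifier;

        DdlListener(CacheNotifier notifier) {
            this.notifier = notifier;
        }

        void onDdlEvent(String tableName) {
            notifier.notifyDdl(tableName);
        }
    }

    /** Toy caching entity that just records which tables it was told to drop. */
    static class RecordingCache implements CachingEntity {
        final List<String> invalidated = new ArrayList<>();

        @Override
        public void invalidate(String tableName) {
            invalidated.add(tableName);
        }
    }

    public static void main(String[] args) {
        CacheNotifier notifier = new CacheNotifier();
        RecordingCache cache = new RecordingCache();
        notifier.register("db.sales", cache);

        DdlListener listener = new DdlListener(notifier);
        listener.onDdlEvent("db.sales");  // registered table: cache is notified
        listener.onDdlEvent("db.other");  // no subscribers: nothing happens

        System.out.println("invalidated: " + cache.invalidated);
    }
}
```

In this sketch notices are delivered in-process and synchronously; a cluster-wide version
would need some transport between the metastore and the caching nodes, which is left open
here.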

> Hive LLAP Metadata pre-load for low latency, + cluster-wide metadata refresh/invalidate command
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9768
>                 URL: https://issues.apache.org/jira/browse/HIVE-9768
>             Project: Hive
>          Issue Type: New Feature
>          Components: HCatalog, Metastore, Query Planning, Query Processor
>    Affects Versions: llap
>         Environment: HDP 2.2
>            Reporter: Hari Sekhon
>
> Feature request for Hive LLAP to preload table metadata across all running nodes to reduce
> query latency (this is what Impala does).
> The design decision behind this in Impala was to avoid the latency overhead of fetching
> the metadata at query time, since that's an extra database query (or possibly an HBase query
> in future, HIVE-9452) that must first be completely fulfilled before the Hive LLAP query even
> starts to run, which would slow down the response to the user if not pre-loaded. Also, any
> temporary outage of the metadata layer would affect the speed of the LLAP layer, so pre-loading
> and caching the metadata adds resilience against this.
> This pre-loaded metadata also requires a cluster-wide "refresh metadata" operation, something
> Impala added later, and now calls "INVALIDATE METADATA" in its SQL dialect. I propose using
> a more intuitive "REFRESH METADATA" Hive command instead.
> (Fyi I was in the first trio of Impala SMEs at Cloudera in early 2013)
> Regards,
> Hari Sekhon
> ex-Cloudera
> http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
