hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Scott C Gray <>
Subject Re: Caching metastore objects
Date Wed, 27 May 2015 00:59:13 GMT


This may or may not be considered a separate topic, but beyond the caching
that could take place in the metastore itself, there is generally a need
for a general purpose mechanism for notifications about metastore changes.
For example, a number of applications currently maintain their own caches
of metastore objects, such as Big SQL or Impala.   On the Big SQL
development side, we've been thinking about the possibility of a low-level,
pluggable, notification mechanism for changes (I think a partial framework
for this was attempted in hcatalog?), perhaps with the ability to
concurrently chain multiple implementations.  Thus, Hive could install its
own implementation to keep it's local metastore caches in sync but, say,
another implementation could post changes to kafka to broadcast
notifications to any interested party.


From:	Alan Gates <>
Date:	05/26/2015 01:57 PM
Subject:	Re: Caching metastore objects

         Sivaramakrishnan Narayanan
         May 25, 2015 at 20:19
         Apologies if this has been discussed in the past - my searches did
         not pull
         up any relevant threads. If there are better solutions available
         out of the
         box, please let me know!

         Problem statement

         We have a setup where a single metastoredb is used by Hive, Presto
         SparkSQL. In addition, there are 1000s of hive queries submitted
         in batch
         form from multiple machines. Oftentimes, the metastoredb ends up
         remote (in a different region in AWS etc) and round-trip latency
         is high.
         We've seen single thrift calls getting translated into lots of
         small SQL
         calls by datanucleus and the roundtrip latency ends up killing
         Furthermore, any of these systems may create / modify a hive table
         and this
         should be reflected in the other system. Example, I may create a
         table in
         hive and query it using Presto or vice versa. In our setup, there
         may be
         multiple thrift metastore servers pointing to the same metastore


         Basically, we've been looking at caching to solve this problem
         (will come
         to invalidation in a bit). I looked briefly at DN's support for
         caching -
         these two parameters seem to be switched off by default.

         METASTORE_CACHE_LEVEL2("datanucleus.cache.level2", false),

         Furthermore, my reading of
         that there is no sophistication in invalidation - seems like only
         time-based invalidation is supported and it can't work across
         multiple PMFs
         (therefore, multiple thrift metastore servers)

         Solution Outline

         - Every table / partition will have an additional property called
         - Any call that modifies table or partition will bump up version
         of the
         table / partition
         - Guava based cache of thrift objects that come from metastore
         - We fire a single SQL matching versions before returning from
         - It is conceivable to have a mode wherein invalidation based on
         happens in a background thread (for higher performance, lower
         - Not proposing any locking (not shooting for world peace
         here :) )
         - We could extend HiveMetaStore class or create a new server
I think you want to do this at the ObjectStore level, not the HiveMetaStore
level.  Since the guava caching includes knowledge of how to fetch the item
into the cache the details of how to actually the fetch the item will bleed
into your caching layer.  You don't want to put SQL directly into the
HiveMetaStore layer since there are alternative, non-SQL implementations of
that layer (see below).

         Is this something that would be interesting to the community? Is
         problem already solved and should I spend my time watching GoT
There is work going on to enable storing metadata in HBase instead of an
RDBMS.  One of the explicit goals of this work is to radically reduce the
number of round trips between Hive and the metadata store.  So instead of
one thrift call resulting in many SQL calls it will result in a one or at
most a few HBase calls.  This may or may not solve your problem, and it may
be much more radical a solution than you desire.  Also, it isn't stable and
tested yet so you would have to wait a while for it.  But if you are
interested its happening in the hbase-metastore branch of Hive.



  • Unnamed multipart/related (inline, None, 0 bytes)
View raw message