impala-user mailing list archives

From Bharath Vissapragada <>
Subject Re: Questions about Statestore and Catalogservice
Date Fri, 08 Dec 2017 18:00:14 GMT
Looks like a topic for dev@.

On Fri, Dec 8, 2017 at 2:48 AM, Lars Francke <> wrote:

> Hi,
> I'm trying to understand how the communication between the components
> works.
> I understand that an impala daemon subscribes to the statestore. The
> statestore seems to have the concept of heartbeats and topics. But I'm not
> sure what topics are all about.

Statestore follows the standard pub-sub pattern, where a
publisher publishes messages and subscribers subscribe to the
messages/categories they are interested in.  As you mentioned, the statestore
acts as a mediator between the publishers and the subscribers.

"Topic" is an abstraction that makes the content of these messages opaque
to the statestore. The publishers (like Catalog server for example)
serialize the messages (metadata for example) into a "Topic" to ship them
to the statestore which then broadcasts that to the interested subscribers
(coordinators). The coordinators then unpack/deserialize the topic into the
corresponding object classes (like Tables/Functions etc.) and apply those
updates locally.

In Impala, currently we have the following topics:

catalog-update - For Catalog metadata
impala-membership - For tracking liveness of the coordinators/executors
impala-request-queue -  For admission control

You can see these in the statestore web UI (/topics page)
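
To make the pub-sub flow above concrete, here is a minimal Python sketch. It is
not Impala's code (the real implementation is C++/Java with Thrift); the class
names ToyStatestore and ToyCoordinator, the JSON encoding, and the method names
are all assumptions made for illustration. The point it shows is that the
mediator only forwards opaque bytes per topic, and only the subscribers know
how to deserialize them.

```python
import json
from collections import defaultdict


class ToyStatestore:
    """Hypothetical mediator: forwards opaque bytes per topic, never parses them."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload: bytes):
        # The payload is passed through untouched; its format is opaque here.
        for callback in self._subscribers[topic]:
            callback(payload)


class ToyCoordinator:
    """Hypothetical subscriber holding a local metadata cache."""

    def __init__(self, name):
        self.name = name
        self.tables = {}  # table name -> version

    def on_catalog_update(self, payload: bytes):
        # Deserialize the opaque payload into objects and apply them locally.
        for entry in json.loads(payload.decode("utf-8")):
            self.tables[entry["name"]] = entry["version"]


statestore = ToyStatestore()
coordinators = [ToyCoordinator("coo1"), ToyCoordinator("coo2")]
for c in coordinators:
    statestore.subscribe("catalog-update", c.on_catalog_update)

# The publisher (the catalog server's role) serializes its metadata (JSON is assumed).
update = json.dumps([{"name": "foo", "version": 2}]).encode("utf-8")
statestore.publish("catalog-update", update)

print([c.tables for c in coordinators])
```

Both toy coordinators end up with the same view of 'foo' even though the
statestore never looked inside the payload.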

> The docs also say that only the statestore communicates with the catalog
> service. How does that happen?

Can you point us to which doc you are referring to here?

Technically speaking, the coordinators also connect to the Catalog service
for executing DDLs, but I'm assuming you are speaking here in terms of the
broadcast of the table updates, in which case Catalog sends those tables to
the statestore (as a part of the catalog-update topic) and those are broadcast
by the statestore to all the coordinators (as described above).

> How is an INVALIDATE/REFRESH statement routed from a daemon to the catalog
> service and back?

I'll take the example of REFRESH here.  The metadata flow looks something
like this

- coordinator 'coo' gets 'refresh foo'
- 'coo' makes an RPC to the catalog server 'cat' for executing 'refresh'
- 'cat' refreshes the table 'foo', which changes the version of 'foo' from
v1 to v2 (Internally Catalog versions all the objects to track which
objects changed over time)
- 'cat' returns 'foo' (v2) directly to the coordinator 'coo'  (as the
result of RPC) which then applies the update locally.
- Additionally, 'cat' has a thread running in the background that
figures out that 'foo' has changed (v1 -> v2), repacks 'foo'
into a "Topic" update, and sends it to the statestore.
- Statestore then broadcasts the new updates to all the coordinators.
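
The REFRESH steps above can be sketched roughly as follows. This is a toy
model, not the actual Catalog code: ToyCatalog, collect_delta, apply_update and
the single integer version counter are hypothetical, and the rule that a
coordinator ignores an update it has already applied is an assumption added so
that the direct RPC result and the later broadcast do not conflict.

```python
class ToyCatalog:
    """Hypothetical catalog server that versions every object it holds."""

    def __init__(self):
        self._version = 1           # catalog-wide version counter (assumed)
        self._tables = {"foo": 1}   # table name -> version of its last change
        self._last_sent = 1         # highest version already sent to the statestore

    def refresh(self, table):
        # Reloading the metadata bumps the object's version (v1 -> v2).
        self._version += 1
        self._tables[table] = self._version
        return {"name": table, "version": self._version}  # direct RPC result

    def collect_delta(self):
        # Stand-in for the background thread: gather objects changed since
        # the last topic update and package them for the statestore.
        delta = [{"name": n, "version": v}
                 for n, v in self._tables.items() if v > self._last_sent]
        self._last_sent = self._version
        return delta


def apply_update(cache, entries):
    # Assumption: a coordinator keeps only the newest version it has seen.
    for e in entries:
        if e["version"] > cache.get(e["name"], 0):
            cache[e["name"]] = e["version"]


cat = ToyCatalog()
coo, other = {"foo": 1}, {"foo": 1}   # local caches of two coordinators

result = cat.refresh("foo")           # 'coo' makes the RPC to 'cat'
apply_update(coo, [result])           # 'coo' applies foo (v2) right away

delta = cat.collect_delta()           # background thread notices v1 -> v2
for cache in (coo, other):            # statestore broadcast, simplified
    apply_update(cache, delta)

print(coo, other)
```

In this sketch the issuing coordinator sees v2 as soon as the RPC returns,
while the other coordinator only catches up once the topic update arrives.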

INVALIDATE is slightly different in the sense that the coordinator doesn't
get foo (v2) back as the result of the RPC. Instead it gets an
"IncompleteTable" (Impala terminology), which means that the table's catalog
metadata is either missing or has been invalidated.
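
As a rough illustration of that difference, the sketch below models an
invalidated table as a placeholder object. Only the name "IncompleteTable"
comes from Impala; the fields, the invalidate and needs_load helpers, and the
version bump on INVALIDATE are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Fully loaded table metadata (fields are hypothetical)."""
    name: str
    version: int
    columns: list


@dataclass
class IncompleteTable:
    """Placeholder: the table name is known but its metadata is not loaded."""
    name: str
    version: int


def invalidate(catalog, name, new_version):
    # Drop the loaded metadata and leave a placeholder behind; the
    # placeholder, not a full Table, is what the RPC hands back.
    catalog[name] = IncompleteTable(name, new_version)
    return catalog[name]


def needs_load(entry):
    # A coordinator can tell from the type that metadata must be fetched again.
    return isinstance(entry, IncompleteTable)


catalog = {"foo": Table("foo", 1, ["id", "ts"])}
reply = invalidate(catalog, "foo", 2)
print(type(reply).__name__, needs_load(reply))
```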

There are many minor details on how the entire system works but "most"
Catalog updates work as above (with some exceptions).

> I'm sure I'll have follow-up questions but this would already be very
> helpful. Thank you!

Sure, feel free to ask the list. Here are some code pointers in case you are
interested:
(Topic/TopicEntry and other SS  abstractions)
(thrift definitions for most Catalog operations)
(how coordinators apply the Catalog updates)
(An example of how "catalog-update" is created & subscribed)


> Cheers,
> Lars
