From: Jeszy
Date: Mon, 11 Dec 2017 10:48:14 +0100
Subject: Re: Questions about Statestore and Catalogservice
To: user@impala.apache.org
Cc: dev@impala.apache.org

Thanks for pointing out the docs issue! I opened IMPALA-6303 to track it.

On 10 December 2017 at 15:47, Lars Francke wrote:
> Thank you Bharath & Dimitris!
>
> That answers all the questions I have right now, thank you so much for
> taking the time to write it up.
>
> Regarding the docs:
>
>> The Impala component known as the catalog service relays the metadata
>> changes from Impala SQL statements to all the DataNodes in a cluster. It is
>> physically represented by a daemon process named catalogd; you only need
>> such a process on one host in the cluster. Because the requests are passed
>> through the statestore daemon, it makes sense to run the statestored and
>> catalogd services on the same host.
>
> Reading it again now it also says "DataNodes" which is not correct.
>
> Cheers,
> Lars
>
>
> On Fri, Dec 8, 2017 at 7:00 PM, Bharath Vissapragada
> wrote:
>>
>> Looks like a topic for dev@.
>>
>> On Fri, Dec 8, 2017 at 2:48 AM, Lars Francke
>> wrote:
>>>
>>> Hi,
>>>
>>> I'm trying to understand how the communication between the components
>>> works.
>>>
>>> I understand that an impala daemon subscribes to the statestore. The
>>> statestore seems to have the concept of heartbeats and topics. But I'm not
>>> sure what topics are all about.
>>
>>
>> Statestore follows the standard pub-sub pattern where a publisher
>> publishes messages and subscribers subscribe to the messages/categories they
>> are interested in. Like you mentioned, statestore is like a mediator
>> between the publishers and the subscribers.
>>
>> "Topic" is an abstraction that makes the content of these messages opaque
>> to the statestore.
>> The publishers (like the Catalog server, for example)
>> serialize the messages (metadata for example) into a "Topic" to ship them to
>> the statestore which then broadcasts that to the interested subscribers
>> (coordinators). The coordinators then unpack/deserialize the topic into the
>> corresponding object classes (like Tables/Functions etc.) and apply those
>> updates locally.
>>
>> In Impala, currently we have the following topics:
>>
>> catalog-update - For Catalog metadata
>> impala-membership - For tracking liveness of the coordinators/executors
>> impala-request-queue - For admission control
>>
>> You can see these in the statestore web UI (/topics page)
>>
>>>
>>>
>>> The docs also say that only the statestore communicates with the catalog
>>> service. How does that happen?
>>
>>
>> Can you point us to which doc you are referring to here?
>>
>> Technically speaking, the coordinators also connect to the Catalog service
>> for executing DDLs, but I'm assuming you are speaking here in terms of the
>> broadcast of the table updates, in which case Catalog sends those tables to
>> the statestore (as a part of the catalog-update topic) and those are
>> broadcast by the statestore to all the coordinators (as described above).
>>
>>> How is an INVALIDATE/REFRESH statement routed from a daemon to the catalog
>>> service and back?
>>
>> I'll take the example of REFRESH here. The metadata flow looks something
>> like this:
>>
>> - coordinator 'coo' gets 'refresh foo'
>> - 'coo' makes an RPC to the catalog server 'cat' for executing 'refresh'
>> - 'cat' refreshes the table 'foo', which changes the version of 'foo' from
>>   v1 to v2 (internally, Catalog versions all the objects to track which
>>   objects changed over time)
>> - 'cat' returns 'foo' (v2) directly to the coordinator 'coo' (as the
>>   result of the RPC), which then applies the update locally.
>> - Additionally, 'cat' also has a thread running in the background that
>>   figures out that 'foo' has changed (v1 -> v2), which then repacks 'foo'
>>   into a "Topic" update and sends it to the statestore.
>> - Statestore then broadcasts the new updates to all the coordinators.
>>
>> INVALIDATE is slightly different in the sense that the coordinator doesn't
>> get foo(v2) back as the result of the RPC; instead it gets an
>> "IncompleteTable" (Impala terminology), which means that the table is either
>> missing the catalog metadata or has been invalidated.
>>
>> There are many minor details on how the entire system works but "most"
>> Catalog updates work as above (with some exceptions).
>>
>>>
>>> I'm sure I'll have follow-up questions but this would already be very
>>> helpful. Thank you!
>>
>>
>> Sure, feel free to ask the list. Here are some code pointers in case you
>> are interested.
>>
>> https://github.com/apache/impala/blob/master/be/src/statestore/statestore.h
>> (Topic/TopicEntry and other SS abstractions)
>>
>> https://github.com/apache/impala/blob/master/common/thrift/CatalogService.thrift#L45
>> (thrift definitions for most Catalog operations)
>>
>> https://github.com/apache/impala/blob/master/be/src/service/impala-server.h#L324
>> (how coordinators apply the Catalog updates)
>>
>> https://github.com/apache/impala/blob/master/be/src/catalog/catalog-server.cc#L187
>> (An example of how "catalog-update" is created & subscribed)
>>
>> HTH.
>>
>>>
>>> Cheers,
>>> Lars
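
For anyone reading the archive later, the REFRESH/INVALIDATE flow quoted above can be modelled in a few lines of Python. This is only a toy sketch, not Impala code: every class and method name here (Statestore, CatalogServer, Coordinator, send_topic_update, and so on) is made up for illustration, the catalog's background thread is replaced by an explicit method call, and a single counter stands in for the catalog's object versions. Only the topic name "catalog-update" is taken from the thread.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

CATALOG_TOPIC = "catalog-update"  # topic name mentioned in the thread


@dataclass
class Table:
    name: str
    version: int
    loaded: bool = True  # False stands in for an "IncompleteTable"


class Statestore:
    """Hypothetical mediator: forwards topic updates without inspecting them."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[List[Table]], None]]] = {}

    def subscribe(self, topic: str, callback: Callable[[List[Table]], None]) -> None:
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic: str, entries: List[Table]) -> None:
        for callback in self._subscribers.get(topic, []):
            callback(entries)


class CatalogServer:
    """Hypothetical catalog: versions every object it changes."""

    def __init__(self, statestore: Statestore) -> None:
        self._statestore = statestore
        self._tables: Dict[str, Table] = {}
        self._version = 0
        self._last_sent = 0

    def _bump(self, name: str, loaded: bool) -> Table:
        self._version += 1
        table = Table(name, self._version, loaded)
        self._tables[name] = table
        return table

    def refresh(self, name: str) -> Table:
        # "RPC" result: the freshly loaded table goes straight back to the caller.
        return self._bump(name, loaded=True)

    def invalidate(self, name: str) -> Table:
        # "RPC" result: an incomplete placeholder instead of loaded metadata.
        return self._bump(name, loaded=False)

    def send_topic_update(self) -> None:
        # Stand-in for the background thread: ship everything changed since
        # the last update to the statestore.
        changed = [t for t in self._tables.values() if t.version > self._last_sent]
        if changed:
            self._statestore.publish(CATALOG_TOPIC, changed)
            self._last_sent = self._version


class Coordinator:
    """Hypothetical coordinator with a local metadata cache."""

    def __init__(self, catalog: CatalogServer, statestore: Statestore) -> None:
        self._catalog = catalog
        self.cache: Dict[str, Table] = {}
        statestore.subscribe(CATALOG_TOPIC, self.apply_updates)

    def apply_updates(self, entries: List[Table]) -> None:
        # Only newer versions replace what is already cached.
        for table in entries:
            current = self.cache.get(table.name)
            if current is None or table.version > current.version:
                self.cache[table.name] = table

    def refresh(self, name: str) -> None:
        self.apply_updates([self._catalog.refresh(name)])

    def invalidate(self, name: str) -> None:
        self.apply_updates([self._catalog.invalidate(name)])
```

With two coordinators subscribed, calling refresh("foo") on one of them updates only its own cache; the other sees the new version once send_topic_update() has pushed it through the statestore. The version check in apply_updates is what keeps the issuing coordinator from reapplying an update it already received as the RPC result.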