impala-user mailing list archives

From Jeszy <jes...@gmail.com>
Subject Re: invalidate metadata behaviour
Date Wed, 29 Nov 2017 07:56:27 GMT
Hey Antoni,

On 29 November 2017 at 07:42, Antoni Ivanov <aivanov@vmware.com> wrote:
> Hi,
>
>
>
> I am wondering: if I run INVALIDATE METADATA for the whole database on node1
>
> and then run a query on node2, would the query on node2 use the cached
> metadata for the tables, or would it know that it's invalidated?

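Node2 would also eventually consider these tables invalidated. The
invalidation is applied centrally by the catalog daemon and then
broadcast (via the statestore) to the other impalads, so a query that
starts on node2 before that update arrives can still use the old cached
metadata.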
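To illustrate the "eventually", here is a toy Python model. It is not Impala's actual code. A central catalog records the invalidation, and a separate broadcast step pushes it to each daemon's replica. All class, method and table names are hypothetical. The explicit `broadcast()` call stands in for the asynchronous delivery of catalog updates, which in real Impala goes through the statestore. Letting the issuing daemon apply the result immediately is an assumption of this sketch.

```python
class CatalogServer:
    """Toy stand-in for the central catalog daemon (hypothetical, simplified)."""

    def __init__(self):
        self.version = 0
        self.invalidated = set()
        self.subscribers = []

    def invalidate(self, tables):
        # Record the invalidation centrally and bump the catalog version.
        self.version += 1
        self.invalidated |= set(tables)

    def broadcast(self):
        # Stands in for the asynchronous push of catalog updates to every daemon.
        for node in self.subscribers:
            node.apply_update(self.version, self.invalidated)


class ImpalaDaemon:
    """Toy daemon holding its own replica of the catalog state (hypothetical)."""

    def __init__(self, name, catalog):
        self.name = name
        self.catalog = catalog
        self.seen_version = 0
        self.invalidated = set()
        catalog.subscribers.append(self)

    def run_invalidate(self, tables):
        # Forward the statement to the central catalog; in this toy model the
        # issuing daemon applies the result right away.
        self.catalog.invalidate(tables)
        self.apply_update(self.catalog.version, self.catalog.invalidated)

    def apply_update(self, version, invalidated):
        # Ignore updates that are not newer than what this replica has seen.
        if version > self.seen_version:
            self.seen_version = version
            self.invalidated = set(invalidated)

    def uses_cached_metadata(self, table):
        return table not in self.invalidated


catalog = CatalogServer()
node1 = ImpalaDaemon("node1", catalog)
node2 = ImpalaDaemon("node2", catalog)

node1.run_invalidate(["mydb.t1", "mydb.t2"])
print(node2.uses_cached_metadata("mydb.t1"))  # True: update not delivered yet
catalog.broadcast()
print(node2.uses_cached_metadata("mydb.t1"))  # False: node2 has caught up
```

The window between `run_invalidate` and `broadcast` is the "eventually" in the answer. In a real cluster that window is short but not zero.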

> And second, how safe is it to run it for a database with many tables, say 30
> tables with over 10,000 partitions and 2000 more with under 5000 partitions
> (most of them under 100)?
>
> Each Impala Daemon node also has little memory (32G, below the Cloudera
> recommendation).

These numbers determine the size of the catalog cache, which is stored
centrally in the catalog daemon and then replicated on each impalad
(or only on each coordinator in more recent versions). The metadata you
mention (up to 2000 tables * 5000 partitions each, plus the big tables)
is, in the worst case, in the 10-million-partition range. Each of those
partitions will have at least one file with 3 blocks, probably more, so
all this adds up to a sizeable amount of metadata. The cached copy will
require a large amount of memory (on the catalog daemon as well as on
the daemons/coordinators), which could easily lead to even small queries
running out of memory with only 32 GB.
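The estimate above is plain multiplication. As a sanity check, here it is redone in Python with the figures from the thread. The function names are hypothetical. The defaults of one file per partition and three blocks per file are taken from the reply as minimums. No per-object byte size is assumed, because that depends on the Impala version. The last line only shows how many bytes per block a 32 GB heap would leave if it held nothing but this metadata.

```python
def metadata_object_counts(tables, partitions_per_table,
                           files_per_partition=1, blocks_per_file=3):
    """Count partitions, files and blocks for a group of similar tables."""
    partitions = tables * partitions_per_table
    files = partitions * files_per_partition
    blocks = files * blocks_per_file
    return {"partitions": partitions, "files": files, "blocks": blocks}


def combine(*groups):
    """Sum the per-group counts key by key."""
    return {key: sum(g[key] for g in groups) for key in groups[0]}


# Figures from the question: 30 tables with over 10,000 partitions each
# (counted at 10,000, a lower bound for that group) and 2000 tables with
# under 5000 partitions each (counted at 5000, the worst case).
big = metadata_object_counts(30, 10_000)
small_worst = metadata_object_counts(2_000, 5_000)
worst = combine(big, small_worst)

heap_bytes = 32 * 1024**3  # 32 GB per daemon, as in the question
print(worst)  # {'partitions': 10300000, 'files': 10300000, 'blocks': 30900000}
print(heap_bytes // worst["blocks"])  # 1111 bytes per block, with nothing else in the heap
```

Since most of the small tables reportedly have under 100 partitions, the real total is likely well below this worst case. Even so, the per-block budget shows why 32 GB leaves little headroom.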

> Thanks,
>
> Antoni

HTH
