impala-user mailing list archives

From Peter Horvath <>
Subject Re: Impala State Store and Catalog Service resource requirements
Date Thu, 01 Mar 2018 08:48:43 GMT
Hi Laszlo,

Thank you for your input; this was indeed very helpful.

The formula for the Catalog Service is precisely what I have been looking for:

Catalog memory usage
• Metadata cache heap memory usage can be calculated by
  • num of tables * 5KB + num of partitions * 2KB + num of files * 750B
    + num of file blocks * 300B + sum(incremental col stats per table)
• Incremental stats
  • For each table, num columns * num partitions * 400B
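
To make the arithmetic concrete, the formula above could be sketched as a small Python helper. This is only a rough illustration: the function and class names are made up for this example, and it assumes that "KB" means 1024 bytes and that incremental stats are counted only for the tables that actually have them.

```python
from dataclasses import dataclass

KB = 1024  # assumption: the formula's "KB" is 1024 bytes


@dataclass
class TableShape:
    """Hypothetical summary of one table that has incremental stats."""
    num_columns: int
    num_partitions: int


def incremental_stats_bytes(table):
    # Per table: num columns * num partitions * 400 B
    return table.num_columns * table.num_partitions * 400


def catalog_heap_bytes(num_tables, num_partitions, num_files, num_blocks,
                       tables_with_incremental_stats=()):
    # Cluster-wide totals: tables * 5 KB + partitions * 2 KB
    # + files * 750 B + file blocks * 300 B
    base = (num_tables * 5 * KB
            + num_partitions * 2 * KB
            + num_files * 750
            + num_blocks * 300)
    # Plus the sum of incremental column stats over the given tables
    stats = sum(incremental_stats_bytes(t)
                for t in tables_with_incremental_stats)
    return base + stats


if __name__ == "__main__":
    # Made-up example numbers, not taken from a real cluster.
    estimate = catalog_heap_bytes(
        num_tables=1_000,
        num_partitions=100_000,
        num_files=1_000_000,
        num_blocks=1_200_000,
        tables_with_incremental_stats=[TableShape(50, 10_000)],
    )
    print(f"Estimated catalog metadata heap: {estimate / 1024**3:.2f} GiB")
```

The result is an estimate of the metadata cache only, so some headroom on top of it is probably sensible.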

Have you by any chance seen recommendations regarding the hardware
requirements of the statestore?
Is my understanding correct that catalog service and statestore are almost
like shared in-memory caches, where the focus is on memory (and not CPU)?

I would like to get a good understanding of the relative weight of these
components compared to the core impalad daemon.

Once again, thank you very much for your help.


On Tue, Feb 27, 2018 at 5:44 PM, Laszlo Gaal <> wrote:

> Hi Peter,
> For starters I would recommend the following overviews:
> 1. The Apache Impala website has a pretty comprehensive Impala guide, the
> 2.10.0 version can be found at
> docs/build/impala-2.10.pdf. Sizing considerations start on page 20.
> 2. Putting on my Cloudera hat for a moment: A good summary slide deck is
> the Impala Cookbook, created by Cloudera's Impala developers and field
> engineers, available on SlideShare: https://www.
> To answer your specific question: The statestore and the catalog are
> usually recommended to run on their own dedicated hosts, separate from the
> worker nodes. The catalog has significant memory requirements, as it has to
> keep the complete metadata in memory (databases/tables/fields, the file
> layout for the tables and the HDFS block layout of the files, and
> optionally all security permissions from Sentry). You can find sizing
> formulas both for the memory requirements and for storage sizing in the
> above documents.
> I'm sure the community would be able to offer more specific help given
> more details about your setup and workload.
> Hope this helps,
>   - LaszloG
> On Tue, Feb 27, 2018 at 1:58 AM, Peter Horvath <> wrote:
>> Dear All,
>> I am in the process of setting up a Hadoop cluster including Impala
>> v2.10.0.
>> I would like to configure Impala State Store and Catalog Service
>> appropriately (maybe even on a dedicated host), however I cannot really
>> find any documentation on the resource needs of these services or any
>> other best practices regarding the sizing of the host machine.
>> For example, I do not know how much memory or disk space I should reserve
>> for these services: based on my understanding, Impala State Store and
>> Catalog Service should have a relatively small footprint compared to
>> other big data components, but I am not sure I would be able to make
>> the right estimate on my own.
>> Could someone please point me in the right direction?
>> Thank you,
>> Peter
