impala-user mailing list archives

From Peter Horvath <petho...@gmail.com>
Subject Re: Impala State Store and Catalog Service resource requirements
Date Thu, 01 Mar 2018 08:48:43 GMT
Hi Laszlo,

Thank you for your input; it was indeed very helpful.

The formula for the Catalog Service is precisely what I have been looking
for:

Catalog memory usage
• Metadata cache heap memory usage can be calculated as:
  num of tables * 5KB + num of partitions * 2KB + num of files * 750B
  + num of file blocks * 300B + sum(incremental col stats per table)
• Incremental stats: for each table, num columns * num partitions * 400B
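
As a rough illustration, the formula above can be turned into a small Python helper. This is only a sketch of the arithmetic, not an official sizing tool: the `TableStats` class, the function names, and the choice of 1 KB = 1024 bytes are my own assumptions, and the per-object constants are simply the ones quoted in the formula.

```python
from dataclasses import dataclass
from typing import Iterable

KB = 1024  # assumption: the formula's "KB" is read as 1024 bytes


@dataclass
class TableStats:
    """Hypothetical per-table inputs for the estimate."""
    columns: int
    partitions: int
    files: int
    blocks: int
    incremental_stats: bool = False  # True if incremental stats are computed


def incremental_stats_bytes(table: TableStats) -> int:
    """Per table: num columns * num partitions * 400B (0 if not used)."""
    if not table.incremental_stats:
        return 0
    return table.columns * table.partitions * 400


def catalog_heap_bytes(tables: Iterable[TableStats]) -> int:
    """Estimated metadata cache heap usage in bytes, per the quoted formula."""
    total = 0
    for t in tables:
        total += 5 * KB                      # per table
        total += t.partitions * 2 * KB       # per partition
        total += t.files * 750               # per file
        total += t.blocks * 300              # per file block
        total += incremental_stats_bytes(t)  # optional incremental stats
    return total


if __name__ == "__main__":
    example = [TableStats(columns=10, partitions=100, files=1000,
                          blocks=2000, incremental_stats=True)]
    print(f"{catalog_heap_bytes(example) / (1024 * 1024):.2f} MiB")
```

The estimate covers only the metadata cache described by the formula; any other catalog overhead would come on top of it.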

Have you by any chance seen recommendations regarding the hardware
requirements of the statestore?
Is my understanding correct that catalog service and statestore are almost
like shared in-memory caches, where the focus is on memory (and not CPU)?

I would like to get a good understanding of the relative weight of these
components compared to the core impalad daemon.

Once again, thank you very much for your help.

Cheers,
Peter

On Tue, Feb 27, 2018 at 5:44 PM, Laszlo Gaal <laszlo.gaal@cloudera.com>
wrote:

> Hi Peter,
>
> For starters I would recommend the following overviews:
> 1. The Apache Impala website has a pretty comprehensive Impala guide; the
> 2.10.0 version can be found at
> http://impala.apache.org/docs/build/impala-2.10.pdf
> Sizing considerations start on page 20.
> 2. Putting on my Cloudera hat for a moment: A good summary slide deck is
> the Impala Cookbook, created by Cloudera's Impala developers and field
> engineers, available on SlideShare:
> https://www.slideshare.net/cloudera/the-impala-cookbook-42530186
>
> To answer your specific question: The statestore and the catalog are
> usually recommended to run on their own dedicated hosts, separate from the
> worker nodes. The catalog has significant memory requirements, as it has to
> keep the complete metadata in memory (databases/tables/fields, the file
> layout for the tables and the HDFS block layout of the files, and
> optionally all security permissions from Sentry). You can find sizing
> formulas both for the memory requirements and for storage sizing in the
> above documents.
>
> I'm sure the community would be able to offer more specific help given
> more details about your setup and workload.
>
> Hope this helps,
>
>   - LaszloG
>
> On Tue, Feb 27, 2018 at 1:58 AM, Peter Horvath <pethor84@gmail.com> wrote:
>
>> Dear All,
>>
>> I am in the process of setting up a Hadoop cluster including Impala
>> v2.10.0.
>>
>> I would like to configure Impala State Store and Catalog Service
>> appropriately (maybe even on a dedicated host), however I cannot really
>> find any documentation on the resource needs of these services or any
>> other best practices regarding the sizing of the host machine.
>>
>> For example, I do not know how much memory or disk space I should reserve
>> for these services. Based on my understanding, Impala State Store and
>> Catalog Service should have a relatively small footprint compared to
>> other big data components, but I am not sure I would be able to make a
>> good estimate on my own.
>>
>> Could someone please point me in the right direction?
>>
>> Thank you,
>> Peter
>>
>>
>
