impala-user mailing list archives

From Laszlo Gaal <laszlo.g...@cloudera.com>
Subject Re: Impala State Store and Catalog Service resource requirements
Date Tue, 27 Feb 2018 16:44:49 GMT
Hi Peter,

For starters I would recommend the following overviews:
1. The Apache Impala website has a pretty comprehensive Impala guide. The
version for 2.10.0 can be found at
http://impala.apache.org/docs/build/impala-2.10.pdf. Sizing considerations
start on page 20.
2. Putting on my Cloudera hat for a moment: A good summary slide deck is
the Impala Cookbook, created by Cloudera's Impala developers and field
engineers, available on SlideShare:
https://www.slideshare.net/cloudera/the-impala-cookbook-42530186

To answer your specific question: It is usually recommended to run the
statestore and the catalog on their own dedicated hosts, separate from the
worker nodes. The catalog has significant memory requirements, as it has to
keep the complete metadata in memory (databases/tables/fields, the file
layout for the tables and the HDFS block layout of the files, and
optionally all security permissions from Sentry). You can find sizing
formulas both for the memory requirements and for storage sizing in the
above documents.
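
To illustrate how such a memory estimate is put together, here is a minimal
Python sketch. It is not the formula from the documents above: it only
assumes that catalog memory grows roughly linearly with the number of
metadata objects listed in the previous paragraph. The names
MetadataCounts, BytesPerObject and estimate_catalog_bytes are hypothetical,
and the per-object byte costs are inputs you would have to take from the
sizing documentation or measure on your own cluster.

```python
from dataclasses import dataclass


@dataclass
class MetadataCounts:
    """Number of metadata objects the catalog has to keep in memory."""
    tables: int
    columns: int         # fields, summed over all tables
    files: int           # data files, summed over all tables
    blocks: int          # HDFS blocks, summed over all files
    privileges: int = 0  # Sentry permissions, only if Sentry is in use


@dataclass
class BytesPerObject:
    """Assumed memory cost per object; placeholders, not documented values."""
    table: int
    column: int
    file: int
    block: int
    privilege: int = 0


def estimate_catalog_bytes(counts: MetadataCounts, cost: BytesPerObject) -> int:
    """Linear estimate: sum of (object count * assumed bytes per object)."""
    return (
        counts.tables * cost.table
        + counts.columns * cost.column
        + counts.files * cost.file
        + counts.blocks * cost.block
        + counts.privileges * cost.privilege
    )
```

Calling estimate_catalog_bytes with the object counts from your metastore
and HDFS, plus per-object costs you trust, gives a first-order figure to
compare against the RAM of the host you plan to dedicate to the catalog.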

I'm sure the community would be able to offer more specific help given more
details about your setup and workload.

Hope this helps,

  - LaszloG

On Tue, Feb 27, 2018 at 1:58 AM, Peter Horvath <pethor84@gmail.com> wrote:

> Dear All,
>
> I am in the process of setting up a Hadoop cluster including Impala
> v2.10.0.
>
> I would like to configure the Impala State Store and Catalog Service
> appropriately (maybe even on a dedicated host); however, I cannot really
> find any documentation on the resource needs of these services or any
> other best practices regarding the sizing of the host machine.
>
> For example, I do not know how much memory or disk space I should reserve
> for these services. Based on my understanding, the Impala State Store and
> Catalog Service should have a relatively small footprint compared to other
> big data components, but I am not sure I would be able to make the right
> estimate on my own.
>
> Could someone please point me in the right direction?
>
> Thank you,
> Peter
>
>
