impala-user mailing list archives

From Alexander Shoshin <>
Subject RE: Bottleneck
Date Mon, 04 Sep 2017 10:20:37 GMT
Hi guys,
thanks for the advice!

Here are some more details about my cluster: I use CDH 5.11.1 with Impala 2.8.0, and each machine in the cluster has 80 logical CPU cores and 700 GB of RAM. It’s hard to provide all the query profiles here, as there are 28 of them. Different queries use from 10 MB to 7000 MB of RAM on each cluster node. Moreover, if I run only “heavy” queries that consume several GB of RAM, Impala starts using all available memory and some queries fail with “out of memory”. But I can’t force Impala to use all the memory with “light” queries.

It looks like the issue you mentioned might be the reason. I know that the most reliable way to check whether this issue affects me is to update Impala to 2.9.0, but that is not easy for me because I don’t have all the necessary administrative privileges. Is there another way to verify whether this issue affects me? Or maybe there is a patch for this issue, so that I don’t need to update Impala?

I am using all Impala daemons as coordinators. Each new query goes to the next coordinator in the list. I have tried increasing fe_service_threads from 64 up to 120, but the behavior stays the same. I have also tried changing be_service_threads, num_threads_per_core, and num_hdfs_worker_threads, with no result.
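For reference, the round-robin selection over the coordinator list looks roughly like this (host names and port are illustrative, not our real topology):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch: rotate JDBC connections across all coordinators so no
// single impalad's frontend thread pool becomes the bottleneck.
class CoordinatorPicker {
    private final List<String> hosts;
    private final AtomicLong counter = new AtomicLong();

    CoordinatorPicker(List<String> hosts) {
        this.hosts = hosts;
    }

    // Returns the JDBC URL of the next coordinator in round-robin order.
    String nextJdbcUrl() {
        String host = hosts.get((int) (counter.getAndIncrement() % hosts.size()));
        return "jdbc:impala://" + host + ":21050";
    }
}
```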

I will try to use this command, thanks.


From: Silvius Rus []
Sent: Friday, September 01, 2017 11:11 PM
Cc: Special SBER-BPOC Team <>
Subject: Re: Bottleneck

One piece of information that might help is to run "perf top" on the machine with the highest
CPU usage.

On Fri, Sep 1, 2017 at 9:57 AM, Alexander Behm <> wrote:
Are you submitting all queries to the same coordinator? If so, you might have to increase --fe_service_threads to allow more concurrent connections.
That said, a single coordinator will eventually become a bottleneck, so we recommend submitting queries to different impalads.

On Fri, Sep 1, 2017 at 9:41 AM, Tim Armstrong <> wrote:
Hi Alexander,
  It's hard to know based on the information available. Query profiles often provide some clues here. I agree that Impala should be able to max out one of the resources in most circumstances.
On Impala 2.8 and earlier we saw behaviour similar to what you described when running queries with selective scans on machines with many cores. The bottleneck there was lock contention during memory allocation: the threads spent a lot of time asleep waiting to acquire a shared lock.

On Fri, Sep 1, 2017 at 8:36 AM, Alexander Shoshin <> wrote:

I am working with Impala, trying to find its maximum throughput on my hardware. I have a cluster under Cloudera Manager consisting of 7 machines (1 master node + 6 worker nodes).

I am running queries on Impala using JDBC. I’ve reached a maximum throughput of 80 finished queries per minute. It doesn’t grow no matter how many hundreds of concurrent queries I send. The strange thing is that none of the resources (memory, CPU, disk read/write, network send/receive) has reached its maximum: they are all used at less than half capacity.
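The measurement harness itself is simple. A sketch of what I run (with the actual JDBC query call abstracted to a Runnable, since the connection details don’t matter for the structure):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: count how many queries finish within a time window, using a
// fixed number of concurrent submitter threads. In practice runQuery
// wraps a JDBC Statement.execute() against a coordinator.
class ThroughputProbe {
    static int completedIn(Runnable runQuery, int concurrency, long windowMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        AtomicInteger done = new AtomicInteger();
        long deadline = System.currentTimeMillis() + windowMillis;
        for (int i = 0; i < concurrency; i++) {
            pool.execute(() -> {
                // Each submitter issues queries back-to-back until the deadline.
                while (System.currentTimeMillis() < deadline) {
                    runQuery.run();
                    done.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(windowMillis + 1000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

With a 60-second window this gives the queries-per-minute figure directly; raising the concurrency argument past a certain point stops increasing the result, which is the plateau I am describing.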

Do you have any ideas about what the bottleneck could be? Could it be some Impala setting that limits performance or the maximum number of concurrent threads? The mem_limit option for my Impala daemons is set to about 70% of the available machine memory.

