impala-user mailing list archives

From Alexander Shoshin <Alexander_Shos...@epam.com>
Subject RE: Bottleneck
Date Fri, 08 Sep 2017 14:38:45 GMT
Thanks Sailesh!

I will try to collect query profiles and “perf top” output.

Regards,
Alexander


From: Sailesh Mukil [mailto:sailesh@cloudera.com]
Sent: Thursday, September 07, 2017 11:59 PM
To: user@impala.incubator.apache.org
Cc: Special SBER-BPOC Team <SpecialSBER-BPOCTeam@epam.com>
Subject: Re: Bottleneck

Alexander,

We received a response regarding the issue you're facing from another Impala contributor;
it came to us off-list. It's addressed inline below:

"Based on what he is describing, it seems like IMPALA-5302 and IMPALA-4923 are in play.
To verify, we will need a couple of query profiles, plus a screenshot of "sudo perf top" from
one of the machines after letting it run for ten seconds or so while the queries are running."

On Wed, Sep 6, 2017 at 3:12 AM, Alexander Shoshin <Alexander_Shoshin@epam.com>
wrote:
Hi,

I guess my previous message might not have been delivered.

Could you suggest a way to verify whether the issue https://issues.apache.org/jira/browse/IMPALA-4923
affects my queries?

Thanks,
Alexander


From: Alexander Shoshin
Sent: Monday, September 04, 2017 1:21 PM
To: user@impala.incubator.apache.org
Cc: Special SBER-BPOC Team <SpecialSBER-BPOCTeam@epam.com>
Subject: RE: Bottleneck

Hi guys,
thanks for the advice!

Tim,
here are some more details about my cluster: I use CDH 5.11.1 with Impala 2.8.0, and each machine
in the cluster has 80 logical CPU cores and 700 GB of RAM. It's hard to provide profiles for all
the queries here, as there are 28 of them. Different queries use from 10 MB to 7000 MB of RAM on
each cluster node. Moreover, if I run only "heavy" queries that consume several GB of
RAM, Impala starts to use all available memory and some queries fail with "out of memory".
But I can't force Impala to use all the memory with "light" queries.

It looks like https://issues.apache.org/jira/browse/IMPALA-5302, as a part of https://issues.apache.org/jira/browse/IMPALA-4923,
might be the reason. I know that the most reliable way to check whether this issue affects me
is to update Impala to 2.9.0, but that's not easy for me because I don't have
all the necessary administrative privileges. Is there a way to verify whether this issue affects
me? Or maybe there is a way to patch this issue so that I don't need to update
Impala?

Alexander,
I am using all Impala daemons as coordinators. Each new query goes to the next coordinator in
the list. I have tried increasing fe_service_threads from 64 up to 120, but the behavior is still
the same. I have also tried changing be_service_threads, num_threads_per_core, and num_hdfs_worker_threads,
with no result.

Silvius,
I will try to use this command, thanks.

Regards,
Alexander


From: Silvius Rus [mailto:srus@cloudera.com]
Sent: Friday, September 01, 2017 11:11 PM
To: user@impala.incubator.apache.org
Cc: Special SBER-BPOC Team <SpecialSBER-BPOCTeam@epam.com>
Subject: Re: Bottleneck

One piece of information that might help is to run "perf top" on the machine with the highest
CPU usage.

On Fri, Sep 1, 2017 at 9:57 AM, Alexander Behm <alex.behm@cloudera.com>
wrote:
Are you submitting all queries to the same coordinator? If so, you might have to increase
--fe_service_threads to allow more concurrent connections.
That said, a single coordinator will eventually become a bottleneck, so we recommend submitting
queries to different impalads.
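The load spreading suggested above can be done client-side. A minimal sketch in Java follows; the `CoordinatorPool` class, the hostnames, and the `auth=noSasl` URL suffix are illustrative assumptions, not Impala-provided APIs (21050 is Impala's default HiveServer2-protocol port):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Cycles through a list of impalad coordinator hosts so that
 * consecutive queries land on different coordinators instead of
 * all hitting one (which becomes a bottleneck).
 */
class CoordinatorPool {
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    CoordinatorPool(List<String> hosts) {
        this.hosts = hosts;
    }

    /** Returns a JDBC URL for the next coordinator in round-robin order. */
    String nextJdbcUrl() {
        int i = Math.floorMod(next.getAndIncrement(), hosts.size());
        return "jdbc:hive2://" + hosts.get(i) + ":21050/;auth=noSasl";
    }
}
```

Each new connection would then be opened with `DriverManager.getConnection(pool.nextJdbcUrl())` rather than against a single fixed coordinator.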

On Fri, Sep 1, 2017 at 9:41 AM, Tim Armstrong <tarmstrong@cloudera.com>
wrote:
Hi Alexander,
  It's hard to know based on the information available. Query profiles often provide some
clues here. I agree that Impala should be able to max out one of the resources in most circumstances.
On Impala 2.8 and earlier we saw behaviour similar to what you describe when running queries
with selective scans on machines with many cores: https://issues.apache.org/jira/browse/IMPALA-4923.
The bottleneck there was lock contention during memory allocation: the threads spent a
lot of time asleep waiting to acquire a shared lock.
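The contention pattern described above can be illustrated with a deliberately naive sketch. This is not Impala's actual allocator, just a hypothetical `SharedLockAllocator` showing the symptom: every thread funnels through one monitor, so on a many-core box most threads sit blocked rather than doing work:

```java
import java.util.concurrent.CountDownLatch;

/**
 * Illustrates the IMPALA-4923 symptom: many scanner threads funneling
 * every allocation through one shared lock. Cores look idle because
 * threads queue on the monitor instead of running.
 */
class SharedLockAllocator {
    private long bytesHandedOut = 0;

    // Every caller serializes here; with 80 logical cores,
    // most threads spend their time blocked on this lock.
    synchronized long allocate(long bytes) {
        bytesHandedOut += bytes;
        return bytesHandedOut;
    }

    /** Runs nThreads threads, each doing allocsPerThread 64-byte allocations. */
    static long runThreads(int nThreads, int allocsPerThread) throws InterruptedException {
        SharedLockAllocator alloc = new SharedLockAllocator();
        CountDownLatch done = new CountDownLatch(nThreads);
        for (int t = 0; t < nThreads; t++) {
            new Thread(() -> {
                for (int i = 0; i < allocsPerThread; i++) {
                    alloc.allocate(64);
                }
                done.countDown();
            }).start();
        }
        done.await();
        return alloc.bytesHandedOut;
    }
}
```

In a "perf top" capture of a process stuck in this state, the hot symbols are lock acquisition and futex waits rather than query-execution code, which is what the profiles and perf output requested earlier in the thread would reveal.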

On Fri, Sep 1, 2017 at 8:36 AM, Alexander Shoshin <Alexander_Shoshin@epam.com>
wrote:
Hi,

I am working with Impala, trying to find its maximum throughput on my hardware. I have a cluster
under Cloudera Manager that consists of 7 machines (1 master node + 6 worker nodes).

I am running queries against Impala using JDBC. I've reached a maximum throughput of 80 finished
queries per minute, and it doesn't grow no matter how many hundreds of concurrent queries
I send. The strange thing is that none of the resources (memory, CPU, disk read/write, network
send/receive) reaches its maximum; they are all less than half utilized.

Could you suggest what the bottleneck might be? Could it be some Impala setting that limits
performance or the maximum number of concurrent threads? The mem_limit option for my Impala
daemons is about 70% of the available machine memory.

Thanks,
Alexander



