impala-user mailing list archives

From Sailesh Mukil <sail...@cloudera.com>
Subject Re: Bottleneck
Date Thu, 07 Sep 2017 20:59:06 GMT
Alexander,

We received a response about the issue you're facing from another
Impala contributor, but it fell off the list. It's quoted inline below:

"Based on what he is describing it seems like IMPALA-5302 and IMPALA-4923
> are in play.
> To verify, we will need a couple of query profiles, then a screenshot of
> "sudo perf top" from one of the machines after letting it run for a few
> tens of seconds while the queries are running."
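
The "sudo perf top" step above can also be run non-interactively, which makes
the output easier to paste back to the list. A minimal sketch (the impalad
process lookup and the 10-second window are assumptions based on the advice
above, not an exact recipe):

```shell
# Attach perf to the local impalad while the workload is running.
# perf top is interactive; perf record + perf report gives a pasteable
# plain-text summary instead.
PID="$(pgrep -f impalad | head -n 1)"        # first impalad on this host
sudo perf record -p "$PID" -g -- sleep 10    # sample for ~10 seconds
sudo perf report --stdio | head -n 50        # top symbols, plain text
```

If IMPALA-4923 is in play, one would expect lock or allocator symbols near
the top of that report.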


On Wed, Sep 6, 2017 at 3:12 AM, Alexander Shoshin <
Alexander_Shoshin@epam.com> wrote:

> Hi,
>
>
>
> I guess my previous message might not have been delivered.
>
>
>
> Could you tell me whether it is possible to verify if the issue
> https://issues.apache.org/jira/browse/IMPALA-4923 affects my queries or
> not?
>
>
>
> Thanks,
>
> Alexander
>
>
>
>
>
> *From:* Alexander Shoshin
> *Sent:* Monday, September 04, 2017 1:21 PM
> *To:* user@impala.incubator.apache.org
> *Cc:* Special SBER-BPOC Team <SpecialSBER-BPOCTeam@epam.com>
> *Subject:* RE: Bottleneck
>
>
>
> Hi guys,
>
> thanks for the advice!
>
>
>
> Tim,
>
> here are some more details about my cluster: I use CDH 5.11.1 with Impala
> 2.8.0, and each machine in the cluster has 80 logical CPU cores and 700 GB
> of RAM. It’s hard to provide all the query profiles here, since there are
> 28 of them. Different queries use from 10 MB to 7000 MB of RAM on each
> cluster node. Moreover, if I run only the “heavy” queries that consume
> several GB of RAM, Impala starts using all available memory and some
> queries fail with “out of memory”. But I can’t force Impala to use all the
> memory with “light” queries.
>
>
>
> It looks like https://issues.apache.org/jira/browse/IMPALA-5302, as a part
> of https://issues.apache.org/jira/browse/IMPALA-4923, might be the reason.
> I know that the most reliable way to check whether this issue affects me
> is to update Impala to 2.9.0, but that’s not easy for me because I don’t
> have all the necessary administrative privileges. Is there a way to verify
> whether this issue affects me? Or maybe there is a way to apply a patch
> for this issue so that I don’t need to update Impala?
>
>
>
> Alexander,
>
> I am using all Impala daemons as coordinators. Each new query goes to the
> next coordinator in the list. I have tried increasing fe_service_threads
> from 64 up to 120, but the behavior stays the same. I have also tried
> changing be_service_threads, num_threads_per_core, and
> num_hdfs_worker_threads, with no result.
>
>
>
> Silvius,
>
> I will try to use this command, thanks.
>
>
>
> Regards,
>
> Alexander
>
>
>
>
>
> *From:* Silvius Rus <srus@cloudera.com>
> *Sent:* Friday, September 01, 2017 11:11 PM
> *To:* user@impala.incubator.apache.org
> *Cc:* Special SBER-BPOC Team <SpecialSBER-BPOCTeam@epam.com>
> *Subject:* Re: Bottleneck
>
>
>
> One piece of information that might help is to run "perf top" on the
> machine with the highest CPU usage.
>
>
>
> On Fri, Sep 1, 2017 at 9:57 AM, Alexander Behm <alex.behm@cloudera.com>
> wrote:
>
> Are you submitting all queries to the same coordinator? If so, you might
> have to increase --fe_service_threads to allow more concurrent
> connections.
>
> That said, a single coordinator will eventually become a bottleneck, so
> we recommend submitting queries to different impalads.
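
The round-robin submission described here could be sketched with a plain
shell driver around impala-shell (the hostnames, port, and the queries/
directory below are placeholders, not part of the original setup):

```shell
# Spread a batch of queries across all worker impalads (each acting as a
# coordinator) instead of funnelling everything through one node.
HOSTS=(worker1 worker2 worker3 worker4 worker5 worker6)  # placeholders
i=0
for q in queries/*.sql; do
  host="${HOSTS[$((i % ${#HOSTS[@]}))]}"       # round-robin pick
  impala-shell -i "${host}:21000" -f "$q" &    # submit concurrently
  i=$((i + 1))
done
wait   # wait for all submissions to finish
```

The same round-robin idea applies when submitting over JDBC: rotate the
connection host per query rather than reusing one connection URL.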
>
>
>
> On Fri, Sep 1, 2017 at 9:41 AM, Tim Armstrong <tarmstrong@cloudera.com>
> wrote:
>
> Hi Alexander,
>
>   It's hard to know based on the information available. Query profiles
> often provide some clues here. I agree that Impala should be able to max
> out one of the resources in most circumstances.
>
> On Impala 2.8 and earlier we saw behaviour similar to what you described
> when running queries with selective scans on machines with many cores:
> https://issues.apache.org/jira/browse/IMPALA-4923 . The bottleneck there
> was lock contention during memory allocation - the threads spent a lot of
> time asleep waiting to get a shared lock.
>
>
>
> On Fri, Sep 1, 2017 at 8:36 AM, Alexander Shoshin <
> Alexander_Shoshin@epam.com> wrote:
>
> Hi,
>
>
>
> I am working with Impala, trying to find its maximum throughput on my
> hardware. I have a cluster under Cloudera Manager that consists of 7
> machines (1 master node + 6 worker nodes).
>
>
>
> I am running queries on Impala using JDBC. I’ve reached a maximum
> throughput of 80 finished queries per minute. It doesn’t grow no matter
> how many hundreds of concurrent queries I send. But the strange thing is
> that none of the resources (memory, CPU, disk read/write, network
> send/receive) has reached its maximum. They are all used at less than
> half capacity.
>
>
>
> Could you suggest what the bottleneck might be? Could it be some Impala
> setting that limits performance or the maximum number of concurrent
> threads? The mem_limit option for my Impala daemons is set to about 70% of
> the available machine memory.
>
>
>
> Thanks,
>
> Alexander
>
>
>
>
>
>
>
