impala-user mailing list archives

From William Cox <william....@distilnetworks.com>
Subject Re: queries not being submitted in Impala cluster despite free resources
Date Wed, 01 Feb 2017 17:23:25 GMT
Tim,

I have a 7 node cluster with 159.71 GB available to each Impala node (1.1TB
available total) - the default resource allocation pool has 700GB allocated
- so 100GB per node.

We have a "default query memory limit" set to 25GB. From reading (
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_mem_limit.html
) it would seem that this means each node can only run 4 queries at once,
since Impala requests 25GB per query regardless of the estimate
(100/25 = 4).
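To make the arithmetic concrete, here is a minimal sketch of that capacity calculation in Python (the constants are just the figures from this cluster; the names are mine):

```python
# Per-node admission capacity when every query reserves the full default
# memory limit, regardless of its planner estimate (values from this thread).
POOL_MEM_PER_NODE_GB = 100        # 700GB pool spread across 7 nodes
DEFAULT_QUERY_MEM_LIMIT_GB = 25   # "default query memory limit"

max_queries_per_node = POOL_MEM_PER_NODE_GB // DEFAULT_QUERY_MEM_LIMIT_GB
print(max_queries_per_node)  # 4
```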

What I *don't* understand is how this works with running more than 4
queries *total* at any time - wouldn't Impala ask for 25GB for each
query on each node?

It should also be noted that we set up HA proxy in front of Impala (
http://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_proxy.html)
because we have a lot of ad hoc users. From reading the Admission Control
docs it seems that maybe that's part of the problem: "Note that admission
control currently offers only soft limits when multiple coordinators are
being used."

So while I can only seem to run 4 queries per node, I can run more than 4
total because of the multiple coordinators?
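The admission behavior Jeszy describes in the quoted thread below can be sketched as a toy model (a hypothetical simplification in Python, not Impala's actual implementation; all names are mine):

```python
# Simplified model of the admission decision: a query is queued when
# admitting it would exceed either the pool's running-query limit or its
# memory cap. The memory charged per query is its mem_limit if one is
# set, otherwise the planner's (possibly wildly wrong) estimate.
def should_queue(running_queries, max_queries,
                 pool_mem_in_use_gb, pool_mem_cap_gb,
                 query_mem_limit_gb=None, estimated_mem_gb=0.0):
    mem_to_reserve = (query_mem_limit_gb if query_mem_limit_gb is not None
                      else estimated_mem_gb)
    if running_queries + 1 > max_queries:
        return True   # would exceed the query-count limit
    if pool_mem_in_use_gb + mem_to_reserve > pool_mem_cap_gb:
        return True   # would exceed the pool memory cap
    return False      # admit
```

With a bad cardinality estimate, `estimated_mem_gb` can dwarf the real need, so a query sits queued even though the pool looks mostly idle - which matches the CREATED-state symptom in this thread.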

-William



On Tue, Jan 31, 2017 at 2:08 PM, Tim Armstrong <tarmstrong@cloudera.com>
wrote:

> Do you have a default query memory limit set? Admission control does not
> generally work well if it's relying on the estimated memory requirement -
> you really need to have query memory limits set. If you have the default
> query memory limit set to 25GB, then admission control assumes that the
> query will use that amount on each node. I assume you mean 700GB memory
> total across all nodes - how much memory do you have per node?
>
> On Tue, Jan 31, 2017 at 7:31 AM, Jeszy <jeszyb@gmail.com> wrote:
>
>> That would be good. If they eventually run successfully, a query profile
>> would also be welcome.
>>
>> Thanks
>>
>> On Tue, Jan 31, 2017 at 4:28 PM, William Cox <
>> william.cox@distilnetworks.com> wrote:
>>
>>> Jeszy,
>>>
>>> Thanks for the suggestion. We also have a 25GB per-query limit set up.
>>> Queries that estimate a large size are rejected with an error stating they
>>> exceeded the memory limit. The queries I'm having trouble with are ones
>>> that have no such error but simply wait in the CREATED state. Next time it
>>> happens I'll see if I can grab the memory estimates and check.
>>> Thanks.
>>> -William
>>>
>>>
>>> On Tue, Jan 31, 2017 at 7:08 AM, Jeszy <jeszyb@gmail.com> wrote:
>>>
>>>> Hey William,
>>>>
>>>> IIUC you have configured both a memory-based upper bound and a #
>>>> queries upper bound for the default pool. A query can get queued if it
>>>> would exceed either of these limits. If you're not hitting the number of
>>>> queries one, then it's probably memory, which can happen even if not fully
>>>> utilized - unless you specify a mem_limit for the query, the estimated
>>>> memory requirement will be used for deciding whether the query should be
>>>> admitted. This can get out of hand when the cardinality estimation is off,
>>>> either due to a very complex query or because of missing / old stats.
>>>>
>>>> This is about memory-based admission control exclusively, but I think
>>>> it will be helpful: http://www.cloudera.com/documentation/enterprise/latest/topics/impala_admission.html#admission_memory
>>>>
>>>> HTH
>>>>
>>>> On Mon, Jan 30, 2017 at 8:31 PM, William Cox <
>>>> william.cox@distilnetworks.com> wrote:
>>>>
>>>>> I'm running CDH 5.8.0-1 and Impala version 2.6.0-cdh5.8.0
>>>>> RELEASE (build 8d8652f69461f0dd8d5f474573fb5de7ceb0ee6b). We have
>>>>> enabled resource management and allocated ~700GB of memory with 30 running
>>>>> queries for the default pool. Our background data jobs pool is Unlimited.
>>>>>
>>>>>
>>>>> In spite of this setup, we still encounter times where queries will be
>>>>> marked as CREATED and waiting for allocation when the number of running
>>>>> queries is well below 30 and the amount of used memory, as listed in the
>>>>> CDH UI, is well below 700GB.
>>>>>
>>>>> This is seemingly unpredictable. We've created extensive monitors to
>>>>> track # of running queries and memory usage but there seems to be no
>>>>> pattern to why/when these queries won't be submitted to the cluster.
>>>>>
>>>>> Is there some key metric that I might be missing, or do folks have any
>>>>> suggestions for tracking down these queries that won't be submitted?
>>>>> Thanks.
>>>>> -William
>>>>>
>>>>>
>>>>
>>>
>>
>
