impala-user mailing list archives

From Mostafa Mokhtar <mmokh...@cloudera.com>
Subject Re: Estimate peak memory VS used peak memory
Date Wed, 28 Feb 2018 17:56:40 GMT
Can you please share the query profiles for the failures you got, along with the admission control settings?

Thanks 
Mostafa

> On Feb 28, 2018, at 9:28 AM, Fawze Abujaber <fawzeaj@gmail.com> wrote:
> 
> Thank you all for your help and advice.
> 
> Unfortunately I rolled back the upgrade until I understand how to control Impala resources and tackle all the failures that I started to see after the upgrade.
> 
> 
> 
>> On Fri, Feb 23, 2018 at 8:22 PM, Fawze Abujaber <fawzeaj@gmail.com> wrote:
>> Hi Tim,
>> 
>> My goal is: queries whose actual memory per node exceeds what I set up as the default max memory per node should fail, even though I have different kinds of queries in the pool; in the same pool some business queries can be as simple as select count(*) and others can have a few joins.
>> 
>> And I think this is the right decision, and such queries should be optimized.
>> 
>> Also, if I look at my historical queries, I can tell from the max memory used per node which queries will fail, and I think this helps me a lot; but I need any other query to be queued if the actual memory it asks for is lower than what I set up as the default max memory per node for a query.
>> 
>> Based on the above, I'm looking for the parameters that I need to configure.
>> 
>> I don't mind how long or how many queries get queued; in my case I don't have any Impala query that runs beyond 4-5 minutes, and 80% of queries finish below 1 minute.
>> 
>> So I don't mind setting the queue timeout to 20 minutes and the max queued to 20-30 queries per pool.
>> 
>> I want to make sure no query will fail if it does not exceed the default memory per node that I set up.
>> 
>> Should I use the default max memory per node alone? Should I combine it with max running queries or with the memory limit of the whole pool?
>> 
>> 
>>> On Fri, Feb 23, 2018 at 8:08 PM, Tim Armstrong <tarmstrong@cloudera.com> wrote:
>>> I think the previous answers have been good. I wanted to add a couple of side notes for
>>> context since I've been doing a lot of work in this area of Impala. I could talk about this
>>> stuff for hours.
>>> 
>>> We do have mechanisms, like spilling data to disk or reducing # of threads, that kick in
>>> to keep queries under the mem_limit. This has existed in some form since Impala 2.0, but
>>> Impala 2.10 included some architectural changes to make this more robust, and we have
>>> further improvements in the pipeline. The end goal, which we're getting much closer to, is
>>> that queries should reliably run to completion instead of getting killed after they are admitted.
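>>>
>>> As a rough sketch of what that looks like in practice (the table name and the 2gb figure
>>> below are made-up examples, not something from your workload): if a query runs under a
>>> mem_limit, the runtime tries to stay within it by spilling rather than failing.
>>>
>>>   -- cap queries in this impala-shell session at 2 GB per node
>>>   SET MEM_LIMIT=2gb;
>>>   -- a large aggregation that needs more than 2 GB per node should now spill
>>>   -- intermediate data to disk instead of being killed mid-flight
>>>   SELECT customer_id, COUNT(*) FROM big_fact_table GROUP BY customer_id;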
>>> 
>>> That support is going to enable future enhancements to memory-based admission control to
>>> make it easier for cluster admins like yourself to configure admission control. It is
>>> definitely tricky to pick a good value for mem_limit when pools can contain a mix of
>>> queries and I think Impala can do better at making these decisions automatically.
>>> 
>>> - Tim
>>> 
>>>> On Fri, Feb 23, 2018 at 9:05 AM, Alexander Behm <alex.behm@cloudera.com> wrote:
>>>> For a given query, the logic for determining the memory that will be required for admission is:
>>>> - if the query has mem_limit use that
>>>> - otherwise, use memory estimates from the planner
>>>> 
>>>> A query may be assigned a mem_limit by:
>>>> - taking the default mem_limit from the pool it was submitted to (this is the recommended practice)
>>>> - manually setting one for the query (in case you want to override the pool default for a single query)
>>>> 
>>>> In that setup, the memory estimates from the planner are irrelevant for admission decisions and serve only informational purposes.
>>>> Please do not read too much into the memory estimates from the planner. They can be totally wrong (like your 8TB example).
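>>>>
>>>> As a small illustration of the override case (the query and the 8gb value are hypothetical,
>>>> just to show the mechanics): with a pool default mem_limit in place, every query submitted
>>>> to that pool gets the per-node cap for admission accounting and execution, and a single
>>>> heavy query can be given a bigger cap from impala-shell without touching the pool config:
>>>>
>>>>   -- override the pool's default mem_limit for this session only
>>>>   SET MEM_LIMIT=8gb;
>>>>   -- this hypothetical join now runs with an 8 GB per-node cap, used for both
>>>>   -- the admission decision and execution
>>>>   SELECT s.store_id, COUNT(*) FROM sales s JOIN items i ON s.item_id = i.id GROUP BY s.store_id;
>>>>
>>>> Either way, what admission charges against the pool is the mem_limit, not the planner estimate.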
>>>> 
>>>> 
>>>>> On Fri, Feb 23, 2018 at 3:47 AM, Jeszy <jeszyb@gmail.com> wrote:
>>>>> Again, the 8TB estimate would not be relevant if the query had a mem_limit set.
>>>>> I think all that we discussed is covered in the docs, but if you feel
>>>>> like specific parts need clarification, please file a jira.
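>>>>>
>>>>> If it helps, you can see both numbers yourself (the exact labels vary a bit between
>>>>> versions, and the table here is just a placeholder): EXPLAIN prints the planner's guess,
>>>>> and the runtime profile shows what was actually used, so you can compare them across your
>>>>> historical workload when picking a per-pool default mem_limit.
>>>>>
>>>>>   -- planner estimate only; look for the "Per-Host Resource Estimates" line
>>>>>   EXPLAIN SELECT customer_id, COUNT(*) FROM big_fact_table GROUP BY customer_id;
>>>>>   -- run the query, then dump its runtime profile from impala-shell;
>>>>>   -- it includes the actual per-node peak memory
>>>>>   SELECT customer_id, COUNT(*) FROM big_fact_table GROUP BY customer_id;
>>>>>   PROFILE;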
>>>>> 
>>>>> On 23 February 2018 at 11:51, Fawze Abujaber <fawzeaj@gmail.com> wrote:
>>>>> > Sorry for asking so many questions, but I see your answers are closing the gaps that I could not close from the documentation.
>>>>> >
>>>>> > So how can we explain that there was an estimate of 8TB per node and Impala still decided to admit this query?
>>>>> >
>>>>> > My goal is that each query running beyond the actual limit per node (which is what I set up as the default memory per node per pool) should fail, and I want all other queries to be queued and not killed; so what I understand is that I need to set the max queued queries to unlimited and the queue timeout to hours.
>>>>> >
>>>>> > And in order to reach that, I need to set up the default memory per node for each pool, and to set either the max concurrency or the max memory per pool, which determines the max concurrent queries that can run in a specific pool.
>>>>> >
>>>>> > I think reaching this goal will close all my gaps.
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Fri, Feb 23, 2018 at 11:49 AM, Jeszy <jeszyb@gmail.com> wrote:
>>>>> >>
>>>>> >> > Whether a query is queued or not is based on the prediction, which is based on the estimate, and of course on the concurrency that can run in a pool.
>>>>> >>
>>>>> >> Yes, it is.
>>>>> >>
>>>>> >> > If I have a memory limit per pool and a memory limit per node for a pool, it can be used to estimate the number of queries that can run concurrently; is this also based on the prediction and not on the actual use?
>>>>> >>
>>>>> >> Also on prediction.
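>>>>> >>
>>>>> >> As a rough worked example (numbers invented just for the arithmetic): on a 5-node cluster
>>>>> >> with a pool Max Memory of 100 GB and a per-query mem_limit of 10 GB, admission charges each
>>>>> >> query 5 x 10 GB = 50 GB against the pool, so two such queries can run concurrently and the
>>>>> >> next ones are queued, regardless of how much memory the running queries actually use.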
>>>>> >
>>>>> >
>>>> 
>>> 
>> 
> 
