impala-user mailing list archives

From Fawze Abujaber <fawz...@gmail.com>
Subject Re: Estimate peak memory VS used peak memory
Date Wed, 28 Feb 2018 17:28:02 GMT
Thank you all for your help and advice.

Unfortunately, I rolled back the upgrade until I understand how to control
Impala resources and can tackle all the failures that I started to see
after the upgrade.



On Fri, Feb 23, 2018 at 8:22 PM, Fawze Abujaber <fawzeaj@gmail.com> wrote:

> Hi Tim,
>
> My goal is: queries whose actual memory per node exceeds what I set up as
> the default max memory per node should fail, even though I have a mix of
> queries in the pool; in the same pool some business queries can be as
> simple as select count(*) and others can have a few joins.
>
> And I think this is the right decision: such a query should be optimized.
>
> Also, looking at my historical queries, I can tell from the max used
> memory per node which queries will fail, and I think this helps me a lot;
> but I need any other query to be queued if the actual memory it asks for
> is lower than what I set up as the default max memory per node for a
> query.
>
> Based on the above, I'm looking for the parameters I need to configure.
>
> I don't mind how long or how many queries get queued; in my case I don't
> have any Impala query that runs beyond 4-5 minutes, and 80% of queries
> finish in under 1 minute.
>
> So I don't mind setting the queue timeout to 20 minutes and max queued to
> 20-30 queries per pool.
>
> I want to make sure no query will fail if it does not exceed the default
> memory per node that I set up.
>
> Should I use the default max memory per node alone? Should I combine it
> with the max running queries or with the memory limit of the whole pool?
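>
> To make the question concrete, here is the kind of per-pool setup I have
> in mind (made-up numbers, written as a Python dict for clarity; the key
> names are illustrative, not the exact Impala/Cloudera Manager setting
> names):
>
> pool_settings = {
>     "default_query_mem_limit_per_node": "2g",  # default max memory per node
>     "max_running_queries": 10,                 # concurrency cap
>     "max_queued_queries": 30,                  # queue depth before rejecting
>     "queue_timeout_minutes": 20,               # how long a query may wait
>     "pool_max_memory": "100g",                 # aggregate limit for the pool
> }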
>
>
> On Fri, Feb 23, 2018 at 8:08 PM, Tim Armstrong <tarmstrong@cloudera.com>
> wrote:
>
>> I think the previous answers have been good. I wanted to add a couple of
>> side notes for context since I've been doing a lot of work in this area of
>> Impala. I could talk about this stuff for hours.
>>
>> We do have mechanisms, like spilling data to disk or reducing # of
>> threads, that kick in to keep queries under the mem_limit. This has existed
>> in some form since Impala 2.0, but Impala 2.10 included some architectural
>> changes to make this more robust, and we have further improvements in the
>> pipeline. The end goal, which we're getting much closer to, is that queries
>> should reliably run to completion instead of getting killed after they are
>> admitted.
>>
>> That support is going to enable future enhancements to memory-based
>> admission control to make it easier for cluster admins like yourself to
>> configure admission control. It is definitely tricky to pick a good value
>> for mem_limit when pools can contain a mix of queries and I think Impala
>> can do better at making these decisions automatically.
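>>
>> As a rough illustration of the behaviour I described (a toy sketch in
>> Python, not Impala's actual code):
>>
>> def outcome_under_mem_limit(working_set_gb, mem_limit_gb, can_spill=True):
>>     # Toy model of what happens to an admitted query under a per-node
>>     # mem_limit.
>>     if working_set_gb <= mem_limit_gb:
>>         return "runs fully in memory"
>>     if can_spill:
>>         # Spilling (or reducing the number of threads) keeps the query
>>         # under mem_limit, trading speed for completion.
>>         return "spills to disk and completes, more slowly"
>>     return "killed after admission"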
>>
>> - Tim
>>
>> On Fri, Feb 23, 2018 at 9:05 AM, Alexander Behm <alex.behm@cloudera.com>
>> wrote:
>>
>>> For a given query, the logic for determining the memory that admission
>>> control will require is:
>>> - if the query has a mem_limit, use that
>>> - otherwise, use the memory estimates from the planner
>>>
>>> A query may be assigned a mem_limit by:
>>> - taking the default mem_limit from the pool it was submitted to (this
>>> is the recommended practice)
>>> - manually setting one for the query (in case you want to override the
>>> pool default for a single query)
>>>
>>> In that setup, the memory estimates from the planner are irrelevant to
>>> admission decisions and serve only informational purposes.
>>> Please do not read too much into the memory estimates from the planner.
>>> They can be totally wrong (like your 8TB example).
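>>>
>>> In pseudo-code, the rule above is roughly the following (an illustrative
>>> sketch, not the actual implementation):
>>>
>>> def memory_for_admission(query_mem_limit, pool_default_mem_limit,
>>>                          planner_estimate):
>>>     # Per-node memory figure that admission control charges for a query.
>>>     if query_mem_limit is not None:         # set explicitly on the query
>>>         return query_mem_limit
>>>     if pool_default_mem_limit is not None:  # taken from the pool default
>>>         return pool_default_mem_limit
>>>     return planner_estimate                 # fallback: planner estimate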
>>>
>>>
>>> On Fri, Feb 23, 2018 at 3:47 AM, Jeszy <jeszyb@gmail.com> wrote:
>>>
>>>> Again, the 8TB estimate would not be relevant if the query had a
>>>> mem_limit set.
>>>> I think all that we discussed is covered in the docs, but if you feel
>>>> like specific parts need clarification, please file a jira.
>>>>
>>>> On 23 February 2018 at 11:51, Fawze Abujaber <fawzeaj@gmail.com> wrote:
>>>> > Sorry for asking so many questions, but I see your answers are
>>>> > closing the gaps that I cannot find covered in the documentation.
>>>> >
>>>> > So how can we explain that there was an estimate of 8TB per node and
>>>> > Impala still decided to admit this query?
>>>> >
>>>> > My goal is that each query running beyond the actual limit per node
>>>> > should fail (and this is what I set up as the default memory per node
>>>> > per pool), and I want all other queries to be queued and not killed;
>>>> > so what I understand is that I need to set the max queued queries to
>>>> > unlimited and the queue timeout to hours.
>>>> >
>>>> > And in order to reach that, I need to set up the default memory per
>>>> > node for each pool, and set either max concurrency or the max memory
>>>> > per pool, which will determine the max number of queries that can run
>>>> > concurrently in a specific pool.
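>>>> >
>>>> > For example, with made-up numbers (a hypothetical 10-node cluster):
>>>> >
>>>> > nodes = 10               # hypothetical cluster size
>>>> > per_node_limit_gb = 2    # default memory per node for the pool
>>>> > pool_max_mem_gb = 100    # aggregate pool memory limit
>>>> > per_query_gb = per_node_limit_gb * nodes        # 20 GB per query
>>>> > max_concurrent = pool_max_mem_gb // per_query_gb  # at most 5 queries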
>>>> >
>>>> > I think reaching this goal will close all my gaps.
>>>> >
>>>> >
>>>> >
>>>> > On Fri, Feb 23, 2018 at 11:49 AM, Jeszy <jeszyb@gmail.com> wrote:
>>>> >>
>>>> >> > Whether a query is queued or not is based on the prediction, which
>>>> >> > is based on the estimate, and of course on the concurrency that
>>>> >> > can run in a pool.
>>>> >>
>>>> >> Yes, it is.
>>>> >>
>>>> >> > If I have a memory limit per pool and a memory limit per node for
>>>> >> > a pool, it can be used to estimate the number of queries that can
>>>> >> > run concurrently; is this also based on the prediction and not on
>>>> >> > the actual use?
>>>> >>
>>>> >> Also on prediction.
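>>>> >>
>>>> >> Roughly, the decision looks like this (a sketch in Python, not
>>>> >> actual Impala code); query_mem_gb is the predicted figure, i.e. the
>>>> >> mem_limit if set, otherwise the planner estimate:
>>>> >>
>>>> >> def admit_or_queue(running, max_running, pool_mem_reserved_gb,
>>>> >>                    query_mem_gb, pool_mem_limit_gb):
>>>> >>     if (running < max_running and
>>>> >>             pool_mem_reserved_gb + query_mem_gb <= pool_mem_limit_gb):
>>>> >>         return "admit"
>>>> >>     # A queued query fails only if the queue is full or the queue
>>>> >>     # timeout is reached.
>>>> >>     return "queue"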
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>
