impala-user mailing list archives

From Fawze Abujaber <fawz...@gmail.com>
Subject Re: Adding impala daemons on servers without local HDFS storage
Date Thu, 19 Apr 2018 16:47:51 GMT
Thanks, Tim, for your quick response, as usual.

Can you point me to documentation on how to do that, or send me a detailed
example of how to do it globally and per pool?

Again, I much appreciate your readiness to help.

On Thu, 19 Apr 2018 at 19:43 Tim Armstrong <tarmstrong@cloudera.com> wrote:

> We have a way to set global and per-pool defaults for query options. You
> can set default query options via the --default_query_options startup flag,
> or, if you have resource pools set up, you can set default query option
> values for queries submitted to each resource pool (including the default
> pool).
>
> On Tue, Apr 17, 2018 at 3:27 AM, Fawze Abujaber <fawzeaj@gmail.com> wrote:
>
>> Thanks Tim,
>>
>> That means that I cannot disable this across the Impala cluster and I
>> need to manage this at the query level, right?
>>
>> Is there any configuration at the cluster level to disable this?
>>
>> On Wed, Apr 4, 2018 at 3:44 AM, Tim Armstrong <tarmstrong@cloudera.com>
>> wrote:
>>
>>> I agree with Jim's answers.
>>>
>>> You may run into challenges if you have some Impala daemons that have
>>> local DataNodes and some that do not have local DataNodes. By default
>>> Impala always chooses a daemon with a local copy of the data, which would
>>> mean that daemons without a co-located DataNode might never get fragments
>>> scheduled on them. We do have a knob that lets you disable locality-based
>>> scheduling
>>> https://impala.apache.org/docs/build/html/topics/impala_replica_preference.html
>>> but that may be too blunt an instrument.
>>>
>>> On Tue, Apr 3, 2018 at 11:34 AM, Jim Apple <jbapple@cloudera.com> wrote:
>>>
>>>> I think the answers are:
>>>>
>>>> 1. It depends on your workload and your network. I know some users run
>>>> with ONLY remote reads and still get performance they are happy with. Your
>>>> existing nodes will continue to be able to short-circuit read.
>>>>
>>>> 2. This is highly workload-dependent. You want to try to avoid
>>>> spilling, obviously, but if your spinning disk can write 200MB/s, the
>>>> 600GB disk would take 3000 seconds, which is 50 minutes, to fill up.
>>>>
>>>> 3. I think the impalads are smart enough to not try and do a
>>>> short-circuit read on data that isn't local.
>>>>
>>>> On Tue, Apr 3, 2018 at 10:22 AM, Fawze Abujaber <fawzeaj@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I have reached a point in my cluster where I don't need more HDFS
>>>>> storage but I do need more processing power. I'm using YARN, Spark, and
>>>>> Impala on the normal nodes for processing.
>>>>>
>>>>> My questions:
>>>>>
>>>>> 1- How much will data locality impact Impala performance? As far as I
>>>>> know, Impala relies on data locality for its processing.
>>>>>
>>>>> 2- I have an OS disk with 600GB. Will this be enough for spilling to
>>>>> disk when needed? Does it depend on other factors? The Impala daemon
>>>>> memory limit is 35GB.
>>>>>
>>>>> 3- Should I disable *HDFS Short Circuit Read* on these nodes?
>>>>>
>>>>> I will be happy to get more recommendations on this.
>>>>>
>>>>> --
>>>>> Take Care
>>>>> Fawze Abujaber
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Take Care
>> Fawze Abujaber
>>
>
> --
Take Care
Fawze Abujaber
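
For readers looking for the kind of example requested above, here is a minimal sketch of how a cluster-wide default could be composed. It assumes that the knob Tim linked to is the REPLICA_PREFERENCE query option with the value REMOTE, and that --default_query_options takes comma-separated key=value pairs; the helper function is hypothetical and not part of any Impala tooling, so check the Impala documentation for your version before relying on it.

```python
# Sketch only: builds the value for impalad's --default_query_options flag.
# format_default_query_options is a hypothetical helper, not an Impala API.

def format_default_query_options(options):
    """Join option/value pairs into a comma-separated key=value string."""
    return ",".join(f"{name}={value}" for name, value in options.items())

# Assumption: REPLICA_PREFERENCE=REMOTE makes the scheduler treat every
# replica as remote, so daemons without a local DataNode also get work.
cluster_defaults = {"REPLICA_PREFERENCE": "REMOTE"}

flag = "--default_query_options=" + format_default_query_options(cluster_defaults)
print(flag)
```

The same key=value string should also be usable as the default query options of a resource pool (for example in the pool's "Default Query Options" setting, if your management tool exposes one), and the option can be tried per session first with SET REPLICA_PREFERENCE=REMOTE in impala-shell.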

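
Jim's spill estimate earlier in the thread can be reproduced with a short calculation. This is only a sketch: it assumes decimal units (1 GB = 1000 MB) and a sustained sequential write rate, and the function name is made up for illustration.

```python
# Sketch only: time to fill a scratch disk at a sustained write rate.
# Assumes 1 GB = 1000 MB and that the disk is used for nothing else.

def seconds_to_fill(capacity_gb, write_mb_per_s):
    """Return the seconds needed to write capacity_gb at write_mb_per_s."""
    return capacity_gb * 1000 / write_mb_per_s

# Figures from the thread: a 600GB OS disk and a 200MB/s spinning disk.
seconds = seconds_to_fill(600, 200)
minutes = seconds / 60
print(seconds, "seconds, or", minutes, "minutes")
```

With the numbers from the thread this gives the 3000 seconds (50 minutes) quoted above; real spill behavior will depend on the workload and on what else shares the disk.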