impala-user mailing list archives

From Fawze Abujaber <fawz...@gmail.com>
Subject Re: Adding impala daemons on servers without local HDFS storage
Date Tue, 17 Apr 2018 10:27:04 GMT
Thanks Tim,

That means I cannot disable this across the whole Impala cluster and need
to manage it at the query level, right?

Is there any configuration at the cluster level to disable this?

On Wed, Apr 4, 2018 at 3:44 AM, Tim Armstrong <tarmstrong@cloudera.com>
wrote:

> I agree with Jim's answers.
>
> You may run into challenges if you have some Impala daemons that have
> local DataNodes and some that do not have local DataNodes. By default
> Impala always chooses a daemon with a local copy of the data, which would
> mean that daemons without a co-located DataNode might never get fragments
> scheduled on them. We do have a knob that lets you disable locality-based
> scheduling:
> https://impala.apache.org/docs/build/html/topics/impala_replica_preference.html
> but that may be too blunt an instrument.
>
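As a rough illustration of the knob mentioned above, here is a minimal sketch that prefixes a query with the query option. Only the option name REPLICA_PREFERENCE is taken from the linked docs page; the helper name `with_replica_preference` is hypothetical, and the value names CACHE_LOCAL, DISK_LOCAL, and REMOTE are an assumption that should be checked against the docs for your Impala version.

```python
# Hypothetical helper: builds the statements to send, it does not talk to Impala.
# Assumed option values; verify against the REPLICA_PREFERENCE docs page.
VALID_PREFERENCES = ("CACHE_LOCAL", "DISK_LOCAL", "REMOTE")


def with_replica_preference(query: str, preference: str = "REMOTE") -> str:
    """Return the query preceded by a SET REPLICA_PREFERENCE statement."""
    pref = preference.upper()
    if pref not in VALID_PREFERENCES:
        raise ValueError(f"unknown replica preference: {preference!r}")
    # Normalize the trailing semicolon so exactly one is emitted.
    body = query.rstrip().rstrip(";")
    return f"SET REPLICA_PREFERENCE={pref};\n{body};"


print(with_replica_preference("SELECT count(*) FROM my_table"))
```

The table name `my_table` is a placeholder. For a cluster-wide default, impalad should also accept a default_query_options startup flag, as far as I recall; treat that as something to confirm in the documentation rather than as a tested recipe.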
> On Tue, Apr 3, 2018 at 11:34 AM, Jim Apple <jbapple@cloudera.com> wrote:
>
>> I think the answers are:
>>
>> 1. It depends on your workload and your network. I know some users run
>> with ONLY remote reads and still get performance they are happy with. Your
>> existing nodes will continue to be able to short-circuit read.
>>
>> 2. This is highly workload-dependent. You want to avoid spilling,
>> obviously, but if your spinning disk can write 200MB/s, the 600GB disk
>> would take 3000 seconds, which is 50 minutes, to fill up.
>>
>> 3. I think the impalads are smart enough to not try and do a
>> short-circuit read on data that isn't local.
>>
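The fill-time estimate in answer 2 can be checked with a quick calculation. This is only a back-of-the-envelope sketch: it assumes the whole 600GB disk is free for spill files, a sustained 200MB/s write rate, and decimal units (1 GB = 1000 MB). The function name is made up for illustration.

```python
def seconds_to_fill(disk_gb: float, write_mb_per_s: float) -> float:
    """Seconds needed to fill disk_gb at a sustained write rate (decimal units)."""
    return disk_gb * 1000 / write_mb_per_s


# Figures from the thread: 600GB OS disk, 200MB/s spinning disk.
fill_seconds = seconds_to_fill(disk_gb=600, write_mb_per_s=200)
print(f"{fill_seconds:.0f} s = {fill_seconds / 60:.0f} min")  # 3000 s = 50 min
```

In practice the OS and logs share that disk, so the usable spill space would be smaller and the time to fill it shorter.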
>> On Tue, Apr 3, 2018 at 10:22 AM, Fawze Abujaber <fawzeaj@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I have reached a point in my cluster where I don't need more HDFS
>>> storage but do need to add processing power. I'm using YARN, Spark, and
>>> Impala on the normal nodes for processing.
>>>
>>> My questions:
>>>
>>> 1- How much will data locality impact Impala performance? As far as I
>>> know, Impala relies on data locality for its processing.
>>>
>>> 2- I have an OS disk with 600GB. Will this be enough for spilling to
>>> disk when needed? Does it depend on other factors? The Impala daemon
>>> memory limit is 35GB.
>>>
>>> 3- Should I disable *HDFS Short Circuit Read* on these nodes?
>>>
>>> I would be happy to get more recommendations on this.
>>>
>>> --
>>> Take Care
>>> Fawze Abujaber
>>>
>>
>>
>


-- 
Take Care
Fawze Abujaber
