impala-user mailing list archives

From Fawze Abujaber <>
Subject Re: Adding impala daemons on servers without local HDFS storage
Date Tue, 17 Apr 2018 10:27:04 GMT
Thanks Tim,

That means I cannot disable this across the Impala cluster and need
to manage it at the query level, right?

Is there any configuration at the cluster level to disable this?

On Wed, Apr 4, 2018 at 3:44 AM, Tim Armstrong <> wrote:

> I agree with Jim's answers.
> You may run into challenges if you have some Impala daemons that have
> local DataNodes and some that do not have local DataNodes. By default
> Impala always chooses a daemon with a local copy of the data, which would
> mean that daemons without a co-located DataNode might never get fragments
> scheduled on them. We do have a knob that lets you disable locality-based
> scheduling
> replica_preference.html but that may be too blunt an instrument.
> On Tue, Apr 3, 2018 at 11:34 AM, Jim Apple <> wrote:
>> I think the answers are:
>> 1. It depends on your workload and your network. I know some users run
>> with ONLY remote reads and still get performance they are happy with. Your
>> existing nodes will continue to be able to short-circuit read.
>> 2. This is highly workload-dependent. You want to avoid spilling,
>> obviously, but if your spinning disk can write 200MB/s, the 600GB disk
>> would take 3000 seconds (50 minutes) to fill up.
>> 3. I think the impalads are smart enough not to attempt a
>> short-circuit read on data that isn't local.
>> On Tue, Apr 3, 2018 at 10:22 AM, Fawze Abujaber <>
>> wrote:
>>> Hi All,
>>> I have reached a point in my cluster where I don't need more HDFS
>>> storage but do need more processing power. I'm using YARN, Spark and
>>> Impala on the normal nodes for processing.
>>> My questions:
>>> 1- How much will data locality impact Impala performance? As far as I
>>> know, Impala relies on data locality for its processing.
>>> 2- I have an OS disk with 600GB. Will this be enough for spilling to
>>> disk when needed? Does it depend on other factors? The Impala daemon
>>> memory limit is 35GB.
>>> 3- Should I disable *HDFS Short Circuit Read* on these nodes?
>>> I will be happy to get more recommendations on this.
>>> --
>>> Take Care
>>> Fawze Abujaber

Take Care
Fawze Abujaber
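
The spill estimate quoted in the thread (a 600GB disk written at 200MB/s) can be checked with a short calculation. This is only a rough sketch: the function name `seconds_to_fill` is made up for illustration, and it assumes decimal units (1GB = 1000MB), a sustained sequential write rate, and that the whole disk is free for spill data, which will not hold on a real OS disk.

```python
# Back-of-the-envelope check of the spill estimate quoted in the thread.
# Figures from the thread: a 600GB OS disk and a 200MB/s write rate.
# Assumptions (not from the thread): decimal units (1 GB = 1000 MB), a
# sustained write rate, and the entire disk being available for spilling.


def seconds_to_fill(disk_gb: float, write_mb_per_s: float) -> float:
    """Return the time in seconds a sustained write needs to fill the disk."""
    if write_mb_per_s <= 0:
        raise ValueError("write rate must be positive")
    return disk_gb * 1000 / write_mb_per_s


if __name__ == "__main__":
    secs = seconds_to_fill(600, 200)
    print(f"{secs:.0f} seconds, about {secs / 60:.0f} minutes")
```

Plugging in other disk sizes or write rates gives a quick upper bound on how long a query could keep spilling before the disk is full; actual behavior depends on the workload, as noted in the thread.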
