impala-user mailing list archives

From Tim Armstrong <tarmstr...@cloudera.com>
Subject Re: Does Impala supports or plan to support Late Materialization
Date Tue, 20 Mar 2018 22:30:05 GMT
The page indices should solve a large part of this problem, but I can
definitely come up with examples where the page indices aren't sufficient
to avoid most materialisation if we have a predicate on an unsorted column.

For example, take a predicate on a state column with 50 distinct values
(I'm being US-centric):

  select * from sales where state = 'MI'

Suppose there is some locality in the data and, on average, each data page
holds 2 states. Min-max filters will probably only let you skip ~50% of
pages, since 'MI' falls between the min and max of many pages that don't
contain it. Whereas if you scanned the 'state' column first and materialized
the other columns lazily, you could filter out a large majority of the data
before materializing the other columns.
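
The ~50% estimate above can be checked with a rough back-of-the-envelope
sketch. It assumes every page holds exactly two distinct states, with each
pair equally likely, which is a simplification of "2 states per page on
average". The function name page_pruning_stats is made up for illustration
and is not part of Impala.

```python
from itertools import combinations

# The 50 US state postal codes, sorted the way a string min/max filter compares them.
STATES = sorted("""
AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC
SD TN TX UT VT VA WA WV WI WY
""".split())


def page_pruning_stats(target, states=STATES, states_per_page=2):
    """Enumerate every possible page of `states_per_page` distinct states.

    Assumption (not from the thread): each combination is equally likely.
    Returns the fraction of pages in each of three buckets.
    """
    skipped = containing = total = 0
    for page in combinations(states, states_per_page):
        total += 1
        lo, hi = min(page), max(page)
        if target < lo or target > hi:
            # Min/max stats prove the page cannot contain the target.
            skipped += 1
        elif target in page:
            # The page really does hold matching rows.
            containing += 1
    read_without_match = total - skipped - containing
    return {
        "skipped_by_min_max": skipped / total,
        "contain_target": containing / total,
        "read_without_match": read_without_match / total,
    }


if __name__ == "__main__":
    for name, fraction in page_pruning_stats("MI").items():
        print(f"{name}: {fraction:.2%}")
```

Under these assumptions min/max stats skip 48% of pages, only 4% of pages
actually contain 'MI', and the remaining 48% are read without producing a
single matching row. That last bucket is the work a lazy scan of the other
columns would avoid.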

On Tue, Mar 20, 2018 at 9:20 AM, Alexander Behm <alex.behm@cloudera.com>
wrote:

> I think we do eventually want to support it. For highly selective queries
> the existing dictionary and min/max filtering can already be very
> effective. In addition, we plan to add indexes for finer-grained page
> pruning. See https://issues.apache.org/jira/browse/IMPALA-5842
>
> After all those improvements, it's not clear what the additional benefit
> of late materialization is going to be in practice.
>
> Do you have a case in mind that specifically requires late materialization
> to work well?
>
> On Tue, Mar 20, 2018 at 12:47 AM, Antoni Ivanov <aivanov@vmware.com>
> wrote:
>
>> Hi,
>>
>>
>>
>> You can ignore my question. I found the relevant JIRA:
>> https://issues.apache.org/jira/browse/IMPALA-2017
>> So I guess the answer is "not yet".
>>
>>
>>
>> Regards,
>>
>> Antoni
>>
>>
>>
>> *From:* Antoni Ivanov
>> *Sent:* Tuesday, March 20, 2018 9:45 AM
>> *To:* 'user@impala.apache.org' <user@impala.apache.org>
>> *Subject:* Does Impala supports or plan to support Late Materialization
>>
>>
>>
>> I don't mean partition pruning, but late materialization as described in
>>
>> https://aws.amazon.com/about-aws/whats-new/2017/12/amazon-redshift-introduces-late-materialization-for-faster-query-processing/
>>
>>
>>
>> It basically fetches the filter columns first, applies the filter, and
>> then fetches the data from the remaining columns only for the rows that
>> pass the filter.
>>
>>
>>
>> Thanks
>>
>
>
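
For readers unfamiliar with the technique discussed in this thread, here is a
minimal sketch of eager versus late materialization over an in-memory
columnar table. It is purely illustrative and is not how Impala or Redshift
implement scanning. The names scan_eager, scan_late and the sample sales
table are hypothetical, and "values_read" simply counts column values
touched as a stand-in for I/O and decoding cost.

```python
def scan_eager(columns, filter_col, predicate):
    """Materialize every column of every row, then apply the predicate."""
    names = list(columns)
    values_read = 0
    out = []
    for i in range(len(columns[filter_col])):
        row = {name: columns[name][i] for name in names}
        values_read += len(names)
        if predicate(row[filter_col]):
            out.append(row)
    return out, values_read


def scan_late(columns, filter_col, predicate):
    """Read only the filter column first, then fetch other columns for matches."""
    names = list(columns)
    filter_values = columns[filter_col]
    values_read = len(filter_values)
    # Row positions that pass the predicate (a "selection vector").
    selected = [i for i, value in enumerate(filter_values) if predicate(value)]
    out = []
    for i in selected:
        out.append({name: columns[name][i] for name in names})
        # The filter column value was already read above.
        values_read += len(names) - 1
    return out, values_read


# Hypothetical sample data: 6 rows, 3 columns, 2 rows with state = 'MI'.
sales = {
    "state": ["MI", "OH", "CA", "MI", "TX", "NY"],
    "amount": [10, 20, 30, 40, 50, 60],
    "customer": ["a", "b", "c", "d", "e", "f"],
}

if __name__ == "__main__":
    is_mi = lambda state: state == "MI"
    print(scan_eager(sales, "state", is_mi)[1])  # values touched by the eager scan
    print(scan_late(sales, "state", is_mi)[1])   # values touched by the late scan
```

On this toy table both scans return the same two rows, but the eager scan
touches 18 values while the late scan touches 10. The gap grows with the
number of non-filter columns and with the selectivity of the predicate.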
