impala-user mailing list archives

From Mostafa Mokhtar <mmokh...@cloudera.com>
Subject Re: Does Impala supports or plan to support Late Materialization
Date Tue, 20 Mar 2018 22:34:50 GMT
@Antoni,

Check the blog post below; it has examples of how to optimize the schema for
selective queries.

https://blog.cloudera.com/blog/2017/12/faster-performance-for-selective-queries/


On Tue, Mar 20, 2018 at 3:30 PM, Tim Armstrong <tarmstrong@cloudera.com>
wrote:

> The page indices should solve a large part of this problem, but I can
> definitely come up with examples where the page indices aren't sufficient
> to avoid most materialisation if we have a predicate on an unsorted column.
>
> E.g. if you have a predicate on a state column with 50 distinct values
> (I'm being US-centric).
>
>   select * from sales where state = 'MI'
>
> Suppose there is some amount of locality to the data and on average you
> get 2 states per data page. You're probably only going to be able to filter
> out ~50% of pages using min-max filters since 'MI' will lie in-between many
> pairs of states. Whereas if you scanned the 'state' column and materialized
> the other columns lazily, you could filter out a large majority of the data
> before materialising the other columns.
>
> On Tue, Mar 20, 2018 at 9:20 AM, Alexander Behm <alex.behm@cloudera.com>
> wrote:
>
>> I think we do eventually want to support it. For highly selective queries
>> the existing dictionary and min/max filtering can already be very
>> effective. In addition, we plan to add indexes for finer-grained page
>> pruning. See https://issues.apache.org/jira/browse/IMPALA-5842
>>
>> After all those improvements, it's not clear what the additional benefit
>> of late materialization is going to be in practice.
>>
>> Do you have a case in mind that specifically requires late
>> materialization to work well?
>>
>> On Tue, Mar 20, 2018 at 12:47 AM, Antoni Ivanov <aivanov@vmware.com>
>> wrote:
>>
>>> Hi,
>>>
>>>
>>>
>>> You can ignore my question. I found the relevant JIRA,
>>> https://issues.apache.org/jira/browse/IMPALA-2017, so I guess the answer
>>> is "not yet".
>>>
>>>
>>>
>>> Regards,
>>>
>>> Antoni
>>>
>>>
>>>
>>> *From:* Antoni Ivanov
>>> *Sent:* Tuesday, March 20, 2018 9:45 AM
>>> *To:* 'user@impala.apache.org' <user@impala.apache.org>
>>> *Subject:* Does Impala supports or plan to support Late Materialization
>>>
>>>
>>>
>>> I don’t mean partition pruning, but late materialization as described in
>>>
>>> https://aws.amazon.com/about-aws/whats-new/2017/12/amazon-redshift-introduces-late-materialization-for-faster-query-processing/
>>>
>>>
>>>
>>> It basically fetches the filter columns first, applies the filter, and
>>> then fetches data from the remaining columns only for the rows that pass
>>> the filter.
>>>
>>>
>>>
>>> Thanks
>>>
>>
>>
>
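
To make the thread's two ideas concrete, here is a toy sketch in Python. It is not how Impala or Parquet implement scanning; the table, the two-row page size, and the function names (min_max_prune, eager_scan, late_scan) are all assumptions made up for illustration. It models per-page min/max pruning on an unsorted 'state' column, then compares materialising every column for all rows in the surviving pages against scanning the filter column first and materialising the other columns only for matching rows.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 2  # rows per data page; deliberately tiny (assumption)

# Hypothetical columnar table: each key is a column, values are row-aligned.
TABLE: Dict[str, list] = {
    "state": ["AL", "WY", "MA", "MN", "MI", "MI",
              "CA", "TX", "NY", "OH", "AK", "AZ"],
    "amount": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
}


def min_max_prune(column: list, value, page_size: int = PAGE_SIZE) -> List[Tuple[int, int]]:
    """Return the (start, end) row ranges of pages whose [min, max] may contain value."""
    kept = []
    for start in range(0, len(column), page_size):
        end = min(start + page_size, len(column))
        page = column[start:end]
        if min(page) <= value <= max(page):
            kept.append((start, end))
    return kept


def eager_scan(table: Dict[str, list], filter_col: str, value):
    """Materialise all columns for every row in surviving pages, then filter."""
    rows, materialised = [], 0
    for start, end in min_max_prune(table[filter_col], value):
        for i in range(start, end):
            row = {name: col[i] for name, col in table.items()}
            materialised += 1
            if row[filter_col] == value:
                rows.append(row)
    return rows, materialised


def late_scan(table: Dict[str, list], filter_col: str, value):
    """Scan only the filter column first; materialise other columns for matches only."""
    rows, materialised = [], 0
    for start, end in min_max_prune(table[filter_col], value):
        matches = [i for i in range(start, end) if table[filter_col][i] == value]
        for i in matches:
            rows.append({name: col[i] for name, col in table.items()})
            materialised += 1
    return rows, materialised


if __name__ == "__main__":
    eager_rows, eager_count = eager_scan(TABLE, "state", "MI")
    late_rows, late_count = late_scan(TABLE, "state", "MI")
    print("pages kept by min/max:", min_max_prune(TABLE["state"], "MI"))
    print("rows materialised (eager vs late):", eager_count, late_count)
```

In this made-up data, 'MI' falls between the min and max of several pages that do not contain it, so min/max pruning alone keeps those pages, while the late variant builds full rows only for the actual matches. Both scans return the same result rows; they differ only in how many rows get fully materialised along the way.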
