impala-issues mailing list archives

From "Alexander Behm (JIRA)" <>
Subject [jira] [Updated] (IMPALA-5095) Use parquet::Statistics for simple min/max aggregates
Date Fri, 17 Mar 2017 21:29:41 GMT


Alexander Behm updated IMPALA-5095:
    Labels: parquet performance ramp-up  (was: )

> Use parquet::Statistics for simple min/max aggregates
> -----------------------------------------------------
>                 Key: IMPALA-5095
>                 URL:
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.8.0
>            Reporter: Alexander Behm
>              Labels: parquet, performance, ramp-up
> {code}
> select min(int_col), max(bigint_col) from parquet_table;
> select min(int_col), max(bigint_col) from parquet_table group by partition_col;
> {code}
> The slot values for int_col and bigint_col can be directly filled in from the parquet::Statistics,
> assuming stats are available for both columns. No columns need to be scanned/materialized.
> This JIRA focuses on implementing this optimization in the simple case where all scanned
> columns feed into min/max aggregates and where all columns have parquet::Statistics. Those
> conditions can be relaxed, but should be addressed separately.
> This optimization opportunity must be detected by the planner and is not applicable when
> there are scan predicates.
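
A minimal sketch of the idea described above, written in Python for illustration only. It is not Impala code: `ColumnStats`, `can_use_stats`, and `aggregate_from_stats` are hypothetical names, and `ColumnStats` merely stands in for the min/max fields of parquet::Statistics. It assumes integer columns and one statistics entry per column per row group.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ColumnStats:
    """Hypothetical stand-in for the min/max fields of parquet::Statistics."""
    min_value: int
    max_value: int


# One row group: column name -> stats, or None when no stats were written.
RowGroupStats = Dict[str, Optional[ColumnStats]]
# One aggregate: (function name, column name), e.g. ("min", "int_col").
Aggregate = Tuple[str, str]


def can_use_stats(aggregates: List[Aggregate],
                  scanned_columns: List[str],
                  has_scan_predicates: bool,
                  row_groups: List[RowGroupStats]) -> bool:
    """Planner-side check for the simple case described in the issue."""
    if has_scan_predicates:
        return False  # stats describe all rows, not the filtered subset
    if any(func not in ("min", "max") for func, _ in aggregates):
        return False
    aggregated = {col for _, col in aggregates}
    if not set(scanned_columns) <= aggregated:
        return False  # some scanned column feeds something other than min/max
    # Every aggregated column needs stats in every row group.
    return all(rg.get(col) is not None for rg in row_groups for col in aggregated)


def aggregate_from_stats(aggregates: List[Aggregate],
                         row_groups: List[RowGroupStats]) -> List[Optional[int]]:
    """Fold per-row-group stats into final values without reading column data."""
    results: List[Optional[int]] = []
    for func, col in aggregates:
        if func == "min":
            results.append(min((rg[col].min_value for rg in row_groups), default=None))
        else:
            results.append(max((rg[col].max_value for rg in row_groups), default=None))
    return results


row_groups = [
    {"int_col": ColumnStats(3, 9), "bigint_col": ColumnStats(-5, 100)},
    {"int_col": ColumnStats(-2, 4), "bigint_col": ColumnStats(7, 250)},
]
aggregates = [("min", "int_col"), ("max", "bigint_col")]

if can_use_stats(aggregates, ["int_col", "bigint_col"], False, row_groups):
    print(aggregate_from_stats(aggregates, row_groups))
```

For the `group by partition_col` query, the same fold would presumably run once per partition over that partition's row groups. An empty input yields `None`, mirroring SQL's NULL result for min/max over no rows.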

