impala-issues mailing list archives

From "sandeep akinapelli (JIRA)" <>
Subject [jira] [Resolved] (IMPALA-5280) Coalesce chains of OR conditions to an IN predicate.
Date Thu, 29 Jun 2017 16:17:00 GMT


sandeep akinapelli resolved IMPALA-5280.
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0

commit 2e8c9657d595e510901604758a50f5a9fd722f3f
Author: sandeep akinapelli <>
Date:   Wed Jun 7 15:48:00 2017 -0700

    IMPALA-5280: Coalesce chains of OR conditions to an IN predicate
    This change introduces a new rule that merges disjunctive equality
    predicates into an IN predicate. Like every rule, it is applied
    bottom up: it first merges the leaf OR predicates into an IN predicate
    and then merges each enclosing OR predicate into the existing IN
    predicate. It also merges two compatible IN predicates into a single
    IN predicate.
    The patch also addresses review comments by normalizing binary
    predicates and adding test cases for the normalization. Binary
    predicates of the form "constant <op> non-constant" are normalized
    to "non-constant <op> constant".
    Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2
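
The rule described in the commit message can be sketched roughly as follows. This is a minimal illustration in Python, not the Impala frontend code: the classes `Col`, `Const`, `Eq`, `In`, and `Or` and the functions `normalize`, `as_in`, and `rewrite` are hypothetical names invented for this sketch. It assumes only equality predicates need normalizing (a non-symmetric operator would also have to be flipped) and treats "compatible" as simply "same column", ignoring type checks.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical mini expression tree, for illustration only.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class Eq:
    left: object
    right: object


@dataclass(frozen=True)
class In:
    col: Col
    values: Tuple[object, ...]


@dataclass(frozen=True)
class Or:
    left: object
    right: object


def normalize(e):
    """Rewrite `constant = column` as `column = constant`."""
    if isinstance(e, Eq) and isinstance(e.left, Const) and isinstance(e.right, Col):
        return Eq(e.right, e.left)
    return e


def as_in(e):
    """View `col = const` or an existing IN as (col, values), else None."""
    e = normalize(e)
    if isinstance(e, In):
        return e.col, e.values
    if isinstance(e, Eq) and isinstance(e.left, Col) and isinstance(e.right, Const):
        return e.left, (e.right.value,)
    return None


def rewrite(e):
    """Apply the coalescing rule bottom up: children first, then the OR itself."""
    if not isinstance(e, Or):
        return normalize(e)
    left, right = rewrite(e.left), rewrite(e.right)
    l, r = as_in(left), as_in(right)
    if l is not None and r is not None and l[0] == r[0]:
        # Same column on both sides: merge the value lists, dropping duplicates.
        merged = l[1] + tuple(v for v in r[1] if v not in l[1])
        return In(l[0], merged)
    return Or(left, right)


c = Col("c")
chain = Or(Or(Or(Eq(c, Const(1)), Eq(Const(2), c)), Eq(c, Const(3))), Eq(c, Const(4)))
result = rewrite(chain)
```

Here `result` is `In(c, (1, 2, 3, 4))`. The innermost OR is merged first, and each enclosing OR then folds one more value into the IN produced below it. Two IN predicates on the same column are merged by the same branch, while an OR over different columns is left unchanged.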

> Coalesce chains of OR conditions to an IN predicate.
> ----------------------------------------------------
>                 Key: IMPALA-5280
>                 URL:
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.8.0
>            Reporter: Alexander Behm
>            Assignee: sandeep akinapelli
>              Labels: newbie, perfomance
>             Fix For: Impala 2.10.0
>         Attachments: same_query_profile_on_CDH5.12.txt
> Would be nice to implement an ExprRewriteRule that coalesces multiple compatible OR conditions
> to an IN predicate, e.g.:
> {code}
> (c=1) OR (c=2) OR (c=3) OR (c=4) ...
> ->
> c IN (1, 2, 3, 4...)
> {code}
> Long chains of OR are generally unwieldy, and transforming them to IN has the following benefits:
> * IN predicates with long value lists are evaluated with an O(log n) lookup in the BE
> * It is easier to extract min/max values from an IN predicate for Parquet min/max filtering
> * The IN predicate may be faster to codegen than a deep binary tree of ORs
> Note that this new rule complements existing rules to yield interesting improvements, e.g.:
> {code}
> (c1=1 AND c2='a') OR (c1=2 AND c2='a') OR (c1=3 AND c2='a')
> ->
> c2='a' AND c1 IN (1, 2, 3)
> {code}
> I've attached a relevant query profile from one of Mostafa's experiments.
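
Two of the benefits listed in the issue description, the O(log n) lookup and the easier min/max extraction, can be illustrated with a small sketch. This is not how the Impala backend is implemented. `make_in_lookup` and `may_match` are hypothetical helpers, and the sketch assumes a non-empty value list of mutually comparable values.

```python
from bisect import bisect_left


def make_in_lookup(values):
    """Sort the IN list once so that each probe is a binary search."""
    ordered = sorted(set(values))  # assumes `values` is non-empty

    def contains(x):
        i = bisect_left(ordered, x)
        return i < len(ordered) and ordered[i] == x

    # The min/max of the value list falls out of the sorted order for free.
    return contains, (ordered[0], ordered[-1])


contains, (lo, hi) = make_in_lookup([4, 1, 3, 2])


def may_match(block_min, block_max):
    """Return False when a block's [min, max] range cannot overlap the IN list."""
    return not (block_max < lo or block_min > hi)
```

With a chain of ORs, each row is compared against every constant in turn. With the sorted list, `contains` needs only a logarithmic number of comparisons, and `may_match` can rule out a block of data using nothing but its minimum and maximum.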

