spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-26203) Benchmark performance of In and InSet expressions
Date Tue, 11 Dec 2018 19:11:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-26203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-26203:
------------------------------------

    Assignee: Apache Spark

> Benchmark performance of In and InSet expressions
> -------------------------------------------------
>
>                 Key: SPARK-26203
>                 URL: https://issues.apache.org/jira/browse/SPARK-26203
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Anton Okolnychyi
>            Assignee: Apache Spark
>            Priority: Major
>
> The {{OptimizeIn}} rule replaces {{In}} with {{InSet}} if the number of possible values exceeds
"spark.sql.optimizer.inSetConversionThreshold" and all values are literals. This was done
for performance reasons, to avoid the O(n) time complexity of {{In}}.
> The original optimization was done in SPARK-3711. A lot has changed since then (e.g.,
generation of Java code to evaluate expressions), so it is worth measuring the performance
of this optimization again.
> According to my local benchmarks, {{InSet}} can be up to 10x slower than {{In}}
due to autoboxing and other issues.
> The scope of this JIRA is to benchmark every supported data type inside {{In}} and {{InSet}}
and outline existing bottlenecks. Once we have this information, we can come up with solutions.

> Based on my preliminary investigation, several optimizations are possible, and they
frequently depend on the specific data type.
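
To make the trade-off in the description concrete, here is a minimal standalone sketch in plain Java. It is not Spark code: the class and method names (InVsInSetSketch, shouldUseInSet, inLinear, buildSet, inSetBoxed) are hypothetical, and the threshold is passed in as a parameter rather than read from "spark.sql.optimizer.inSetConversionThreshold". It only illustrates the two evaluation styles the ticket contrasts: a linear scan over primitives (In-like) and a hash-set lookup that boxes every probed value (InSet-like).

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: hypothetical names, not part of Spark.
final class InVsInSetSketch {

    // Mirrors the condition described in the ticket: convert only when all
    // values are literals and their count exceeds the threshold.
    static boolean shouldUseInSet(int numValues, boolean allLiterals, int threshold) {
        return allLiterals && numValues > threshold;
    }

    // In-style evaluation: O(n) scan, but every comparison is on primitive ints.
    static boolean inLinear(int value, int[] candidates) {
        for (int c : candidates) {
            if (c == value) {
                return true;
            }
        }
        return false;
    }

    // InSet-style setup: each candidate is autoboxed to Integer when added.
    static Set<Integer> buildSet(int[] candidates) {
        Set<Integer> set = new HashSet<>();
        for (int c : candidates) {
            set.add(c);
        }
        return set;
    }

    // InSet-style evaluation: expected O(1) lookup, but the probed int is
    // autoboxed to Integer on every call before hashCode/equals run.
    static boolean inSetBoxed(int value, Set<Integer> set) {
        return set.contains(value);
    }

    public static void main(String[] args) {
        int[] candidates = {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21};
        Set<Integer> set = buildSet(candidates);

        // The threshold value 10 is an assumption chosen for illustration.
        System.out.println(shouldUseInSet(candidates.length, true, 10));

        // Both strategies must agree on membership; only their cost differs.
        System.out.println(inLinear(7, candidates) == inSetBoxed(7, set));
    }
}
```

For short lists of primitive values, the per-lookup boxing and hashing in the second style can plausibly outweigh the cost of a few primitive comparisons, which is the kind of effect the proposed benchmarks are meant to quantify per data type.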



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


