spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Davies Liu (JIRA)" <>
Subject [jira] [Commented] (SPARK-18676) Spark 2.x query plan data size estimation can crash join queries versus 1.x
Date Tue, 06 Dec 2016 18:08:58 GMT


Davies Liu commented on SPARK-18676:

What's the schema and plan of the child looks like? It's possible that the schema of parquet
file is wide, also highly compressed, only a few column are used in the query, then the estimation
will be much smaller than actual data size. The estimation of string could also be wrong.

Using the bytes of parquet file as the metric for broadcasting is bad, we also saw some cases
that the parquet file is only a few MB, but the broadcast is a few GB.

The estimation could easily be wrong for many reasons, maybe we could switch to ShuffleJoin
when it realize that the actual data is larger than thought, will that work?

> Spark 2.x query plan data size estimation can crash join queries versus 1.x
> ---------------------------------------------------------------------------
>                 Key: SPARK-18676
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
>            Reporter: Michael Allman
> Commit [c481bdf|] significantly modified
the way Spark SQL estimates the output data size of query plans. I've found that—with the
new table query partition pruning support in 2.1—this has lead to in some cases underestimation
of join plan child size statistics to a degree that makes executing such queries impossible
without disabling automatic broadcast conversion.
> In one case we debugged, the query planner had estimated the size of a join child to
be 3,854 bytes. In the execution of this child query, Spark reads 20 million rows in 1 GB
of data from parquet files and shuffles 722.9 MB of data, outputting 17 million rows. In planning
the original join query, Spark converts the child to a {{BroadcastExchange}}. This query execution
fails unless automatic broadcast conversion is disabled.
> This particular query is complex and very specific to our data and schema. I have not
yet developed a reproducible test case that can be shared. I realize this ticket does not
give the Spark team a lot to work with to reproduce and test this issue, but I'm available
to help. At the moment I can suggest running a join where one side is an aggregation selecting
a few fields over a large table with a wide schema including many string columns.
> This issue exists in Spark 2.0, but we never encountered it because in that version it
only manifests itself for partitioned relations read from the filesystem, and we rarely use
this feature. We've encountered this issue in 2.1 because 2.1 does partition pruning for metastore
tables now.
> As a back stop, we've patched our branch of Spark 2.1 to revert the reductions in default
data type size for string, binary and user-defined types. We also removed the override of
the statistics method in `LogicalPlan` which reduces the output size of a plan based on the
ratio of that plan's output schema size versus its children's. We have not had this problem

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message