spark-issues mailing list archives

From "liupengcheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23124) Warn users when broacast big table in JoinSelection instead of just run it
Date Wed, 17 Jan 2018 06:56:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328358#comment-16328358 ]

liupengcheng commented on SPARK-23124:
--------------------------------------

I think we should give the user a warning or an exception if no broadcast hint exists and
the sizeInBytes of any child LogicalPlan of the join is larger than the auto-broadcast
threshold (spark.sql.autoBroadcastJoinThreshold), so that users can tell it is a data
volume problem.
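
To make the proposed condition concrete, here is a minimal sketch of the check. It does not
use Spark's classes: PlanStats, broadcastWarning and thresholdBytes are hypothetical names
introduced only for illustration, and the message wording is an assumption, not an existing
Spark log line.

```scala
object BroadcastSizeCheck {
  // Hypothetical stand-in for the statistics of one join child
  // (estimated size plus whether the user supplied a broadcast hint).
  final case class PlanStats(sizeInBytes: BigInt, broadcastHint: Boolean)

  // Returns Some(message) when the side has no broadcast hint and its
  // estimated size is larger than the threshold; None otherwise.
  def broadcastWarning(
      side: String,
      stats: PlanStats,
      thresholdBytes: Long): Option[String] = {
    if (!stats.broadcastHint && stats.sizeInBytes > BigInt(thresholdBytes)) {
      Some(
        s"Broadcasting $side side without a hint: estimated size " +
          s"${stats.sizeInBytes} bytes exceeds threshold $thresholdBytes bytes")
    } else {
      None
    }
  }
}
```

A caller could log the returned message as a warning, or turn it into an exception if failing
fast is preferred; which of the two is better is the open question in this issue.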

> Warn users when broacast big table in JoinSelection instead of just run it
> --------------------------------------------------------------------------
>
>                 Key: SPARK-23124
>                 URL: https://issues.apache.org/jira/browse/SPARK-23124
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0, 2.3.0
>            Reporter: liupengcheng
>            Priority: Major
>
> When running a Spark SQL Thrift server, we encountered a sudden failure of the Thrift server
> caused by an OutOfMemoryError.
> After reviewing the code and some debugging, I found that the framework permits broadcasting
> a big table and gives no warning. See the code below:
> {code:java}
> case logical.Join(left, right, joinType, condition) =>
>   val buildSide = broadcastSide(canBuildLeft = true, canBuildRight = true, left, right)
>   // This join could be very slow or OOM
>   joins.BroadcastNestedLoopJoinExec(
>     planLater(left), planLater(right), buildSide, joinType, condition) :: Nil
>
> private def broadcastSide(
>     canBuildLeft: Boolean,
>     canBuildRight: Boolean,
>     left: LogicalPlan,
>     right: LogicalPlan): BuildSide = {
>   def smallerSide =
>     if (right.stats.sizeInBytes <= left.stats.sizeInBytes) BuildRight else BuildLeft
>   val buildRight = canBuildRight && right.stats.hints.broadcast
>   val buildLeft = canBuildLeft && left.stats.hints.broadcast
>   if (buildRight && buildLeft) {
>     // Broadcast smaller side base on its estimated physical size
>     // if both sides have broadcast hint
>     smallerSide
>   } else if (buildRight) {
>     BuildRight
>   } else if (buildLeft) {
>     BuildLeft
>   } else if (canBuildRight && canBuildLeft) {
>     // for the last default broadcast nested loop join
>     smallerSide
>   } else {
>     throw new AnalysisException("Can not decide which side to broadcast for this join")
>   }
> }
> {code}
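
The fallback branch in the quoted code can be traced with a small standalone model. This is a
sketch, not Spark code: Stats is a hypothetical stand-in for left.stats / right.stats, and
IllegalStateException replaces AnalysisException so the snippet has no Spark dependency. It
shows that when neither side has a hint, the smaller side is chosen with no upper bound on
its size.

```scala
object BroadcastSideModel {
  sealed trait BuildSide
  case object BuildLeft extends BuildSide
  case object BuildRight extends BuildSide

  // Hypothetical stand-in for the statistics of a join child.
  final case class Stats(sizeInBytes: BigInt, broadcastHint: Boolean)

  // Same branch structure as the quoted broadcastSide.
  def broadcastSide(
      canBuildLeft: Boolean,
      canBuildRight: Boolean,
      left: Stats,
      right: Stats): BuildSide = {
    def smallerSide: BuildSide =
      if (right.sizeInBytes <= left.sizeInBytes) BuildRight else BuildLeft
    val buildRight = canBuildRight && right.broadcastHint
    val buildLeft = canBuildLeft && left.broadcastHint
    if (buildRight && buildLeft) smallerSide
    else if (buildRight) BuildRight
    else if (buildLeft) BuildLeft
    else if (canBuildRight && canBuildLeft) smallerSide // no size check on this path
    else throw new IllegalStateException(
      "Can not decide which side to broadcast for this join")
  }

  val tenGiB: BigInt = BigInt(10L * 1024 * 1024 * 1024)
  val twentyGiB: BigInt = tenGiB * 2

  // Neither side carries a hint, yet a 10 GiB side is still selected.
  val chosen: BuildSide = broadcastSide(
    canBuildLeft = true,
    canBuildRight = true,
    left = Stats(twentyGiB, broadcastHint = false),
    right = Stats(tenGiB, broadcastHint = false))
}
```

In this model, chosen is BuildRight even though the right side is estimated at 10 GiB, which
is the situation the issue asks to surface to the user.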




