spark-issues mailing list archives

From "Patrick Woody (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-15038) Add ability to do broadcasts in SQL at execution time
Date Sat, 30 Apr 2016 15:26:12 GMT
Patrick Woody created SPARK-15038:
-------------------------------------

             Summary: Add ability to do broadcasts in SQL at execution time
                 Key: SPARK-15038
                 URL: https://issues.apache.org/jira/browse/SPARK-15038
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.6.1
            Reporter: Patrick Woody


Currently, automatic broadcasting in Spark SQL is performed eagerly and asynchronously at query
planning time. For a large query with many broadcast joins, this can create a large amount of
memory pressure, and possibly OOMs, all at once, even though this is not actually necessary.

The current workaround for these queries is to disable broadcast joins, which can be prohibitively
costly in terms of performance. This ticket proposes adding a configuration option that toggles
between performing these broadcasts eagerly (asynchronously, at planning time) and performing
them lazily at execution time.
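To illustrate why eager broadcasting concentrates memory pressure, here is a toy model in plain
Python (not Spark code). The `lazy` flag stands in for the proposed, unnamed config option; the
class `BroadcastJoin`, the sizes, and the 1024 MB budget are hypothetical. It assumes that eager
broadcasts stay resident until the query ends, while lazy broadcasts are freed once their join
finishes. Under those assumptions, eager peak memory is the sum of all broadcast sizes and lazy
peak memory is the largest single one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BroadcastJoin:
    """One broadcast join in a query plan (toy model, not a Spark class)."""
    name: str
    build_size_mb: int  # hypothetical in-memory size of the broadcast side


def peak_broadcast_memory(joins: List[BroadcastJoin], lazy: bool) -> int:
    """Return peak MB held by broadcast relations while the joins run in order.

    lazy=False: every broadcast is started at planning time, so all build
    sides are resident before any join runs (and, in this toy, until the end).
    lazy=True: each broadcast is built just before its join executes and,
    by assumption, released once that join completes.
    """
    resident: Dict[int, int] = {}  # join index -> resident MB
    peak = 0
    if not lazy:
        # Eager: materialize every broadcast up front, at planning time.
        for i, join in enumerate(joins):
            resident[i] = join.build_size_mb
        peak = sum(resident.values())
    for i, join in enumerate(joins):
        if lazy:
            # Lazy: materialize this broadcast only when its join runs.
            resident[i] = join.build_size_mb
        peak = max(peak, sum(resident.values()))
        if lazy:
            # Assumption: the broadcast is freed after its join completes.
            del resident[i]
    return peak


if __name__ == "__main__":
    # Hypothetical query with three broadcast joins and a 1024 MB budget.
    joins = [
        BroadcastJoin("dim_a", 400),
        BroadcastJoin("dim_b", 300),
        BroadcastJoin("dim_c", 500),
    ]
    budget_mb = 1024
    for lazy in (False, True):
        peak = peak_broadcast_memory(joins, lazy)
        mode = "lazy" if lazy else "eager"
        status = "fits" if peak <= budget_mb else "exceeds budget (OOM risk)"
        print(f"{mode}: peak {peak} MB, {status}")
```

With these made-up sizes, eager mode peaks at 1200 MB (over budget) while lazy mode peaks at
500 MB. For reference, the existing workaround of disabling automatic broadcast joins is done in
Spark by setting `spark.sql.autoBroadcastJoinThreshold` to `-1`.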



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


