spark-issues mailing list archives

From Maciej Bryński (JIRA)
Subject [jira] [Commented] (SPARK-9850) Adaptive execution in Spark
Date Wed, 13 Jan 2016 21:10:39 GMT


Maciej Bryński commented on SPARK-9850:

I'm not sure if my issue is related to this Jira.

In 1.6.0, when using a SQL LIMIT, Spark does the following:
- executes the limit on every partition
- then takes the result
Is it possible to stop scanning partitions once enough rows have been collected for the limit?
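
To make the question concrete, here is a minimal sketch in plain Python (not Spark code) that contrasts the two strategies. Partitions are modelled as in-memory lists, and the function names `limit_every_partition` and `limit_with_early_stop` are hypothetical, chosen only for this illustration. It is a toy model of the behaviour described above, not a description of how Spark implements LIMIT internally.

```python
from typing import List, Tuple

Partition = List[int]


def limit_every_partition(partitions: List[Partition], n: int) -> Tuple[List[int], int]:
    """Apply the limit to every partition, then take n rows from the combined result."""
    scanned = 0
    partial: List[int] = []
    for part in partitions:
        scanned += 1              # every partition is visited
        partial.extend(part[:n])  # per-partition limit
    return partial[:n], scanned   # final take over the partial results


def limit_with_early_stop(partitions: List[Partition], n: int) -> Tuple[List[int], int]:
    """Visit partitions one at a time and stop once n rows have been collected."""
    scanned = 0
    collected: List[int] = []
    for part in partitions:
        if len(collected) >= n:
            break                 # enough rows: skip the remaining partitions
        scanned += 1
        collected.extend(part[: n - len(collected)])
    return collected, scanned


partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
rows_a, scanned_a = limit_every_partition(partitions, 4)
rows_b, scanned_b = limit_with_early_stop(partitions, 4)
print(rows_a, scanned_a)  # [1, 2, 3, 4] 4
print(rows_b, scanned_b)  # [1, 2, 3, 4] 2
```

In this toy model both strategies return the same rows, but the second one touches only as many partitions as it needs. The trade-off is that it visits partitions sequentially instead of in parallel.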

> Adaptive execution in Spark
> ---------------------------
>                 Key: SPARK-9850
>                 URL:
>             Project: Spark
>          Issue Type: Epic
>          Components: Spark Core, SQL
>            Reporter: Matei Zaharia
>            Assignee: Yin Huai
>         Attachments: AdaptiveExecutionInSpark.pdf
> Query planning is one of the main factors in high performance, but the current Spark engine requires the execution DAG for a job to be set in advance. Even with cost-based optimization, it is hard to know the behavior of data and user-defined functions well enough to always get great execution plans. This JIRA proposes to add adaptive query execution, so that the engine can change the plan for each query as it sees what data earlier stages produced.
> We propose adding this to Spark SQL / DataFrames first, using a new API in the Spark engine that lets libraries run DAGs adaptively. In future JIRAs, the functionality could be extended to other libraries or the RDD API, but that is more difficult than adding it in SQL.
> I've attached a design doc by Yin Huai and myself explaining how it would work in more
