spark-issues mailing list archives

From "Josh Rosen (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-17283) Cancel job in RDD.take() as soon as enough output is received
Date Fri, 02 Sep 2016 19:50:20 GMT

     [ https://issues.apache.org/jira/browse/SPARK-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-17283.
--------------------------------
    Resolution: Later

Closing as "Later" for now, since a simpler approach might yield similar gains.

> Cancel job in RDD.take() as soon as enough output is received
> -------------------------------------------------------------
>
>                 Key: SPARK-17283
>                 URL: https://issues.apache.org/jira/browse/SPARK-17283
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>
> The current implementation of RDD.take() waits until all partitions of each job have
> been computed before checking whether enough rows have been received. If take() were to perform
> this check on the fly as individual partitions completed, then it could stop early, offering
> large speedups for certain interactive queries.
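
The difference described above can be illustrated with a small, Spark-free
simulation. This is only a sketch under stated assumptions, not Spark's actual
code: partitions are modeled as plain Python lists, a "job" is a slice of
consecutive partitions processed sequentially, and the function names and the
`growth` parameter (how much larger each successive job is) are hypothetical.

```python
from typing import List, Sequence, Tuple

Partition = Sequence[int]


def take_wait_for_job(partitions: Sequence[Partition], n: int,
                      growth: int = 4) -> Tuple[List[int], int]:
    """Check the row count only after every partition in a job is computed."""
    rows: List[int] = []
    computed = 0          # number of partitions actually computed
    start, size = 0, 1    # assumption: first job scans 1 partition, then grows
    while len(rows) < n and start < len(partitions):
        job = partitions[start:start + size]
        for part in job:
            rows.extend(part)
            computed += 1
        start += len(job)
        size *= growth
    return rows[:n], computed


def take_cancel_early(partitions: Sequence[Partition], n: int,
                      growth: int = 4) -> Tuple[List[int], int]:
    """Check the row count as each partition completes and stop the job early."""
    rows: List[int] = []
    computed = 0
    start, size = 0, 1
    while len(rows) < n and start < len(partitions):
        job = partitions[start:start + size]
        for part in job:
            rows.extend(part)
            computed += 1
            if len(rows) >= n:
                break     # stands in for cancelling the remaining tasks
        start += len(job)
        size *= growth
    return rows[:n], computed


if __name__ == "__main__":
    # The first partition is empty, so a second, larger job is needed.
    parts = [[], [1, 2, 3], [4, 5], [6], [7, 8]]
    print(take_wait_for_job(parts, 2))   # ([1, 2], 5)
    print(take_cancel_early(parts, 2))   # ([1, 2], 2)
```

In this toy example both variants return the same rows, but the early-stop
variant computes 2 partitions instead of 5. In a real cluster the partitions of
a job run in parallel, so the actual savings would depend on scheduling and on
how quickly outstanding tasks can be cancelled.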



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

