spark-issues mailing list archives

From "Herman van Hovell (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-16984) executeTake tries all partitions if first partition is empty
Date Fri, 02 Sep 2016 15:16:20 GMT

     [ https://issues.apache.org/jira/browse/SPARK-16984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hovell resolved SPARK-16984.
---------------------------------------
       Resolution: Fixed
         Assignee: Robert Kruszewski
    Fix Version/s: 2.1.0

> executeTake tries all partitions if first partition is empty
> ------------------------------------------------------------
>
>                 Key: SPARK-16984
>                 URL: https://issues.apache.org/jira/browse/SPARK-16984
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Robert Kruszewski
>            Assignee: Robert Kruszewski
>             Fix For: 2.1.0
>
>
> In executeTake, if the first partition returns 0 rows, the next attempt tries all
> remaining partitions. This can lead to pathological cases where the first partition
> is empty and the rest have data, which unfortunately can happen with skewed data.
> Empirically, it is better to make a few extra round trips than to risk killing the
> driver with a big collect.
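
To make the trade-off concrete, here is a minimal plain-Python simulation of the two scan strategies described above. It is only a sketch and not Spark's actual code: the function name `take_with_scale_up`, the `scale_up_factor` parameter and its default of 4, and the `try_all_when_empty` flag are all hypothetical names chosen for illustration. It assumes each scanned partition ships at most `n` rows back to the driver.

```python
from typing import List, Sequence, Tuple


def take_with_scale_up(
    partitions: Sequence[Sequence[int]],
    n: int,
    scale_up_factor: int = 4,          # hypothetical growth factor, an assumption
    try_all_when_empty: bool = False,  # True mimics the behaviour in the report
) -> Tuple[List[int], List[int], int]:
    """Collect up to n rows by scanning partitions in growing batches.

    Returns (rows, batch_sizes, rows_shipped): the rows taken, the number of
    partitions scanned per round trip, and the total rows sent to the driver.
    """
    rows: List[int] = []
    batch_sizes: List[int] = []
    rows_shipped = 0
    parts_scanned = 0
    total = len(partitions)

    while len(rows) < n and parts_scanned < total:
        if parts_scanned == 0:
            # Always start with a single partition.
            num_parts_to_try = 1
        elif not rows and try_all_when_empty:
            # Reported behaviour: nothing found yet, so scan everything left.
            num_parts_to_try = total - parts_scanned
        else:
            # Bounded growth: a few more round trips, but smaller collects.
            num_parts_to_try = parts_scanned * scale_up_factor
        num_parts_to_try = min(num_parts_to_try, total - parts_scanned)

        batch = partitions[parts_scanned:parts_scanned + num_parts_to_try]
        # Assumption: every scanned partition returns at most n rows.
        shipped = [list(part[:n]) for part in batch]
        rows_shipped += sum(len(s) for s in shipped)
        for s in shipped:
            rows.extend(s[: n - len(rows)])

        parts_scanned += num_parts_to_try
        batch_sizes.append(num_parts_to_try)

    return rows, batch_sizes, rows_shipped


if __name__ == "__main__":
    # Skewed input: the first partition is empty, the other 99 hold 10 rows each.
    parts = [[]] + [list(range(p * 10, p * 10 + 10)) for p in range(1, 100)]
    print(take_with_scale_up(parts, 5, try_all_when_empty=True)[1:])  # ([1, 99], 495)
    print(take_with_scale_up(parts, 5)[1:])                           # ([1, 4], 20)
```

In this toy setup, falling back to all partitions ships 495 rows to the driver to satisfy a request for 5, while the bounded scale-up ships 20 in the same number of round trips. With data that is empty for many leading partitions, the bounded variant would need more round trips, which is the cost the description argues is worth paying.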



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

