spark-issues mailing list archives

From "Felix Cheung (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-20007) Make SparkR apply() functions robust to workers that return empty data.frame
Date Sat, 18 Mar 2017 18:10:42 GMT

    [ https://issues.apache.org/jira/browse/SPARK-20007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931310#comment-15931310 ]

Felix Cheung commented on SPARK-20007:
--------------------------------------

+1 - I've also been meaning to add checks for data type mismatches. When a schema is specified
but it doesn't match the returned data.frame, the error is very hard to track down.
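
A hedged sketch of the kind of mismatch meant here (hypothetical names and data; assumes a
local SparkR session): the schema declares {{total}} as integer, while the function returns
it as character, so the failure surfaces far from the actual cause:

{code}
library(SparkR)
sparkR.session()

df <- createDataFrame(data.frame(key = c(1, 1, 2), value = c(10, 20, 30)))

# The schema declares "total" as an integer column ...
schema <- structType(structField("key", "double"),
                     structField("total", "integer"))

# ... but the function returns "total" as character, so the returned
# data.frame does not match the declared schema, and the resulting
# error is hard to trace back to this mismatch.
result <- gapply(df, "key",
                 function(key, x) {
                   data.frame(key = key[[1]],
                              total = as.character(sum(x$value)),
                              stringsAsFactors = FALSE)
                 },
                 schema)
collect(result)
{code}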

> Make SparkR apply() functions robust to workers that return empty data.frame
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-20007
>                 URL: https://issues.apache.org/jira/browse/SPARK-20007
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.2.0
>            Reporter: Hossein Falaki
>
> When using {{gapply()}} (or other members of the {{apply()}} family) with a schema, Spark
> will try to parse the data returned from the R process on each worker as Spark DataFrame
> Rows based on that schema. In this case, our provided schema says that we have six columns.
> When an R worker returns results to the JVM, SparkSQL will try to access its columns one by
> one and cast them to the proper types. If the R worker returns nothing, the JVM will throw
> an {{ArrayIndexOutOfBoundsException}}.
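>
> A minimal reproduction sketch (hypothetical; it uses a two-column schema and made-up data
> for brevity, not the six-column case above):
> {code}
> library(SparkR)
> sparkR.session()
>
> df <- createDataFrame(data.frame(key = c(1, 1, 2), value = c(10, 20, 30)))
>
> # The declared schema promises two columns in every returned data.frame.
> schema <- structType(structField("key", "double"),
>                      structField("total", "double"))
>
> # The function returns an empty data.frame for one group. When the JVM
> # reads the promised columns from that empty result, it can fail with
> # ArrayIndexOutOfBoundsException instead of a clear error.
> result <- gapply(df, "key",
>                  function(key, x) {
>                    if (key[[1]] == 2) {
>                      data.frame()  # empty result for this group
>                    } else {
>                      data.frame(key = key[[1]], total = sum(x$value))
>                    }
>                  },
>                  schema)
> collect(result)
> {code}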



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
