spark-reviews mailing list archives

From: catlain <...@git.apache.org>
Subject: [GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Date: Fri, 02 Jun 2017 06:05:27 GMT
Github user catlain commented on the issue:

    https://github.com/apache/spark/pull/14783
  
    Still hitting this issue when the input data contains an array column whose per-row vectors have different lengths, e.g.:
    
    ```
    # test1: "value" is an array column; rows 1-5 each hold a single
    # element, while row 6 holds a vector of five elements.
    test1
    #                key                                                      value
    # 1 4dda7d68a202e9e3                                                 1595297780
    # 2  4e08f349deb7392                                                  641991337
    # 3 4e105531747ee00b                                                  374773009
    # 4 4f1d5ef7fdb4620a                                                 2570136926
    # 5 4f63a71e6dde04cd                                                 2117602722
    # 6 4fa2f96b689624fc 3489692062, 1344510747, 1095592237, 424510360, 3211239587
    
    sparkR.stop()
    sc <- sparkR.init()
    sqlContext <- sparkRSQL.init(sc)
    spark_df <- createDataFrame(sqlContext, test1)
    
    # Fails:
    dapplyCollect(spark_df, function(x) x)
    
    Caused by: org.apache.spark.SparkException: R computation failed with
     Error in (function (..., deparse.level = 1, make.row.names = TRUE, stringsAsFactors = default.stringsAsFactors())  : 
      invalid list argument: all variables should have the same length
    	at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
    	at org.apache.spark.sql.execution.r.MapPartitionsRWrapper.apply(MapPartitionsRWrapper.scala:59)
    	at org.apache.spark.sql.execution.r.MapPartitionsRWrapper.apply(MapPartitionsRWrapper.scala:29)
    	at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:186)
    	at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:183)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    	at org.apache.spark.scheduler.Task.run(Task.scala:99)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	... 1 more
    
    ```
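    
    The anonymous function in the R error has the signature of base R's `rbind.data.frame`, so the worker presumably stitches the deserialized rows back into a data.frame with something like `do.call(rbind.data.frame, rows)`. The failure reproduces in plain R; below is a minimal sketch, and the construction of `test1` is my guess since the snippet above doesn't show it:
    
    ```
    # Guessed reconstruction of test1: "value" is a list column where
    # rows 1-5 hold one element and row 6 holds five.
    test1 <- data.frame(
      key = c("4dda7d68a202e9e3", "4e08f349deb7392", "4e105531747ee00b",
              "4f1d5ef7fdb4620a", "4f63a71e6dde04cd", "4fa2f96b689624fc"),
      stringsAsFactors = FALSE
    )
    test1$value <- list(1595297780, 641991337, 374773009, 2117602722, 2570136926,
                        c(3489692062, 1344510747, 1095592237, 424510360, 3211239587))
    
    # Rebuilding a data.frame from rows whose variables differ in length
    # fails with exactly the error from the trace above:
    rows <- list(
      list(key = "4f63a71e6dde04cd", value = 2117602722),
      list(key = "4fa2f96b689624fc",
           value = c(3489692062, 1344510747, 1095592237, 424510360, 3211239587))
    )
    do.call(rbind.data.frame, rows)
    # Error in (function (..., deparse.level = 1, make.row.names = TRUE,
    #   stringsAsFactors = default.stringsAsFactors()) :
    #   invalid list argument: all variables should have the same length
    ```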
    
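    Until that's fixed, a possible workaround (assuming `test1` as sketched above) is to flatten the array column into a delimited string before `createDataFrame`, so every cell has length 1, and split it back inside the `dapply` function when needed:
    
    ```
    # Hypothetical workaround: serialize the array column to one string per
    # row; scalar columns round-trip through dapplyCollect without trouble.
    test1_flat <- test1
    test1_flat$value <- vapply(test1$value,
                               function(v) paste(v, collapse = ","),
                               character(1))
    spark_df <- createDataFrame(sqlContext, test1_flat)
    
    # Works; re-split on the R side to recover the individual elements:
    dapplyCollect(spark_df, function(x) {
      x$n_values <- vapply(strsplit(x$value, ","), length, integer(1))
      x
    })
    ```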