spark-issues mailing list archives

From "Abhinav Mishra (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-10112) ValueError: Can only zip with RDD which has the same number of partitions on one machine but not on another
Date Wed, 19 Aug 2015 14:08:45 GMT
Abhinav Mishra created SPARK-10112:
--------------------------------------

             Summary: ValueError: Can only zip with RDD which has the same number of partitions
on one machine but not on another
                 Key: SPARK-10112
                 URL: https://issues.apache.org/jira/browse/SPARK-10112
             Project: Spark
          Issue Type: Bug
          Components: PySpark
         Environment: Ubuntu 14.04.2 LTS
            Reporter: Abhinav Mishra


I have a piece of code that works fine on one machine, but when I run it on another machine
it fails with "ValueError: Can only zip with RDD which has the same number of partitions".
My code is:

rdd2 = sc.parallelize(list1) 
rdd3 = rdd1.zip(rdd2).map(lambda ((x1,x2,x3,x4), y): (y,x2, x3, x4))
list = rdd3.collect()

Both RDDs, rdd1 and rdd2, have the same number of elements and the same number of partitions
(each has 1 partition). I also tried repartition(), but it does not resolve the issue.
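[Editor's note: a common workaround for this error, not part of the original report, is to avoid zip()'s strict same-partitioning requirement by keying both RDDs with zipWithIndex() and then join()-ing on the index. The pairing logic can be sketched without a Spark cluster, using plain Python dicts in place of RDDs (all data below is invented for illustration):]

```python
# Sketch of the zipWithIndex()-then-join() workaround for PySpark's
# "Can only zip with RDD which has the same number of partitions" error,
# emulated with plain Python so it runs without Spark.

rdd1_rows = [("k1", 10, 20, 30), ("k2", 40, 50, 60)]  # stands in for rdd1
list1 = ["a", "b"]                                    # stands in for rdd2's data

# zipWithIndex(): attach a positional key to every element.
indexed1 = {i: row for i, row in enumerate(rdd1_rows)}
indexed2 = {i: y for i, y in enumerate(list1)}

# join() on the index key, then reshape as the reporter's map() does:
# ((x1, x2, x3, x4), y) -> (y, x2, x3, x4)
result = [
    (indexed2[i], x2, x3, x4)
    for i, (x1, x2, x3, x4) in sorted(indexed1.items())
]
print(result)  # [('a', 10, 20, 30), ('b', 40, 50, 60)]
```

In real PySpark this would be roughly rdd1.zipWithIndex().map(lambda kv: (kv[1], kv[0])).join(rdd2.zipWithIndex().map(lambda kv: (kv[1], kv[0]))), which tolerates differing partition layouts at the cost of a shuffle.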

The code above runs fine on one machine but throws the error on the other. I tried to find
an explanation but could not identify a specific reason for this behavior. The machine on
which it runs without error has Spark 1.3; the machine on which the error occurs has
Spark 1.4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

