spark-user mailing list archives

From Abhinav Mishra <amis...@tidemark.com>
Subject ValueError: Can only zip with RDD which has the same number of partitions error on one machine but not on another
Date Wed, 19 Aug 2015 14:52:44 GMT
Hi,

I have a piece of code that works fine on one machine, but when I run
it on another machine I get the error "ValueError: Can only zip with RDD
which has the same number of partitions". My code is:

rdd2 = sc.parallelize(list1)
rdd3 = rdd1.zip(rdd2).map(lambda ((x1,x2,x3,x4), y): (y,x2, x3, x4))
list = rdd3.collect()
assert rdd1.getNumPartitions() == rdd2.getNumPartitions()

My rdd1 has this structure - [(1,2,3),(4,5,6)....]. My rdd2 has this
structure - [1,2,3....]

Both RDDs, rdd1 and rdd2, have the same number of elements and the same
number of partitions (one partition each). I also tried repartition(),
but it does not resolve the issue.
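
One way to double-check the claim above is to compare the partition layout
of both RDDs just before calling zip(). PySpark's zip() assumes the same
number of partitions and the same number of elements in each partition. The
sketch below is only a diagnostic aid under that assumption: the helper
names describe_partitions and zip_mismatch are hypothetical, not part of
Spark, and the check does not explain any difference between Spark
versions.

```python
def describe_partitions(rdd):
    """Return the number of elements in each partition of an RDD."""
    # glom() turns every partition into a single list, so len() counts
    # the elements that partition holds.
    return rdd.glom().map(len).collect()


def zip_mismatch(rdd_a, rdd_b):
    """Return a message describing a layout mismatch, or None if none is found."""
    sizes_a = describe_partitions(rdd_a)
    sizes_b = describe_partitions(rdd_b)
    if len(sizes_a) != len(sizes_b):
        return "partition counts differ: %d vs %d" % (len(sizes_a), len(sizes_b))
    if sizes_a != sizes_b:
        return "per-partition element counts differ: %r vs %r" % (sizes_a, sizes_b)
    return None
```

Calling zip_mismatch(rdd1, rdd2) right before rdd1.zip(rdd2) and printing
the result would show whether the two layouts really match on the machine
where the error appears.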

The above code works fine on one machine but throws the error on another.
I tried to look for an explanation but couldn't find any specific reason
for this behavior. I have Spark 1.3 on the machine where it runs without
any error and Spark 1.4 on the machine where the error occurs.

Regards,

*Abhinav Mishra *
