spark-issues mailing list archives

From "Abhinav Mishra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-10112) ValueError: Can only zip with RDD which has the same number of partitions on one machine but not on another
Date Wed, 19 Aug 2015 14:22:45 GMT

    [ https://issues.apache.org/jira/browse/SPARK-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703084#comment-14703084
] 

Abhinav Mishra commented on SPARK-10112:
----------------------------------------

I am asserting that rdd1.getNumPartitions() and rdd2.getNumPartitions() are equal, and the assertion passes. I have added the assert to my issue description above, along with a description of the structure of rdd1 and rdd2.

> ValueError: Can only zip with RDD which has the same number of partitions on one machine but not on another
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10112
>                 URL: https://issues.apache.org/jira/browse/SPARK-10112
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>         Environment: Ubuntu 14.04.2 LTS
>            Reporter: Abhinav Mishra
>
> I have a piece of code which works fine on one machine, but when I run it on another machine I get the error "ValueError: Can only zip with RDD which has the same number of partitions". My code is:
> rdd2 = sc.parallelize(list1)
> rdd3 = rdd1.zip(rdd2).map(lambda ((x1,x2,x3,x4), y): (y, x2, x3, x4))
> list = rdd3.collect()
> assert rdd1.getNumPartitions() == rdd2.getNumPartitions()
> My rdd1 has this structure - [(1,2,3),(4,5,6)....]. My rdd2 has this structure - [1,2,3....]
>  
> Both of my RDDs, rdd1 and rdd2, have the same number of elements and the same number of partitions (both have 1 partition). I tried repartition() as well, but it does not resolve the issue.
> The above code works fine on one machine but throws the error on another. I tried to find an explanation, but I couldn't identify a specific reason for this behavior. The machine on which it runs without error has Spark 1.3; the machine on which the error occurs has Spark 1.4.
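One detail worth noting: the lambda in the reported code uses tuple-parameter unpacking (`lambda ((x1,x2,x3,x4), y): ...`), which is Python 2 syntax and was removed in Python 3 (PEP 3113), so the same expression must be rewritten with explicit indexing there. Below is a minimal plain-Python sketch of the restructuring the map performs, using the built-in zip in place of RDD.zip and hypothetical sample data mirroring the stated structures. The reported lambda unpacks four fields while rdd1's stated structure holds 3-tuples; the sketch assumes three fields.

```python
# Hypothetical sample data mirroring the reported structures:
# rdd1-like: [(1, 2, 3), (4, 5, 6)], rdd2-like: [10, 20]
data1 = [(1, 2, 3), (4, 5, 6)]
data2 = [10, 20]

# Python 2 allowed: lambda ((x1, x2, x3), y): (y, x2, x3)
# Python 3 requires explicit indexing into the zipped pair:
restructure = lambda pair: (pair[1], pair[0][1], pair[0][2])

result = [restructure(p) for p in zip(data1, data2)]
# result == [(10, 2, 3), (20, 5, 6)]
```

When partition counts (or per-partition element counts) genuinely differ, a common workaround is to pair the RDDs by index instead, e.g. rdd1.zipWithIndex() joined against rdd2.zipWithIndex(), which does not require matching partitioning.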



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

