spark-commits mailing list archives

From dav...@apache.org
Subject spark git commit: [SPARK-10642] [PYSPARK] Fix crash when calling rdd.lookup() on tuple keys
Date Thu, 17 Sep 2015 17:02:36 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 eae1566de -> 9f8fb3385


[SPARK-10642] [PYSPARK] Fix crash when calling rdd.lookup() on tuple keys

JIRA: https://issues.apache.org/jira/browse/SPARK-10642

When calling `rdd.lookup()` on an RDD with tuple keys, `portable_hash` will return a long.
That causes `DAGScheduler.submitJob` to throw `java.lang.ClassCastException: java.lang.Long
cannot be cast to java.lang.Integer`.
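
The failure mode can be sketched as follows. This is a minimal Python reconstruction of the patched helper, not the verbatim PySpark source; only the lines visible in the diff below are confirmed, the rest of the function body is an assumption for illustration. Under Python 2, XOR-ing in element hashes could promote `h` to `long`, which Py4J then handed to the JVM as `java.lang.Long`; the added `int(h)` demotes it back to a plain `int` when the value fits.

```python
def portable_hash(x):
    # Sketch of pyspark's portable_hash after the fix (assumption:
    # structure inferred from the diff; only the tuple tail is verbatim).
    if x is None:
        return 0
    if isinstance(x, tuple):
        h = 0x345678
        for i in x:
            h ^= portable_hash(i)
        h ^= len(x)
        if h == -1:
            h = -2
        # On Python 2 the XORs above could make h a `long`; int() demotes
        # it back to a machine int so the JVM receives an Integer.
        return int(h)
    return hash(x)

# The hash must be deterministic across workers and a plain int:
assert isinstance(portable_hash(('a', 'b')), int)
```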

Author: Liang-Chi Hsieh <viirya@appier.com>

Closes #8796 from viirya/fix-pyrdd-lookup.

(cherry picked from commit 136c77d8bbf48f7c45dd7c3fbe261a0476f455fe)
Signed-off-by: Davies Liu <davies.liu@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9f8fb338
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9f8fb338
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9f8fb338

Branch: refs/heads/branch-1.5
Commit: 9f8fb3385fb14bc8b83772bf138e777beb5d7157
Parents: eae1566
Author: Liang-Chi Hsieh <viirya@appier.com>
Authored: Thu Sep 17 10:02:15 2015 -0700
Committer: Davies Liu <davies.liu@gmail.com>
Committed: Thu Sep 17 10:02:33 2015 -0700

----------------------------------------------------------------------
 python/pyspark/rdd.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/9f8fb338/python/pyspark/rdd.py
----------------------------------------------------------------------
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 9ef60a7..ab5aab1 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -84,7 +84,7 @@ def portable_hash(x):
         h ^= len(x)
         if h == -1:
             h = -2
-        return h
+        return int(h)
     return hash(x)
 
 
@@ -2192,6 +2192,9 @@ class RDD(object):
         [42]
         >>> sorted.lookup(1024)
         []
+        >>> rdd2 = sc.parallelize([(('a', 'b'), 'c')]).groupByKey()
+        >>> list(rdd2.lookup(('a', 'b'))[0])
+        ['c']
         """
         values = self.filter(lambda kv: kv[0] == key).values()
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org

