spark-reviews mailing list archives

From JoshRosen
Subject [GitHub] spark pull request: SPARK-3926 [CORE] Result of JavaRDD.collectAsM...
Date Sat, 18 Oct 2014 19:23:27 GMT
Github user JoshRosen commented on the pull request:
    I can confirm that this seems to have fixed the serialization issue; here's my test-case:
    val pairs = sc.parallelize(1 to 10).map(x => (x, x))
    val map = new JavaPairRDD(pairs).collectAsMap()
    def ser(a: AnyRef) =
    It looks like there's one more case in `sql/core/src/main/scala/org/apache/spark/sql/api/java/Row.scala`
that needs to be addressed:
    This is a private method, but its return value flows to user code.  I'll fix this up myself
on merge.
    There might still be other corner cases with serializability of results that we haven't
tested yet.  The result of `collect()` is serializable, so perhaps this issue only affected
our use of MapWrapper.  Long term, it would be great to add a fuzz test that runs random Java
API workloads and attempts to serialize their results.
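    As a rough illustration of the kind of check such a test could apply to each result, here
is a minimal sketch of a Java-serialization probe. The names `SerializationCheck`, `trySerialize`
and `isSerializable` are hypothetical and are not part of Spark or of this pull request; the
sketch only assumes the standard `java.io` serialization classes.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.util.Try

// Hypothetical helper (not from Spark): probes whether a value survives
// being written with plain Java serialization.
object SerializationCheck {

  /** Writes `a` with an ObjectOutputStream; a Failure carries the exception
    * (typically java.io.NotSerializableException) if it cannot be written. */
  def trySerialize(a: AnyRef): Try[Array[Byte]] = Try {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(a) finally out.close()
    bytes.toByteArray
  }

  /** True if `a` (and everything it references) could be serialized. */
  def isSerializable(a: AnyRef): Boolean = trySerialize(a).isSuccess
}
```

    A fuzz test along these lines could call `SerializationCheck.isSerializable(result)` on whatever
each Java API call returns and report the failing call.  For example, a `java.util.HashMap` passes
the check, while a bare `new Object()` does not, because `java.lang.Object` does not implement
`java.io.Serializable`.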
    I mentioned this over on JIRA, but for GitHub readers: I've opened an issue to fix this
upstream in Scala:
    I'll merge this now with my fixup.  Thanks!
