spark-issues mailing list archives

From "Josh Rosen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-6288) Pyrolite calls hashCode to cache previously serialized objects
Date Fri, 13 Mar 2015 17:54:41 GMT

    [ https://issues.apache.org/jira/browse/SPARK-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360799#comment-14360799 ]

Josh Rosen commented on SPARK-6288:
-----------------------------------

Do we have to modify Pyrolite in order to disable the memoization? If so, there are a few
other Pyrolite patches / fixes we might consider bundling in when we publish a new version.
I think that our current Pyrolite version actually corresponds to a released version, and
that we depend on a custom build only because Pyrolite doesn't publish its releases to Maven.

> Pyrolite calls hashCode to cache previously serialized objects
> --------------------------------------------------------------
>
>                 Key: SPARK-6288
>                 URL: https://issues.apache.org/jira/browse/SPARK-6288
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 1.0.2, 1.1.1, 1.3.0, 1.2.1
>            Reporter: Xiangrui Meng
>            Assignee: Josh Rosen
>         Attachments: Screen Shot 2015-03-13 at 10.45.35 AM.png
>
>
> https://github.com/irmen/Pyrolite/blob/v2.0/java/src/net/razorvine/pickle/Pickler.java#L140
> This operation could be quite expensive compared to serializing the object directly,
> because hashCode usually needs to access all data stored in the object. Maybe we should disable
> this feature by default.
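
To illustrate the cost described in the issue, here is a minimal, self-contained sketch. It is not Pyrolite's actual code: the class MemoCostSketch, the Payload type, and the helper hashCallsDuringMemo are hypothetical names made up for this example. It assumes a memo table keyed on the object itself, so that every lookup goes through the object's own hashCode, and compares that with an identity-based table that never calls it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class MemoCostSketch {

    /** Hypothetical payload whose hashCode has to read every element it stores. */
    static final class Payload {
        final double[] values;
        int hashCalls = 0; // counts how often hashCode was invoked

        Payload(int n) {
            values = new double[n];
        }

        @Override
        public int hashCode() {
            hashCalls++;
            return Arrays.hashCode(values); // O(n): touches all stored data
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Payload
                && Arrays.equals(values, ((Payload) other).values);
        }
    }

    /**
     * Looks the payload up in the memo and records it if absent, roughly what a
     * memoizing serializer does per object. Returns how many times the payload's
     * own hashCode was called so far.
     */
    static int hashCallsDuringMemo(Map<Object, Integer> memo, Payload p) {
        if (!memo.containsKey(p)) {
            memo.put(p, memo.size());
        }
        return p.hashCalls;
    }

    public static void main(String[] args) {
        Payload a = new Payload(1_000_000);
        Payload b = new Payload(1_000_000);

        // HashMap hashes the key with its overridden hashCode (a full scan of the array).
        System.out.println("HashMap memo, hashCode calls: "
            + hashCallsDuringMemo(new HashMap<>(), a));

        // IdentityHashMap uses System.identityHashCode and never calls Payload.hashCode.
        System.out.println("IdentityHashMap memo, hashCode calls: "
            + hashCallsDuringMemo(new IdentityHashMap<>(), b));
    }
}
```

With an identity-based memo the per-object overhead no longer depends on the object's size. Whether that, or disabling memoization entirely, is the better fit for Pyrolite is left open here.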



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
