spark-issues mailing list archives

From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-2696) Reduce default spark.serializer.objectStreamReset
Date Sat, 26 Jul 2014 08:06:38 GMT

     [ https://issues.apache.org/jira/browse/SPARK-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated SPARK-2696:
---------------------------------

    Assignee: Hossein Falaki

> Reduce default spark.serializer.objectStreamReset 
> --------------------------------------------------
>
>                 Key: SPARK-2696
>                 URL: https://issues.apache.org/jira/browse/SPARK-2696
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Hossein Falaki
>            Assignee: Hossein Falaki
>              Labels: configuration
>             Fix For: 1.1.0, 1.0.3
>
>
> The current default value of spark.serializer.objectStreamReset is 10,000.
> When re-partitioning (e.g., to 64 partitions) a large file (e.g., 500 MB) containing 1 MB records, the serializer will cache 10,000 x 1 MB x 64 = 640 GB of references, which will cause it to run out of memory.
> We think 100 would be a more reasonable default value for this configuration parameter.
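The arithmetic behind the report can be sketched as a small worst-case estimate. This is an illustrative back-of-envelope model (the function name and parameters are hypothetical, not Spark APIs): Java's ObjectOutputStream keeps a reference to every object written since the last reset(), so each concurrent task can pin up to reset_interval x record_size of serialized records in memory.

```python
def cached_mb(reset_interval, record_size_mb, num_tasks):
    """Hypothetical worst-case MB pinned by Java serialization streams.

    ObjectOutputStream retains a reference to each object written since
    the last reset(), so each of `num_tasks` concurrent streams can hold
    up to reset_interval * record_size_mb megabytes of records alive.
    """
    return reset_interval * record_size_mb * num_tasks

# Old default reset interval of 10,000 with 1 MB records and 64 partitions:
print(cached_mb(10_000, 1, 64))  # 640000 MB, i.e. the ~640 GB in the report

# Proposed default of 100 under the same workload:
print(cached_mb(100, 1, 64))     # 6400 MB, i.e. ~6.4 GB
```

In practice the interval is set via the ordinary Spark configuration mechanism, e.g. `conf.set("spark.serializer.objectStreamReset", "100")` on a SparkConf, at the cost of re-serializing class metadata more often.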



--
This message was sent by Atlassian JIRA
(v6.2#6252)
