spark-issues mailing list archives

From "Kevin (Sangwoo) Kim (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-1112) When spark.akka.frameSize > 10, task results bigger than 10MiB block execution
Date Thu, 29 May 2014 02:52:01 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011968#comment-14011968 ]

Kevin (Sangwoo) Kim edited comment on SPARK-1112 at 5/29/14 2:50 AM:
---------------------------------------------------------------------

Hi all, 

I'm very new to Spark and, while doing some tests, I ran into a similar issue.
(Tested with Spark Shell, 0.9.1, on an r3.8xlarge instance on EC2: 32 cores / 244 GiB memory.)

I was trying to broadcast 700 MB of data, and Spark hangs when I run the collect() method on
the data.

Here are the strange things:
1) When I tried
{code}
val userInfo = sc.textFile("file:///spark/logs/user_sign_up2.csv").map { line =>
  val split = line.split(",")
  (split(1), split)
}
val userInfoMap = userInfo.collectAsMap
{code}
it runs well.
2) When I tried
{code}
val userInfo = sc.textFile("file:///spark/logs/user_sign_up2.csv").map { line =>
  val split = line.split(",")
  (split(1), split(5))
}
val userInfoMap = userInfo.collectAsMap
{code}
Spark hangs.
3) When I slightly reduce the data size, either with the sample() method or by cutting the
data file, it runs well.
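
As an illustration of 3), a minimal sketch of shrinking the data with sample() before
collecting it. It assumes the userInfo RDD from the snippets above and the three-argument
sample(withReplacement, fraction, seed) form; the fraction 0.5 and the seed 42 are
illustrative assumptions, not the values used in the test.
{code}
// Keep roughly half of the records (sampling without replacement) so that the
// collected result is smaller. Fraction and seed are illustrative assumptions.
val sampledUserInfo = userInfo.sample(false, 0.5, 42)
val sampledUserInfoMap = sampledUserInfo.collectAsMap
{code}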

Our team investigated the logs from the master and the worker, and found that the worker
finished all tasks but the master couldn't retrieve the result of a task whose result size
was larger than 10 MB.

We applied the workaround of setting spark.akka.frameSize to 9, and it works like a charm.
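
For reference, a minimal sketch of applying this workaround in a standalone application
through SparkConf. The master URL and the application name are hypothetical placeholders.
In the Spark Shell the context already exists, so the property would have to be supplied
before startup instead, for example as a -Dspark.akka.frameSize=9 JVM system property
(an assumption about the setup, not something verified on this cluster).
{code}
import org.apache.spark.{SparkConf, SparkContext}

// spark.akka.frameSize is given in MB; 9 is the workaround value used above.
val conf = new SparkConf()
  .setMaster("local[4]")               // hypothetical master URL
  .setAppName("frame-size-workaround") // hypothetical application name
  .set("spark.akka.frameSize", "9")

// The setting must be in place before the SparkContext is created.
val sc = new SparkContext(conf)
{code}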

I guess the issue might be hard to reproduce, so please contact me if you need testing or
logs.

Thanks!



> When spark.akka.frameSize > 10, task results bigger than 10MiB block execution
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-1112
>                 URL: https://issues.apache.org/jira/browse/SPARK-1112
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.0
>            Reporter: Guillaume Pitel
>            Priority: Critical
>             Fix For: 0.9.2
>
>
> When I set spark.akka.frameSize to something over 10, the messages sent from the
> executors to the driver completely block the execution if the message is bigger than 10MiB
> and smaller than the frameSize (if it's above the frameSize, it's ok).
> The workaround is to set spark.akka.frameSize to 10. In this case, since 0.8.1, the
> blockManager deals with the data to be sent. It seems slower than an akka direct message though.
> The configuration seems to be correctly read (see actorSystemConfig.txt), so I don't
> see where the 10MiB could come from.
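
The behaviour reported above can be summarised as a small decision function. This is only
a sketch of the symptoms described in this issue, not Spark's actual code: the object and
function names are hypothetical, and the exact handling at the boundaries (strictly greater
than versus greater than or equal) is an assumption.
{code}
object FrameSizeSketch {
  val MiB: Long = 1024L * 1024L

  sealed trait Outcome
  case object SentDirectly extends Outcome        // small result, sent as a direct message
  case object Blocks extends Outcome              // the hang reported in this issue
  case object SentViaBlockManager extends Outcome // result exceeds the frame size

  // Models the reported symptoms for a task result of resultBytes bytes when
  // spark.akka.frameSize is set to frameSizeMiB.
  def reportedOutcome(resultBytes: Long, frameSizeMiB: Int): Outcome = {
    val frameBytes = frameSizeMiB * MiB
    if (resultBytes > frameBytes) SentViaBlockManager
    else if (resultBytes > 10 * MiB) Blocks
    else SentDirectly
  }
}
{code}
Under this model, a 12 MiB result with a frame size of 50 falls into the blocking range,
while the same result with a frame size of 10 or 9 exceeds the frame and goes through the
blockManager, which matches both workarounds mentioned in this thread.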



--
This message was sent by Atlassian JIRA
(v6.2#6252)
