spark-issues mailing list archives

From "Patrick Wendell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-2228) onStageSubmitted does not properly called so NoSuchElement will be thrown in onStageCompleted
Date Sat, 28 Jun 2014 18:22:24 GMT

    [ https://issues.apache.org/jira/browse/SPARK-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046937#comment-14046937 ]

Patrick Wendell commented on SPARK-2228:
----------------------------------------

I'm not sure it's a good idea to expose this to users. The current size of the buffer is
already very large: 10,000 events. If a user overflows this buffer, it means that the program
is not in a steady state, and allowing users to size this buffer would send them the wrong
message.
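
To illustrate the failure mode under discussion, here is a minimal sketch in Python. It is
not Spark's actual code: `BoundedListenerBus`, `StageListener`, and `simulate` are
hypothetical names, and the only assumption taken from this thread is that events posted
to a full, fixed-size buffer are dropped. A dropped "submitted" event leaves the listener
with no entry for that stage when the matching "completed" event arrives.

```python
import queue


class BoundedListenerBus:
    """Hypothetical stand-in for a listener bus backed by a fixed-size buffer."""

    def __init__(self, capacity):
        self._events = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def post(self, event):
        # When the buffer is full the event is dropped, not blocked on.
        try:
            self._events.put_nowait(event)
        except queue.Full:
            self.dropped += 1

    def drain(self, listener):
        # Deliver every buffered event to the listener, in order.
        while True:
            try:
                kind, stage_id = self._events.get_nowait()
            except queue.Empty:
                return
            listener.handle(kind, stage_id)


class StageListener:
    """Hypothetical listener that tracks stages in a dict keyed by stage id."""

    def __init__(self):
        self.stage_id_to_pool = {}
        self.missing = []  # stages completed without a recorded submission

    def handle(self, kind, stage_id):
        if kind == "submitted":
            self.stage_id_to_pool[stage_id] = "default"
        elif kind == "completed":
            if stage_id in self.stage_id_to_pool:
                del self.stage_id_to_pool[stage_id]
            else:
                # An unguarded lookup here is where a missing-key error would surface.
                self.missing.append(stage_id)


def simulate(capacity, num_stages):
    bus = BoundedListenerBus(capacity)
    listener = StageListener()
    # A burst of submissions arrives while the consumer is stalled.
    for stage_id in range(num_stages):
        bus.post(("submitted", stage_id))
    bus.drain(listener)
    # Completions then arrive one at a time and are delivered immediately.
    for stage_id in range(num_stages):
        bus.post(("completed", stage_id))
        bus.drain(listener)
    return bus.dropped, listener.missing
```

With a capacity of 3 and a burst of 5 submissions, two "submitted" events are dropped and
the last two stages complete without ever having been recorded. A larger buffer only moves
the point at which this happens if the consumer keeps falling behind.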

The real issue here is an efficiency problem in the listener that receives these events.
When blocks are being dropped or persisted, the current implementation of the storage
status listener does some operations that are linear in the total number of blocks ever
persisted. This becomes expensive when blocks are persisted and dropped frequently.
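
As a rough illustration of that cost (again a hypothetical sketch, not the storage status
listener's real code), compare a tracker that rescans every block it has ever seen on each
update with one that keeps a running total. `ScanningStorageTracker`,
`IncrementalStorageTracker`, and `replay` are made-up names, and the `steps` counter exists
only to make the amount of work visible.

```python
class ScanningStorageTracker:
    """Recomputes the total by scanning every block ever seen on each update."""

    def __init__(self):
        self.blocks = {}  # block id -> size; dropped blocks stay with size 0
        self.steps = 0

    def update(self, block_id, size):
        self.blocks[block_id] = size
        total = 0
        for block_size in self.blocks.values():
            self.steps += 1  # one unit of work per block ever persisted
            total += block_size
        return total


class IncrementalStorageTracker:
    """Maintains a running total, so each update is constant-time work."""

    def __init__(self):
        self.blocks = {}  # block id -> size; dropped blocks are removed
        self.total = 0
        self.steps = 0

    def update(self, block_id, size):
        old_size = self.blocks.pop(block_id, 0)
        if size > 0:
            self.blocks[block_id] = size
        self.total += size - old_size
        self.steps += 1
        return self.total


def replay(tracker, num_blocks, block_size=10):
    """Persist and then immediately drop num_blocks blocks; return the totals seen."""
    totals = []
    for block_id in range(num_blocks):
        totals.append(tracker.update(block_id, block_size))  # persisted
        totals.append(tracker.update(block_id, 0))           # dropped
    return totals
```

Both trackers report the same totals, but for 100 persist-and-drop cycles the scanning
version performs 10,100 steps against 200 for the incremental one, and the gap grows
quadratically with the number of blocks.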

> onStageSubmitted does not properly called so NoSuchElement will be thrown in onStageCompleted
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2228
>                 URL: https://issues.apache.org/jira/browse/SPARK-2228
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Baoxu Shi
>
> We are using `saveAsObjectFile` and `objectFile` to cut off lineage during iterative
> computing, but after several hundred iterations a `NoSuchElementException` is thrown.
> We checked the code and located the problem in `org.apache.spark.ui.jobs.JobProgressListener`.
> When `onStageCompleted` is called, the `stageId` cannot be found in `stageIdToPool`, but
> it does exist in the other HashMaps. So we think `onStageSubmitted` is not being called
> properly: Spark did add a stage but failed to send the message to listeners, and the error
> occurs when the `finish` message is sent to listeners.
> This problem causes a huge number of `active stages` to show in the `SparkUI`, which
> is really annoying. According to my testing code, though, it may not affect the final
> result.
> I'm willing to help solve this problem. Any idea which part I should change? I
> assume `org.apache.spark.scheduler.SparkListenerBus` has something to do with it, but it
> looks fine to me.
> FYI, here is the test code that reproduces the problem. I do not know how to post
> code here with highlighting, so I put the code in a gist to keep the issue clean.
> https://gist.github.com/bxshi/b5c0fe0ae089c75a39bd



--
This message was sent by Atlassian JIRA
(v6.2#6252)
