spark-user mailing list archives

From Sumona Routh <>
Subject Re: SparkListener onApplicationEnd processing an RDD throws exception because of stopped SparkContext
Date Mon, 22 Feb 2016 14:51:30 GMT
Ok, I understand.

Yes, I will have to handle them in the main thread.
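For the archives, here is roughly what "handling it in the main thread" looks like — a sketch only, where `doJob` and `doPostProcessing` are placeholders for the actual processing and the saveToCassandra write:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object JobWithPostProcessing {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("job-with-post-processing"))
    try {
      doJob(sc)            // the main processing
      doPostProcessing(sc) // runs after doJob returns; sc is still alive here
    } finally {
      sc.stop()            // stop only after every job has been submitted
    }
  }

  // Placeholders for the real work and the Cassandra persistence.
  def doJob(sc: SparkContext): Unit = ???
  def doPostProcessing(sc: SparkContext): Unit = ???
}
```

Since Spark actions block until their job completes, calling the two methods sequentially in one thread does guarantee the post-processing runs after the main job.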


On Wed, Feb 17, 2016 at 12:24 PM, Shixiong(Ryan) Zhu <> wrote:

> `onApplicationEnd` is posted when SparkContext is stopping, and you cannot
> submit any job to a stopping SparkContext. In general, SparkListener is
> used to monitor job progress and collect job information, and you should
> not submit jobs there. Why not submit your jobs in the main thread?
> On Wed, Feb 17, 2016 at 7:11 AM, Sumona Routh <> wrote:
>> Can anyone provide some insight into the flow of SparkListeners,
>> specifically onApplicationEnd? I'm having issues with the SparkContext
>> being stopped before my final processing can complete.
>> Thanks!
>> Sumona
>> On Mon, Feb 15, 2016 at 8:59 AM Sumona Routh <> wrote:
>>> Hi there,
>>> I am trying to implement a listener that acts as a post-processor,
>>> storing data about what was processed or errored. For this, I use an
>>> RDD that may or may not change during the course of the application.
>>> My thought was to use onApplicationEnd and then saveToCassandra call to
>>> persist this.
>>> From what I've gathered in my experiments, onApplicationEnd doesn't
>>> get called until sparkContext.stop() is called. If I don't call stop in my
>>> code, the listener won't be called. This works fine in my local tests -
>>> stop gets called, the listener runs and persists to the db, and
>>> everything works fine. However, when I run this on our server, the code in
>>> onApplicationEnd throws the following exception:
>>> Task serialization failed: java.lang.IllegalStateException: Cannot call
>>> methods on a stopped SparkContext
>>> What's the best way to resolve this? I can think of creating a new
>>> SparkContext in the listener (I think I have to turn on allowing multiple
>>> contexts, in case I try to create one before the other one is stopped). It
>>> seems odd but might be doable. Additionally, what if I were to simply add
>>> the code into my job in some sort of procedural block: doJob,
>>> doPostProcessing, does that guarantee postProcessing will occur after the
>>> other?
>>> We are currently using Spark 1.2 standalone at the moment.
>>> Please let me know if you require more details. Thanks for the
>>> assistance!
>>> Sumona
