crunch-user mailing list archives

From Josh Wills
Subject Re: Cleaning up after exceptions
Date Sat, 29 Mar 2014 10:58:19 GMT
IIRC (I'm away from my computer), we added the ability to register arbitrary
hooks on the PipelineExecution interface (the one returned by runAsync) that
are always executed at the end of a pipeline run. Those hooks could be used
to ensure that the temp directories are cleaned up no matter what happens
during the run. Does that work for this problem?
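
To make the idea concrete, here is a minimal sketch of an "always runs at the end" hook. It does not use Crunch at all: CompletableFuture stands in for the asynchronous execution handle, and the names CleanupHookSketch, runWithCleanup, and deleteRecursively are hypothetical, chosen only for illustration. The real hook mechanism on PipelineExecution may look different, so check the Javadoc for your Crunch version.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.stream.Stream;

// Hypothetical stand-in, not Crunch code: shows a completion hook that
// removes a temp directory whether the asynchronous run succeeds or fails.
class CleanupHookSketch {

    // Delete a directory tree, visiting children before their parents.
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Start the work asynchronously and attach a hook that fires on
    // completion, regardless of whether the work threw an exception.
    static CompletableFuture<Void> runWithCleanup(Path tempDir, Runnable work) {
        return CompletableFuture.runAsync(work)
                .whenComplete((result, error) -> deleteRecursively(tempDir));
    }

    public static void main(String[] args) throws Exception {
        Path tempDir = Files.createTempDirectory("pipeline-sketch");
        Files.createFile(tempDir.resolve("part-00000"));

        CompletableFuture<Void> execution = runWithCleanup(tempDir, () -> {
            throw new IllegalStateException("simulated failure");
        });

        try {
            execution.join(); // returns or throws only after the hook has run
        } catch (CompletionException e) {
            System.out.println("run failed: " + e.getCause().getMessage());
        }
        System.out.println("temp dir still exists: " + Files.exists(tempDir));
    }
}
```

Note that a hook like this only fires when the asynchronous run itself finishes. An exception thrown in driver code between two runs would still need its own try/finally around that code.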

On Fri, Mar 28, 2014 at 10:05 AM, Stephen Durfey wrote:

> If I have a scenario where I have already called Pipeline#run (and some
> temporary directories were created by Crunch during the run), and have
> continued on to do some additional processing (created some new
> PCollections and specified a write location), and an exception occurs in
> my code, outside of the pipeline, before Pipeline#run is called again, I
> would need a way to ensure the temporary directories created in my initial
> run are always cleaned up. I could call Pipeline#done, which calls
> cleanup() in MRPipeline, but it also calls run(). However, I would prefer
> not to have run() called at all, due to the exception thrown in my code.
> Would it be possible to make cleanup() public in the Pipeline interface so
> that it can be used to clean up any temp directories created by the pipeline?
