crunch-user mailing list archives

From Stephen Durfey
Subject Re: Cleaning up after exceptions
Date Tue, 01 Apr 2014 19:02:19 GMT
Thanks. I’ll look into that. Also, I just noticed that as of 0.8.2, Crunch has a public cleanup()
on the Pipeline interface. I should be able to use that, as my code was just updated to that version.
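
For readers of the archive, here is a minimal sketch of the pattern described above: run once, do driver-side work that may throw, and always clean up without triggering another run. It does not use the Crunch classes. `MiniPipeline`, `RecordingPipeline`, and `process` are hypothetical stand-ins invented for illustration, and the real `cleanup` signature in Crunch may differ from the no-argument form shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class CleanupOnFailure {

    /** Hypothetical stand-in for the two pipeline calls used here; NOT the Crunch interface. */
    interface MiniPipeline {
        void run();
        void cleanup();
    }

    /** Records calls so the order of events can be inspected. */
    static class RecordingPipeline implements MiniPipeline {
        final List<String> calls = new ArrayList<>();
        @Override public void run() { calls.add("run"); }
        @Override public void cleanup() { calls.add("cleanup"); }
    }

    /**
     * Runs the pipeline, then does driver-side work that may throw.
     * The finally block cleans up temporary state without calling run() again.
     */
    static void process(MiniPipeline pipeline, Runnable driverWork) {
        try {
            pipeline.run();      // first run; temp directories may exist after this
            driverWork.run();    // code outside the pipeline; may throw
            pipeline.run();      // second run; skipped if driverWork throws
        } finally {
            pipeline.cleanup();  // always executed, and never calls run()
        }
    }

    /** Simulates a failure in driver code between the two runs. */
    static List<String> demo() {
        RecordingPipeline p = new RecordingPipeline();
        try {
            process(p, () -> { throw new IllegalStateException("driver failure"); });
        } catch (IllegalStateException expected) {
            // the exception still reaches the caller, after cleanup has run
        }
        return p.calls;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [run, cleanup]
    }
}
```

The point of the finally block is that cleanup happens on both the success and failure paths, while the second run is skipped when the driver code throws.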

On Mar 29, 2014, at 5:58 AM, Josh Wills wrote:

> IIRC (I'm away from my computer), we added the ability to register arbitrary hooks on the
PipelineExecution interface (the one that is returned by runAsync) that are always executed
at the end of a pipeline run. Those could be used to ensure that the temp directories are
cleaned up no matter what happened during the run. Does that work for this problem?
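
The idea of a hook that always fires at the end of an asynchronous run can be sketched with the JDK's `CompletableFuture`. This is only an analogy under stated assumptions: it does not use `PipelineExecution` or any Crunch API, and `withCleanupHook` is a hypothetical helper name chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class CompletionHookSketch {

    /** Attaches a hook that runs when the future finishes, normally or exceptionally. */
    static <T> CompletableFuture<T> withCleanupHook(CompletableFuture<T> execution, Runnable hook) {
        // whenComplete invokes the callback on both success and failure
        return execution.whenComplete((result, error) -> hook.run());
    }

    /** Runs one succeeding and one failing async job; the hook fires for both. */
    static List<String> demo() {
        List<String> events = Collections.synchronizedList(new ArrayList<>());

        CompletableFuture<String> ok = withCleanupHook(
                CompletableFuture.<String>supplyAsync(() -> "done"),
                () -> events.add("cleanup after success"));
        ok.join(); // returns only after the hook has run

        CompletableFuture<String> failing = withCleanupHook(
                CompletableFuture.<String>supplyAsync(() -> {
                    throw new IllegalStateException("job failed");
                }),
                () -> events.add("cleanup after failure"));
        try {
            failing.join();
        } catch (CompletionException expected) {
            // the original failure is still reported to the caller
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [cleanup after success, cleanup after failure]
    }
}
```

Whether this fits depends on where the failure happens: a completion hook covers failures inside the asynchronous run, but not an exception thrown later in driver code.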
> On Fri, Mar 28, 2014 at 10:05 AM, Stephen Durfey wrote:
> Suppose I have already called Pipeline#run (and Crunch created some temporary directories
during the run), and have continued on to do some additional processing (created some new
PCollections and specified a write location). If an exception then occurs in my code, outside
of the pipeline, before Pipeline#run is called again, I need a way to ensure that the temporary
directories created in my initial run are always cleaned up. I could call Pipeline#done, which
calls cleanup() in MRPipeline, but it also calls run(). I would prefer not to have run() called
at all, given the exception thrown in my code.
> Would it be possible to make cleanup() public in the Pipeline interface so that it can be
used to clean up any temp directories created by the pipeline?
