incubator-crunch-user mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Pipeline in synchronous mode
Date Wed, 27 Jun 2012 09:40:59 GMT
On Wed, Jun 27, 2012 at 10:12 AM, Rahul <rsharma@xebia.com> wrote:
> How can I run a Pipeline in synchronous mode? I have built JUnit tests but
> they all run in async mode over MRPipeline.
> I need to make sure that the pipeline has finished, so I have to continually
> monitor the output folder. Is there a better way to do this?

Hi Rahul,

Pipelines themselves are always run synchronously; that is, a call to
Pipeline.run or Pipeline.done will block until all underlying jobs have
run (or failed).

There shouldn't be a need to monitor output folders; you can count
on output directories being populated once Pipeline.run or
Pipeline.done returns (and shouldn't count on the folders being populated
until that point).

Of course, the underlying MapReduce job(s) are run asynchronously.
The coordination of the asynchronous MR jobs and the blocking
of the MRPipeline class is handled by the CrunchJobControl class.
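To picture the pattern described above (jobs submitted asynchronously, with the caller blocked until all of them finish or one fails), here is a minimal sketch using only java.util.concurrent. It is not the CrunchJobControl implementation and uses no Crunch classes; the class BlockingPipelineSketch, the method runAll, and the job names are hypothetical, chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for "async jobs behind a blocking call".
// This only illustrates the pattern; it is NOT how Crunch is implemented.
final class BlockingPipelineSketch {

    // Submits every job to a thread pool, then blocks until each one has
    // completed. If a job fails, get() rethrows the failure wrapped in an
    // ExecutionException, so the caller sees it when the call returns.
    static List<String> runAll(List<String> jobNames)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String name : jobNames) {
                // submit() returns immediately; the job runs on a pool thread.
                pending.add(pool.submit(() -> {
                    Thread.sleep(50); // simulated work
                    return name + ":done";
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> job : pending) {
                results.add(job.get()); // blocks until this job has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // By the time runAll returns, every job has finished.
        List<String> results = runAll(Arrays.asList("job-1", "job-2", "job-3"));
        System.out.println(results);
    }
}
```

From the caller's point of view runAll is synchronous, even though the work happens on other threads. The same holds for a JUnit test that calls Pipeline.run or Pipeline.done: once the call returns, assertions on the output can follow directly.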

Hope this helps.

Regards,

Gabriel
