crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-405) Explore adding support for idempotent MRPipeline.plan()
Date Wed, 20 Aug 2014 02:26:18 GMT


Micah Whitacre updated CRUNCH-405:

    Attachment: CRUNCH-405d.patch

Here is a patch that adds javadoc to the two methods.

Allan, the "hand waving" you mention is similar to what I was envisioning, but it
is definitely a more sweeping change to how Crunch builds graphs and tracks state.  Specifically,
right now Crunch tracks a lot of information on individual PCollections rather than just in the
generated plan.  So it'd be nice to separate those concerns, but that would delay the idempotent functionality

> Explore adding support for idempotent MRPipeline.plan()
> -------------------------------------------------------
>                 Key: CRUNCH-405
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-405.patch, CRUNCH-405_v1.patch, CRUNCH-405b.patch, CRUNCH-405c.patch,
> Talking through a use case with a consumer, they were interested in having the ability
> to run the MRPipeline.plan() method one to many times prior to ever calling the
> methods.  The reason for this was they were looking at pulling information off the MRExecutor
> to tweak settings inside of their DoFns.
> Currently, however, the MRPipeline implementation does not have an idempotent plan() method,
> as it alters the state of internal values, thereby affecting the full run once done() is called.
> It would be nice if we added an idempotent plan() method that could gather this information,
> or perhaps a reset option.
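
To make the idempotency concern above concrete, here is a minimal sketch in plain Java.
It is a toy model only: ToyPipeline, write(), planDestructive(), and planIdempotent() are
hypothetical names invented for illustration and are not Crunch classes or methods. The
sketch assumes the problem is that planning consumes internal state which the later full
run still needs, and contrasts that with planning from a snapshot.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: every name here is hypothetical and none of it is Crunch API.
class ToyPipeline {
    // Outputs the user has registered but that have not been executed yet.
    private final List<String> pendingOutputs = new ArrayList<>();

    void write(String target) {
        pendingOutputs.add(target);
    }

    // Non-idempotent planning: building the plan drains the pending outputs,
    // so a later call (or the full run) finds nothing left to do.
    List<String> planDestructive() {
        List<String> stages = new ArrayList<>(pendingOutputs);
        pendingOutputs.clear();
        return stages;
    }

    // Idempotent planning: the plan is built from a snapshot and the
    // pipeline's own state is left untouched.
    List<String> planIdempotent() {
        return Collections.unmodifiableList(new ArrayList<>(pendingOutputs));
    }

    // Stand-in for the full run: plans, "executes" each stage, and returns
    // how many stages were executed.
    int done() {
        return planDestructive().size();
    }

    public static void main(String[] args) {
        ToyPipeline safe = new ToyPipeline();
        safe.write("out-a");
        safe.write("out-b");
        safe.planIdempotent();   // inspect the plan as often as needed
        safe.planIdempotent();
        System.out.println("stages run after idempotent planning: " + safe.done());

        ToyPipeline unsafe = new ToyPipeline();
        unsafe.write("out-a");
        unsafe.planDestructive(); // inspecting the plan consumes the state
        System.out.println("stages run after destructive planning: " + unsafe.done());
    }
}
```

In this toy, the snapshot approach corresponds to the "idempotent plan()" option, while a
"reset option" would instead restore pendingOutputs after a destructive plan. Which of the
two fits Crunch's actual internals is not something this sketch can answer.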

This message was sent by Atlassian JIRA
