crunch-dev mailing list archives

From "Allan Shoup (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-405) Explore adding support for idempotent MRPipeline.plan()
Date Wed, 13 Aug 2014 17:38:13 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095773#comment-14095773 ]

Allan Shoup commented on CRUNCH-405:
------------------------------------

I'm just starting to look into these parts of the code, so forgive my naiveté. It is not
intuitive to me what the dryRun mode does or why it is needed.

Without knowing how the current code might make this difficult, here's a stab at what
might be a more intuitive structure. The plan method would return a Plan object, which would
be tied to the state of the system when the plan was generated. You would then pass a plan
object to the executor, which would then execute the plan (and manipulate any system state
needed). If a plan was generated and before that plan was executed the system state was modified
(via some parallelDo or write), the system state would be updated and the previously generated
plan would no longer be executable.
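
A minimal Java sketch of that structure, for illustration only. Every name in it (SketchPipeline, SketchPlan, SketchExecutor, addOperation, isCurrent) is hypothetical and is not part of the Crunch API. The version counter is an assumption about how "tied to the state of the system" could be implemented, not a description of how MRPipeline works.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a pipeline; not the real Crunch MRPipeline. */
final class SketchPipeline {
    private final List<String> operations = new ArrayList<>();
    private long version = 0; // bumped on every state change

    /** Stands in for parallelDo/write: any call that modifies pipeline state. */
    void addOperation(String name) {
        operations.add(name);
        version++;
    }

    /** Only reads state, so it can be called any number of times. */
    SketchPlan plan() {
        return new SketchPlan(version, operations);
    }

    /** True if the pipeline has not been modified since the plan was made. */
    boolean isCurrent(SketchPlan plan) {
        return plan.version() == version;
    }
}

/** Immutable snapshot of the pipeline at planning time. */
final class SketchPlan {
    private final long version;
    private final List<String> steps;

    SketchPlan(long version, List<String> steps) {
        this.version = version;
        // Defensive copy, so later pipeline changes cannot leak into the plan.
        this.steps = Collections.unmodifiableList(new ArrayList<>(steps));
    }

    long version() { return version; }

    List<String> steps() { return steps; }
}

/** Executes a plan, refusing any plan that predates a pipeline change. */
final class SketchExecutor {
    List<String> execute(SketchPipeline pipeline, SketchPlan plan) {
        if (!pipeline.isCurrent(plan)) {
            throw new IllegalStateException(
                "Plan is stale: the pipeline was modified after it was generated");
        }
        // A real executor would launch jobs and update system state here.
        return plan.steps();
    }
}
```

With this shape, calling plan() twice in a row yields two equivalent plans and leaves the pipeline untouched. Calling addOperation() after planning makes the earlier plan fail fast in execute() instead of running against a changed pipeline.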

So, given that there was a lot of hand-waving there, and the current setup will probably not
be amenable to something like that, perhaps some javadoc would help clarify how the system
is expected to function.

> Explore adding support for idempotent MRPipeline.plan()
> -------------------------------------------------------
>
>                 Key: CRUNCH-405
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-405
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-405.patch, CRUNCH-405_v1.patch, CRUNCH-405b.patch, CRUNCH-405c.patch
>
>
> Talking through a use case with a consumer, they were interested in having the ability
> to run the MRPipeline.plan() method one to many times prior to ever calling the Pipeline.run/done
> methods. The reason for this was they were looking at pulling information off the MRExecutor
> to tweak settings inside of their DoFns.
> Currently, however, the MRPipeline implementation does not have an idempotent plan() method,
> as it alters the state of internal values, thereby affecting the full run once done() is
> called.
> It would be nice if we added an idempotent plan() method that could gather this information,
> or perhaps a reset option.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
