crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-449) Add sequentialDo function for injecting arbitrary non-parallel code
Date Fri, 01 Aug 2014 21:23:39 GMT


Josh Wills updated CRUNCH-449:

    Attachment: CRUNCH-449e.patch

Gabriel-- many thanks for the review. I generalized the Target dependency handling of the
PipelineCallables to handle both InputCollections and any other dependencies that get
applied to a PCollection via a ParallelDoOptions instance, and added tests for the same. I
also fixed up the message handling for the PipelineCallable per your suggestion.

> Add sequentialDo function for injecting arbitrary non-parallel code
> -------------------------------------------------------------------
>                 Key: CRUNCH-449
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments: CRUNCH-449.patch, CRUNCH-449b.patch, CRUNCH-449c.patch, CRUNCH-449d.patch,
>
> I've been noodling on this one for a while: how to add the ability to execute some code
> if and only if one or more targets are created, and have that executed code (optionally) return
> one or more new PCollections as a result. I was thinking that this functionality could be
> wired into libraries to do things like bulk loading HBase tables or running Sqoop jobs as
> part of Crunch pipelines automatically.
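
For illustration only, the "execute if and only if one or more targets are created" idea in the description can be sketched in plain Java. This is not Crunch's API and not the contents of the attached patches: GatedStep, tryRun, and simulate are hypothetical names, targets are modelled as plain strings, and the step's result stands in for the optional new PCollections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from Crunch itself.
final class SequentialStepSketch {

    /** A non-parallel step that may run only after all of its required targets exist. */
    static final class GatedStep<T> {
        private final Set<String> requiredTargets;
        private final Supplier<T> body;
        private boolean done;
        private T output;

        GatedStep(Set<String> requiredTargets, Supplier<T> body) {
            this.requiredTargets = requiredTargets;
            this.body = body;
        }

        /** Runs the body at most once, and only if every required target has been created. */
        boolean tryRun(Set<String> createdTargets) {
            if (done || !createdTargets.containsAll(requiredTargets)) {
                return false;
            }
            output = body.get();
            done = true;
            return true;
        }

        boolean isDone() {
            return done;
        }

        /** The step's result (null until the step has run). */
        T output() {
            return output;
        }
    }

    /** Simulates targets being created one at a time and records what happened. */
    static List<String> simulate(List<String> targetsInCreationOrder, GatedStep<?> step) {
        Set<String> created = new LinkedHashSet<>();
        List<String> log = new ArrayList<>();
        for (String target : targetsInCreationOrder) {
            created.add(target);
            log.add("created " + target);
            if (step.tryRun(created)) {
                log.add("ran step -> " + step.output());
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // Stand-in for something like a bulk load that needs two outputs on disk first.
        GatedStep<String> bulkLoad = new GatedStep<>(
            new LinkedHashSet<>(Arrays.asList("hfiles", "regions")),
            () -> "loaded-table");
        for (String line : simulate(Arrays.asList("hfiles", "other", "regions"), bulkLoad)) {
            System.out.println(line);
        }
    }
}
```

In this sketch the step fires exactly once, right after the last of its required targets appears; if any required target is never created, the step never runs.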

This message was sent by Atlassian JIRA
