crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-449) Add sequentialDo function for injecting arbitrary non-parallel code
Date Fri, 01 Aug 2014 19:37:39 GMT


Gabriel Reid commented on CRUNCH-449:

Looks good, all the added javadoc makes it way more clear how things work.

I was playing around with this a bit, and found that setting a dependency on an input collection
doesn't seem to work (because it tries to cast it to a SourceTarget in PipelineCallable.dependsOn).
The actual stack trace I got was:
java.lang.ClassCastException: cannot be cast to org.apache.crunch.SourceTarget
	at org.apache.crunch.PipelineCallable.dependsOn(
	at org.apache.crunch.PipelineCallableIT.testWithTargetDependencies(

I guess this is kind of a weird case (depending on an input collection), but it would be good
to deal with it somehow instead of throwing the ClassCastException.
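
One way this could be handled is to check the type before casting, as in the minimal sketch below. The types SketchSource, SketchSourceTarget and DependencySketch are hypothetical stand-ins made up for illustration, not the real Crunch classes, and treating a plain input as "already available" is an assumption about the desired behaviour rather than what the patch does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the Crunch interfaces.
interface SketchSource {}
interface SketchSourceTarget extends SketchSource {}

final class DependencySketch {
    // Dependencies written by the pipeline that must exist before the callable runs.
    private final Map<String, SketchSourceTarget> pendingTargets = new LinkedHashMap<>();
    // Labels of plain inputs, which already exist and need no waiting.
    private final List<String> alreadyAvailable = new ArrayList<>();

    // Registers a dependency without an unchecked cast.
    DependencySketch dependsOn(String label, SketchSource source) {
        if (source instanceof SketchSourceTarget) {
            pendingTargets.put(label, (SketchSourceTarget) source);
        } else {
            // Assumption: a plain input is readable up front, so record it instead of casting.
            alreadyAvailable.add(label);
        }
        return this;
    }

    int pendingCount() {
        return pendingTargets.size();
    }

    List<String> alreadyAvailable() {
        return alreadyAvailable;
    }
}
```

With this shape, depending on an input collection becomes a no-op that is still visible to the caller, rather than a runtime failure. Rejecting it with a descriptive IllegalArgumentException would be an equally reasonable choice.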

Another thing I noticed is that there is no default name or message on PipelineCallable, so
when I returned {{Status.FAILURE}} from the {{call()}} method, the following was logged:
1 callable failure(s) occurred:
null: null

Maybe using the toString() of the PipelineCallable as the default name would be good, and
a message like "No message available, please implement PipelineCallable.getMessage()" could
be used as the default message.
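
The fallback described above could look roughly like the following sketch. NamedCallableSketch is a hypothetical class written for illustration and does not mirror the actual PipelineCallable implementation; the "name: message" format of failureLine() is inferred from the "null: null" log output quoted earlier.

```java
// Hypothetical class for illustration only; NOT the real PipelineCallable.
abstract class NamedCallableSketch {
    enum Status { SUCCESS, FAILURE }

    static final String DEFAULT_MESSAGE =
        "No message available, please implement PipelineCallable.getMessage()";

    private String name;
    private String message;

    // Optional explicit name; when unset, getName() falls back to toString().
    NamedCallableSketch named(String name) {
        this.name = name;
        return this;
    }

    // Subclasses may record a failure message before returning FAILURE.
    protected void setMessage(String message) {
        this.message = message;
    }

    String getName() {
        return name == null ? toString() : name;
    }

    String getMessage() {
        return message == null ? DEFAULT_MESSAGE : message;
    }

    abstract Status call();

    // Builds the "name: message" line that would be logged for a failure.
    String failureLine() {
        return getName() + ": " + getMessage();
    }
}
```

A callable that sets neither a name nor a message would then log its toString() value followed by the default message, instead of "null: null".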

Apart from those couple of things, this is good to go as far as I'm concerned.

> Add sequentialDo function for injecting arbitrary non-parallel code
> -------------------------------------------------------------------
>                 Key: CRUNCH-449
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments: CRUNCH-449.patch, CRUNCH-449b.patch, CRUNCH-449c.patch, CRUNCH-449d.patch
> I've been noodling on this one for a while: how to add the ability to execute some code
> if and only if one or more targets are created, and have that executed code (optionally) return
> one or more new PCollections as a result. I was thinking that this functionality could be
> wired in to libraries to do things like bulk loading HBase tables or running Sqoop jobs as
> part of Crunch pipelines automatically.

This message was sent by Atlassian JIRA
