crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-506) Default To.textFile to use TextFileSourceTarget
Date Thu, 02 Apr 2015 01:36:53 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391922#comment-14391922 ]

Micah Whitacre commented on CRUNCH-506:
---------------------------------------

Well, I'd actually need to add a new method:

To.textFile(path, ptype)

SourceTargets generally need the PType to know how to read the data back out, so this is
not quite a drop-in replacement, and I'm not sure I can generate a PType.
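
To illustrate why the read side needs extra type information, here is a minimal toy
sketch in plain Java. It is not Crunch code: the class ReadNeedsTypeSketch and the
parseLine parameter are hypothetical stand-ins, with parseLine playing the role a PType
would play when reading text back into typed values.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Toy model only: not Crunch code. parseLine stands in for what a PType would supply.
final class ReadNeedsTypeSketch {

    /** Writing text needs no type information: every element can be rendered via toString(). */
    static List<String> writeLines(List<?> values) {
        return values.stream()
                .map(value -> value.toString())
                .collect(Collectors.toList());
    }

    /** Reading text back as T needs a caller-supplied parser; the lines alone do not reveal T. */
    static <T> List<T> readLines(List<String> lines, Function<String, T> parseLine) {
        return lines.stream()
                .map(parseLine)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = writeLines(List.of(1, 2, 3));
        List<Integer> values = readLines(lines, Integer::valueOf);
        System.out.println(lines + " -> " + values);
    }
}
```

The asymmetry is the point: writeLines accepts anything, while readLines cannot be called
without the extra argument, which mirrors why a one-argument To.textFile(path) cannot
simply be swapped for a SourceTarget.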

> Default To.textFile to use TextFileSourceTarget
> -----------------------------------------------
>
>                 Key: CRUNCH-506
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-506
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.11.0
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>
> Had a consumer with an interesting situation.  They had code like the following:
> {code}
> PCollection<String> output = ...
> output.write(To.textFile(path));
> pipeline.done();
> long size = output.length().getValue();
> {code}
> This code was actually failing with an exception like the following:
> {noformat}
> Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.JavaMain], main() threw exception, org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-107739816/p8
>   org.apache.oozie.action.hadoop.JavaMainException: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-107739816/p8
>   at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:58)
>   at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
> {noformat}
> I believe this is because To.textFile(...) uses just a TextFileTarget, so the length() call goes back to the intermediate state that was cleaned up by the done() call.  Switching To.textFile(...) to TextFileSourceTarget instead lets the code succeed.
> Seems like we could switch To.textFile(...) to use the SourceTarget impl to make this less surprising/confusing to consumers.
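
The failure mode described above can be sketched with a small toy model in plain Java.
This is not Crunch code and does not claim to reproduce Crunch internals: ToyPipeline,
ToyCollection, write and runScenario are all hypothetical names, and the model simply
assumes the reporter's explanation (a write-only target leaves the collection reading
from temp state that done() deletes, while a readable target lets it read from the
final output).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy model only: these are NOT Crunch classes, just stand-ins for the idea.
final class DoneThenLengthSketch {

    /** Hypothetical pipeline that owns a temp directory and deletes it in done(). */
    static final class ToyPipeline {
        final Path tempDir;

        ToyPipeline() throws IOException {
            tempDir = Files.createTempDirectory("toy-crunch-");
        }

        void done() throws IOException {
            try (Stream<Path> paths = Files.walk(tempDir)) {
                // Reverse order so children are deleted before their parent directory.
                for (Path p : paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList())) {
                    Files.delete(p);
                }
            }
        }
    }

    /** Hypothetical collection that remembers where its data can be read back from. */
    static final class ToyCollection {
        private final Path readBackFrom;

        ToyCollection(Path readBackFrom) {
            this.readBackFrom = readBackFrom;
        }

        long length() throws IOException {
            if (!Files.exists(readBackFrom)) {
                throw new IOException("No files found to materialize at: " + readBackFrom);
            }
            try (Stream<String> lines = Files.lines(readBackFrom)) {
                return lines.count();
            }
        }
    }

    /** A write-only target leaves the collection tied to the temp copy; a readable one does not. */
    static ToyCollection write(ToyPipeline pipeline, List<String> lines, Path output,
                               boolean targetIsReadable) throws IOException {
        Path intermediate = pipeline.tempDir.resolve("p8");
        Files.write(intermediate, lines);
        Files.write(output, lines);
        return new ToyCollection(targetIsReadable ? output : intermediate);
    }

    /** Runs write, done, length in the order from the report; returns the count, or -1 on failure. */
    static long runScenario(boolean targetIsReadable) {
        try {
            ToyPipeline pipeline = new ToyPipeline();
            Path output = Files.createTempFile("toy-output-", ".txt");
            ToyCollection written = write(pipeline, List.of("a", "b", "c"), output, targetIsReadable);
            pipeline.done();
            try {
                return written.length();
            } catch (IOException e) {
                System.out.println(e.getMessage());
                return -1;
            } finally {
                Files.deleteIfExists(output);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("write-only target: " + runScenario(false));
        System.out.println("readable target:   " + runScenario(true));
    }
}
```

In this model runScenario(false) returns -1 because the only readable copy was removed by
done(), and runScenario(true) returns 3 because the collection reads from the output that
survives done().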



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
