crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-506) Default To.textFile to use TextFileSourceTarget
Date Wed, 01 Apr 2015 21:25:52 GMT
Micah Whitacre created CRUNCH-506:
-------------------------------------

             Summary: Default To.textFile to use TextFileSourceTarget
                 Key: CRUNCH-506
                 URL: https://issues.apache.org/jira/browse/CRUNCH-506
             Project: Crunch
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.11.0
            Reporter: Micah Whitacre
            Assignee: Micah Whitacre


A consumer ran into an interesting situation. They had code like the following:

{code}
PCollection<String> output = ...
output.write(To.textFile(path));
pipeline.done();

long size = output.length().getValue();
{code}

This code was actually failing with an exception like the following:

{noformat}
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.JavaMain], main() threw exception, org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-107739816/p8
  org.apache.oozie.action.hadoop.JavaMainException: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-107739816/p8
  at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:58)
  at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
{noformat}

I believe this is because To.textFile(...) returns just a TextFileTarget, which can be written
to but not read back. As a result, the length() call goes back to the intermediate output, which
had already been cleaned up by the done() call. Switching from To.textFile(...) to a
TextFileSourceTarget lets the code succeed.
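
To make the suspected mechanism concrete, here is a minimal sketch in plain Java. It is a toy model only: ToyPipeline, ToyCollection, writeToTarget and writeToSourceTarget are hypothetical names invented for illustration and are not Crunch classes or Crunch's implementation. It assumes that a target-only write leaves the collection backed by the pipeline's temp directory, while a source+target write lets later reads come from the written output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Toy model only: every class here is hypothetical and is NOT part of Crunch.
final class MaterializeModel {

    /** Stand-in for a pipeline that owns a temp directory, removed by done(). */
    static final class ToyPipeline {
        final Path tempDir;

        ToyPipeline() throws IOException {
            tempDir = Files.createTempDirectory("toy-crunch-");
        }

        /** Deletes the temp directory and everything in it (children first). */
        void done() throws IOException {
            try (Stream<Path> paths = Files.walk(tempDir)) {
                paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
        }
    }

    /** Stand-in for a collection whose records initially live in the temp directory. */
    static final class ToyCollection {
        private Path readFrom;

        ToyCollection(ToyPipeline pipeline, List<String> records) throws IOException {
            readFrom = pipeline.tempDir.resolve("intermediate");
            Files.write(readFrom, records);
        }

        /** Target-only write: data is copied out, but reads still use the temp file. */
        void writeToTarget(Path out) throws IOException {
            Files.copy(readFrom, out, StandardCopyOption.REPLACE_EXISTING);
        }

        /** Source+target write: later reads are redirected to the written output. */
        void writeToSourceTarget(Path out) throws IOException {
            writeToTarget(out);
            readFrom = out;
        }

        /** Counts records, failing if the backing file no longer exists. */
        long length() throws IOException {
            if (!Files.exists(readFrom)) {
                throw new IOException("No files found to materialize at: " + readFrom);
            }
            try (Stream<String> lines = Files.lines(readFrom)) {
                return lines.count();
            }
        }
    }

    /** Mirrors the reported sequence: write, done(), then length(). */
    static long lengthAfterDone(boolean useSourceTarget) throws IOException {
        ToyPipeline pipeline = new ToyPipeline();
        ToyCollection output = new ToyCollection(pipeline, Arrays.asList("a", "b", "c"));
        Path out = Files.createTempFile("toy-output-", ".txt");
        try {
            if (useSourceTarget) {
                output.writeToSourceTarget(out);
            } else {
                output.writeToTarget(out);
            }
            pipeline.done();
            return output.length();
        } finally {
            Files.deleteIfExists(out);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("source+target length: " + lengthAfterDone(true));
        try {
            lengthAfterDone(false);
        } catch (IOException e) {
            System.out.println("target-only failed: " + e.getMessage());
        }
    }
}
```

In this model, the target-only path fails after done() because the only readable copy was in the deleted temp directory, while the source+target path still has the written output to read from. Whether Crunch behaves exactly this way internally is an assumption based on the behaviour described in this issue.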

It seems like we could switch To.textFile(...) to use the SourceTarget implementation, which
would make this less surprising and confusing to consumers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
