incubator-crunch-dev mailing list archives

From "Brock Noland (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-68) Crunch examples don't accept generic tool arguments
Date Thu, 20 Sep 2012 15:31:08 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459672#comment-13459672 ]

Brock Noland commented on CRUNCH-68:
------------------------------------

Alright, here is what I have uncovered:

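1) The main and run methods receive the class name as an argument because the jar manifest
already specifies the main class, so whatever is given in the class name position on the
command line is passed through as an ordinary argument: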

{noformat}
$ hadoop jar target/apache-crunch-0.4.0-incubating-SNAPSHOT-job.jar not.a.class.name wordcount/input wordcount/output-2
12/09/20 10:21:44 INFO exec.CrunchJob: Running job "org.apache.crunch.examples.WordCount: Text(wordcount/input)+S0+Aggregate.count+GBK+combine+asText+Text(wordcount/output-2)"
{noformat}

Note that not.a.class.name is only required because the run() method is looking for 3 args.

2) Because of #1, the other examples cannot be run from this jar. The main class from the
manifest (WordCount) runs regardless of the class named on the command line:

{noformat}
$ hadoop jar target/apache-crunch-0.4.0-incubating-SNAPSHOT-job.jar org.apache.crunch.examples.TotalBytesByIP access_log/input access_log/output
12/09/20 10:20:14 INFO exec.CrunchJob: Running job "org.apache.crunch.examples.WordCount: Text(access_log/input)+S0+Aggregate.count+GBK+combine+asText+Text(access_log/output)"
{noformat}
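
To make #1 and #2 concrete, here is a rough plain-Java model of how the class to run and its arguments get picked. It is only a sketch of the observed behaviour, not Hadoop's actual code; RunJarModel, Resolved, and resolve are hypothetical names made up for illustration.

```java
import java.util.Arrays;

// Simplified, hypothetical model of how "hadoop jar <jar> [args...]" picks the
// class to run. This is NOT Hadoop's implementation, only the observed behaviour.
class RunJarModel {

    /** The class that ends up running and the args handed to its main(). */
    static final class Resolved {
        final String mainClass;
        final String[] args;

        Resolved(String mainClass, String[] args) {
            this.mainClass = mainClass;
            this.args = args;
        }
    }

    /**
     * @param manifestMainClass the main class from the jar manifest, or null if none
     * @param argsAfterJar      everything on the command line after the jar path
     */
    static Resolved resolve(String manifestMainClass, String[] argsAfterJar) {
        if (manifestMainClass != null) {
            // Manifest wins: nothing is consumed as a class name, so a class
            // name given on the command line becomes an ordinary argument.
            return new Resolved(manifestMainClass, argsAfterJar);
        }
        if (argsAfterJar.length == 0) {
            throw new IllegalArgumentException("no main class in manifest or on command line");
        }
        // No manifest entry: the first argument is taken as the class name.
        String[] rest = Arrays.copyOfRange(argsAfterJar, 1, argsAfterJar.length);
        return new Resolved(argsAfterJar[0], rest);
    }

    public static void main(String[] ignored) {
        String[] cmd = {"org.apache.crunch.examples.TotalBytesByIP",
                        "access_log/input", "access_log/output"};
        Resolved withManifest = resolve("org.apache.crunch.examples.WordCount", cmd);
        Resolved withoutManifest = resolve(null, cmd);
        System.out.println(withManifest.mainClass + " gets " + withManifest.args.length + " args");
        System.out.println(withoutManifest.mainClass + " gets " + withoutManifest.args.length + " args");
    }
}
```

With the manifest entry present, WordCount runs and sees 3 args; without it, TotalBytesByIP runs and sees 2.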

3) All examples use ToolRunner, which in both Hadoop 1.X and 2.X already parses the args with
GenericOptionsParser and passes only the remaining args to the run() method:

https://github.com/apache/hadoop-common/blob/release-1.0.3/src/core/org/apache/hadoop/util/ToolRunner.java#L59
https://github.com/apache/hadoop-common/blob/release-2.0.1-alpha/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java#L64


Points of action:
1) Either a separate jar should be generated for each example, or we should remove the
mainClass from the jar manifest.
2) All examples should take 2 args. The class is specified either in the jar manifest or on
the command line, and it is never passed to the run() method unless it appears in both the
manifest and the command line.
3) The examples should not use GenericOptionsParser in the run() method.
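
As a rough illustration of 2) and 3), the argument check could look something like the plain-Java sketch below. ExampleArgs and checkArgs are hypothetical names; in the real examples this logic would sit in the Tool's run() method, which ToolRunner calls after it has already stripped the generic options, so no GenericOptionsParser call is needed there.

```java
// Hypothetical stand-in for the argument check inside an example's run() method.
// Assumes generic options (-D, -libjars, ...) were already removed by ToolRunner,
// so only the positional arguments remain.
class ExampleArgs {

    static final String USAGE = "Usage: <example class> [generic options] input output";

    /** Returns 0 when args are exactly {input, output}; otherwise prints usage and returns 1. */
    static int checkArgs(String[] args) {
        if (args.length != 2) {
            System.err.println();
            System.err.println(USAGE);
            System.err.println();
            return 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        // Exit status mirrors what run() would return to ToolRunner.
        System.exit(checkArgs(args));
    }
}
```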

Let me know if you agree and I can open JIRAs for said items.
                
> Crunch examples don't accept generic tool arguments
> ---------------------------------------------------
>
>                 Key: CRUNCH-68
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-68
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.3.0
>            Reporter: Roman Shaposhnik
>            Assignee: Matthias Friedrich
>             Fix For: 0.4.0
>
>         Attachments: CRUNCH-68-Fix-command-line-parser-for-examples.patch
>
>
> Currently all crunch examples have the following code:
> {noformat}
>     if (args.length != 3) {
>       System.err.println();
>       System.err.println("Usage: " + this.getClass().getName() + " [generic options] input output");
>       System.err.println();
>       GenericOptionsParser.printGenericCommandUsage(System.err);
>       return 1;
>     }
> {noformat}
> This is incorrect, since run() gets to see all arguments, even generic ones, and thus you
> can't predict the value of args.length.
> This is also, unfortunately, a major blocker for using Crunch with Hadoop 2 because of
> MAPREDUCE-4068.
> Essentially, at this point the combination of MAPREDUCE-4068 and the inability to pass
> -libjars makes the Crunch examples DOA on Hadoop 2 clusters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
