hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-25) a new map/reduce example and moving the examples from src/java to src/examples
Date Tue, 07 Feb 2006 17:21:00 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-25?page=comments#action_12365449 ] 

Doug Cutting commented on HADOOP-25:

I don't feel that strongly in particular about "ant compile" compiling the examples.  It's
more the principle of keeping that command, the default, minimal.  When the next person comes
along and adds something optional that compiles to build.xml I don't want them to also add
it to "ant compile".

On the other hand, "ant test" should be maximal, compiling and testing as much as possible.
My rule is that "ant clean test" should be run before every commit.

I like the idea of making bin/hadoop easily extensible with something like 'bin/hadoop run
build/hadoop-examples.jar'.  We could by convention create executable jars whose default main()
listed the commands that the jar supports.  We could even change hadoop.jar to be like this,
moving the command selection logic out of the shell script and into a Java class.  +1
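A minimal sketch of that executable-jar convention, assuming a hypothetical `ExampleDriver` class (not an existing Hadoop class). Its `main()` dispatches on the first argument and, when no command matches, prints the commands the jar supports. The `wordcount` and `sort` entries are placeholder lambdas standing in for real example programs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical driver: the jar's default main() selects a command by name
// and lists the supported commands when none matches.
class ExampleDriver {
    // Command name -> entry point that receives the remaining arguments.
    private final Map<String, Consumer<String[]>> commands = new LinkedHashMap<>();
    private final Map<String, String> descriptions = new LinkedHashMap<>();

    void add(String name, String description, Consumer<String[]> entry) {
        commands.put(name, entry);
        descriptions.put(name, description);
    }

    // Builds the usage text listing every registered command.
    String usage() {
        StringBuilder sb = new StringBuilder("Valid commands are:\n");
        for (Map.Entry<String, String> e : descriptions.entrySet()) {
            sb.append("  ").append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Runs the named command; prints usage and returns false if it is unknown.
    boolean run(String[] args) {
        if (args.length == 0 || !commands.containsKey(args[0])) {
            System.out.print(usage());
            return false;
        }
        commands.get(args[0]).accept(Arrays.copyOfRange(args, 1, args.length));
        return true;
    }

    public static void main(String[] args) {
        ExampleDriver driver = new ExampleDriver();
        // Placeholder entry points; a real jar would call each example's main().
        driver.add("wordcount", "count the words in the input files",
                   rest -> System.out.println("wordcount " + String.join(" ", rest)));
        driver.add("sort", "sort the input records",
                   rest -> System.out.println("sort " + String.join(" ", rest)));
        if (!driver.run(args)) {
            System.exit(1);
        }
    }
}
```

With this shape, moving command selection out of the shell script amounts to registering each command once in Java; the script would only need to pick the jar.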

I really think it would be nice if, for common MapReduce operations (e.g., sorting, inverting,
etc.) on a well-configured cluster, one did not have to specify the number of map or reduce
tasks.  That way one can run something on one cluster with 20 single-processor machines, and
then turn and run it on another with 200 dual-processor machines with the system doing a reasonable
job.  One should also be able to fine-tune things for a particular cluster if one likes, but
that should be optional.  There are cases where the precise number of outputs is critical,
but there are (in my experience) many more where the precise number of outputs does not matter.
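One way to picture "the system doing a reasonable job" is a default that scales with the cluster and yields to an explicit setting. The sketch below assumes a hypothetical `TaskCountDefaults.defaultTasks` helper and an assumed `tasksPerSlot` knob; neither is a Hadoop setting, and the simple multiply-by-processors rule is only one possible heuristic.

```java
import java.util.OptionalInt;

// Hypothetical heuristic: size a job to the cluster unless the user overrides it.
class TaskCountDefaults {
    // tasksPerSlot is an assumed tuning knob, not a real Hadoop setting.
    static int defaultTasks(int nodes, int processorsPerNode, int tasksPerSlot,
                            OptionalInt userOverride) {
        if (userOverride.isPresent()) {
            return userOverride.getAsInt();    // explicit fine-tuning wins
        }
        if (nodes <= 0 || processorsPerNode <= 0 || tasksPerSlot <= 0) {
            throw new IllegalArgumentException("cluster parameters must be positive");
        }
        // One task per processor slot, scaled by the tuning knob.
        return nodes * processorsPerNode * tasksPerSlot;
    }

    public static void main(String[] args) {
        // The two clusters from the discussion, with no user override.
        System.out.println(defaultTasks(20, 1, 1, OptionalInt.empty()));   // 20
        System.out.println(defaultTasks(200, 2, 1, OptionalInt.empty()));  // 400
    }
}
```

The same job then gets 20 tasks on the 20 single-processor machines and 400 on the 200 dual-processor machines without any change to the command line.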

One way to sidestep this might be to add a standard '-D' option to bin/hadoop that permits
one to specify any configuration option.  That way one could, e.g., always easily set the
number of map or reduce tasks for each job, but also not be forced to.
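A minimal sketch of such a '-D' option, assuming a hypothetical `DefineOptions.parse` helper that accepts both "-D key=value" and "-Dkey=value", stores each pair in a `java.util.Properties`, and hands back the remaining arguments. The property names in the usage example (`mapred.map.tasks`, `mapred.reduce.tasks`) are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: moves "-D key=value" pairs from the command line
// into a Properties object and returns the remaining arguments in order.
class DefineOptions {
    static List<String> parse(String[] args, Properties conf) {
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String def = null;
            if (arg.equals("-D") && i + 1 < args.length) {
                def = args[++i];          // "-D key=value" (separate token)
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                def = arg.substring(2);   // "-Dkey=value" (joined form)
            }
            if (def == null) {
                rest.add(arg);            // not a definition: keep for the job
                continue;
            }
            int eq = def.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value, got: " + def);
            }
            conf.setProperty(def.substring(0, eq), def.substring(eq + 1));
        }
        return rest;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        List<String> rest = parse(args, conf);
        System.out.println("config: " + conf);
        System.out.println("remaining args: " + rest);
    }
}
```

For example, `-D mapred.reduce.tasks=4 in out` would set one property and leave `[in, out]` for the job, while omitting `-D` leaves the cluster default in place.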

And you're right: a JVM probably isn't required to print that stack, but I'm strongly in favor
of things that make code smaller (easier to read, easier to maintain), especially examples.

> a new map/reduce example and moving the examples from src/java to src/examples
> ------------------------------------------------------------------------------
>          Key: HADOOP-25
>          URL: http://issues.apache.org/jira/browse/HADOOP-25
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Reporter: Owen O'Malley
>     Priority: Minor
>  Attachments: examples.patch
> The new example is the word count example from Google's paper. I moved the examples into
> a separate jar file to demonstrate how to run stand-alone application code.

This message is automatically generated by JIRA.
