hadoop-common-dev mailing list archives

From "Benjamin Reed (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-435) Encapsulating startup scripts and jars in a single Jar file.
Date Fri, 11 Aug 2006 20:31:14 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-435?page=comments#action_12427618 ] 
            
Benjamin Reed commented on HADOOP-435:
--------------------------------------

I've attached the scripts. They are quite convenient to use. To start the cluster I simply
do "for i in `cat machines`; do ssh $i hadoop/start.sh hadoop/hadoop.jar; done". I stop everything
using "for i in `cat machines`; do ssh $i hadoop/stop.sh hadoop/hadoop.jar; done".
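
The two loops above differ only in the script they invoke, so they could be folded into one helper. The following is a minimal sketch, not part of the attached scripts: the function name run_on_all and the RUN variable are hypothetical, and it assumes the machines file lists one hostname per line and that ssh is called in its usual "ssh host command" form.

```shell
#!/bin/sh
# run_on_all HOSTS_FILE SCRIPT
# Runs SCRIPT with hadoop/hadoop.jar as its argument on every host in HOSTS_FILE.
# RUN (hypothetical) selects the remote runner; it defaults to ssh.
# Setting RUN=echo prints the commands instead of running them.
run_on_all() {
    hosts_file=$1
    script=$2
    while read -r host; do
        # Skip blank lines in the hosts file.
        if [ -z "$host" ]; then
            continue
        fi
        # Redirect stdin so ssh does not swallow the rest of the hosts file.
        ${RUN:-ssh} "$host" "$script" hadoop/hadoop.jar </dev/null
    done < "$hosts_file"
}
```

With this in place, "run_on_all machines hadoop/start.sh" would start the cluster and "run_on_all machines hadoop/stop.sh" would stop it. A dry run with RUN=echo is a cheap way to check the host list first.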

The start script currently assumes that the jobtracker and namenode run on the same machine.
The namenode should be started only on the machine that the configuration names as the namenode,
just like the jobtracker. There should also be a flag to indicate whether a datanode and tasktracker
should be started on a host that is a jobtracker or namenode. Despite these limitations, hopefully
you get the point :)
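
The selection logic described above might look roughly like the sketch below. It is not the attached start.sh: the function name daemons_for and its arguments are hypothetical, and it assumes the namenode and jobtracker hostnames have already been read from the configuration (for example from the fs.default.name and mapred.job.tracker properties). The fourth argument plays the role of the proposed flag.

```shell
#!/bin/sh
# daemons_for HOST NAMENODE_HOST JOBTRACKER_HOST WORKERS_ON_MASTERS
# Prints, one per line, the daemons that should be started on HOST.
# WORKERS_ON_MASTERS is "yes" or "no" (hypothetical flag): whether a
# datanode and tasktracker should also run on a namenode or jobtracker host.
daemons_for() {
    host=$1
    namenode_host=$2
    jobtracker_host=$3
    workers_on_masters=$4
    is_master=no
    if [ "$host" = "$namenode_host" ]; then
        echo namenode
        is_master=yes
    fi
    if [ "$host" = "$jobtracker_host" ]; then
        echo jobtracker
        is_master=yes
    fi
    # Worker daemons run on every non-master host, and on masters only if asked.
    if [ "$is_master" = no ] || [ "$workers_on_masters" = yes ]; then
        echo datanode
        echo tasktracker
    fi
}
```

Each name printed matches a command accepted by hadoop.jar, so a start script could loop over the output and launch "java -jar hadoop.jar <name>" for each line.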

The nice thing about using a conf dir is that you don't have to worry about complicated classpaths.
I don't usually use it since I generally just use the config in the jar file, but there are
cases where it is convenient. Fortunately, the code is structured such that the special case
code is in a single method.

I'm not familiar enough with Tool and ToolBase to know how well they fit. From the brief look
I took, they seemed to address a more specific kind of application. (I wouldn't feel bad if someone
made it use Tool and ToolBase :)

> Encapsulating startup scripts and jars in a single Jar file.
> ------------------------------------------------------------
>
>                 Key: HADOOP-435
>                 URL: http://issues.apache.org/jira/browse/HADOOP-435
>             Project: Hadoop
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: Benjamin Reed
>         Attachments: hadoopit.patch, start.sh, stop.sh
>
>
> Currently, hadoop is a set of scripts, configurations, and jar files. It makes it a pain
to install on compute and datanodes. It also makes it a pain to set up clients so that they
can use hadoop. Every time things are updated the pain begins again.
> I suggest that we should be able to build a single Jar file that has a Main-Class defined
with the configuration built in so that we can distribute that one file to nodes and clients
on updates. One nice thing that I haven't done would be to make the jarfile downloadable from
the JobTracker webpage so that clients can easily submit the jobs.
> I currently use such a setup on my small cluster. To start the job tracker I use "java
-jar hadoop.jar -l /tmp/log jobtracker"; to submit a job I use "java -jar hadoop.jar jar wordcount.jar".
I have used the client on my Linux and Mac OS X machines, and all I need installed is Java and the
hadoop.jar file.
> hadoop.jar helps with logfiles and configurations. The default of pulling the config
files from the jar file can be overridden by specifying a config directory so that you can
easily have machine specific configs and still have the same hadoop.jar on all machines.
> Here are the available commands from hadoop.jar:
> USAGE: hadoop [-l logdir] command
>   User commands:
>     dfs          run a DFS admin client
>     jar          run a JAR file
>     job          manipulate MapReduce jobs
>     fsck         run a DFS filesystem check utility
>   Runtime startup commands:
>     datanode     run a DFS datanode
>     jobtracker   run the MapReduce job Tracker node
>     namenode     run the DFS namenode (namenode -format formats the FS)
>     tasktracker  run a MapReduce task Tracker node
>   HadoopLoader commands:
>     buildJar     builds the HadoopLoader jar file
>     conf         dump hadoop configuration
> Note, I don't have the classes for hadoop streaming built into this Jar file, but if
I had that would also be an option (it checks for needed classes before displaying an option).
It makes it very easy for users that just write scripts to use hadoop straight from their
machines.
> I'm also attaching the start.sh and stop.sh scripts that I use. These are the only scripts
I use to start up the daemons. They are very simple and the start.sh script uses the config
file to figure out whether or not to start the jobtracker and the namenode.
> The attached patch adds the HadoopIt patch, modifies the Configuration class to find
the config files correctly, and modifies the build to make a fully contained hadoop.jar. To
update the configuration in a hadoop.jar you simply use "zip hadoop.jar hadoop-site.xml".
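
The zip-based update quoted above can be wrapped in a small helper that also works when the config file lives in another directory. This is only a sketch under assumptions: the function name update_jar_config is hypothetical, the Info-ZIP zip tool is assumed to be installed, and the config is assumed to sit at the root of the jar, as the quoted command implies.

```shell
#!/bin/sh
# update_jar_config JAR SITE_XML
# Adds SITE_XML to JAR, or replaces the copy already inside it.
update_jar_config() {
    jar_file=$1
    site_xml=$2
    if [ ! -f "$jar_file" ]; then
        echo "no such jar: $jar_file" >&2
        return 1
    fi
    if [ ! -f "$site_xml" ]; then
        echo "no such config file: $site_xml" >&2
        return 1
    fi
    # -j drops the directory part so the entry is stored at the jar root;
    # -q keeps zip quiet.
    zip -q -j "$jar_file" "$site_xml"
}
```

For example, "update_jar_config hadoop.jar conf/hadoop-site.xml" would refresh the bundled config, and "unzip -p hadoop.jar hadoop-site.xml" prints the copy inside the jar so the result can be checked.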

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
