hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "ShellScriptProgrammingGuide" by SomeOtherAccount
Date Tue, 08 Jul 2014 18:15:13 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "ShellScriptProgrammingGuide" page has been changed by SomeOtherAccount:
https://wiki.apache.org/hadoop/ShellScriptProgrammingGuide?action=diff&rev1=2&rev2=3

  
   3. `HADOOP_NEW_CONFIG=true`.  This tells the rest of the system that the code being executed is aware that it is using the new shell API and will call the routines it needs on its own.  If this isn't set, several default actions that were done in Hadoop 2.x and earlier are executed instead, and several key parts of the new functionality are lost.
  
-  4. `$HADOOP_LIBEXEC_DIR/___-config.sh` is executed, where ___ is the subproject.  HDFS
scripts should call `hdfs-config.sh`. MAPRED scripts should call `mapred-config.sh` YARN scripts
should call `yarn-config.sh`.  Everything else should call `hadoop-config.sh`. This does a
lot of standard initialization, processes standard options, etc. This is also what provides
override capabilities for subproject specific environment variables. For example, the system
will normally ignore `yarn-env.sh`, but `yarn-config.sh` will activate those settings.
+  4. `$HADOOP_LIBEXEC_DIR/abc-config.sh` is executed, where `abc` is the subproject.  HDFS scripts should call `hdfs-config.sh`, MAPRED scripts should call `mapred-config.sh`, and YARN scripts should call `yarn-config.sh`.  Everything else should call `hadoop-config.sh`.  This does a lot of standard initialization, processes standard options, etc.  This is also what provides override capabilities for subproject-specific environment variables.  For example, the system will normally ignore `yarn-env.sh`, but `yarn-config.sh` will activate those settings.
  
   5. This is where the majority of your code goes.  Programs should process the rest of the arguments and do whatever their script is supposed to do.
  
   6. Before executing a Java program or giving user output, call `hadoop_finalize`.  This
finishes up the configuration details: adds the user class path, fixes up any missing Java
properties, configures library paths, etc.
  
   7. Either an `exit` or an `exec`.  This should return 0 for success and 1 or higher for failure.  (A minimal skeleton covering steps 3 through 7 is sketched just after this list.)
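
As a rough, hypothetical sketch only (the path handling, the `frobnicate` subcommand, the class name, and the final `exec` line are placeholders rather than text copied from any shipped script), a new script following steps 3 through 7 might look like this:

{{{
#!/usr/bin/env bash

# Step 3: tell the framework that this script understands the new shell API
HADOOP_NEW_CONFIG=true

# Step 4: standard initialization for the subproject (hadoop-config.sh here;
# hdfs/mapred/yarn scripts would source their own *-config.sh instead)
bin=$(cd -P -- "$(dirname -- "${BASH_SOURCE-$0}")" >/dev/null && pwd -P)
HADOOP_LIBEXEC_DIR="${HADOOP_LIBEXEC_DIR:-${bin}/../libexec}"
. "${HADOOP_LIBEXEC_DIR}/hadoop-config.sh"

# Step 5: the script's own work -- process the remaining arguments
subcmd=$1
shift
case ${subcmd} in
  frobnicate)
    CLASS=org.example.Frobnicator    # made-up class name
  ;;
  *)
    hadoop_error "ERROR: unknown subcommand ${subcmd}"
    exit 1
  ;;
esac

# Step 6: finish the configuration: user classpath, Java properties, library paths
hadoop_finalize

# Step 7: hand off to the JVM; exit 0 on success, 1 or higher on failure
exec "${JAVA_HOME}/bin/java" ${HADOOP_OPTS} "${CLASS}" "$@"
}}}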
- 
- = Things to Avoid =
- 
-  * Adding more globals or project specific globals and/or entries in *-env.sh.  In a lot
of cases, there is pre-existing functionality that already does what you might need to do.
 Additionally, every configuration option makes it that much harder for end users.
- 
-  * Mutli-level `if`'s where the comparisons are static strings.  Use case statements.
- 
-  * BSDisms, GNUisms, or SysVisms. Double check your esoteric command and parameters on multiple
operating systems.  Usually a quick Google search will pull up man pages
- 
- = Other Best Practices =
- 
-  * If you need a new global variable for additional functionality, start it with HADOOP_
for common, HDFS_ for HDFS, YARN_ for YARN, and MAPRED_ for MapReduce.  It should be documented
in either *-env.sh or hadoop-functions.sh if it isn't meant to be touched by users. This helps
prevents our variables from clobbering other people.
- 
-  * If you need to add a new function, it should start with hadoop_ and declare any local
variables with the `local` tag.  This allows for 3rd parties to use our function libraries
without worry about conflicts.
  
  = Adding a Subcommand to an Existing Script =
  
@@ -44, +30 @@

  
   2. Add an additional entry in the case conditional. Depending upon what is being added,
several things may need to be done:
    a. Set `CLASS` to the Java class that implements the subcommand.
+ 
    b. Add $HADOOP_CLIENT_OPTS to $HADOOP_OPTS (or, for YARN apps, $YARN_CLIENT_OPTS to $YARN_OPTS) if this is an interactive application or should, for some other reason, have the user's client settings applied.
+ 
    c. For subcommands that can also run as daemons, set `daemon=true`.  This allows the `--daemon` option to work.
+ 
    d. For HDFS daemons that support security, set `secure_service=true` and set `secure_user` to the user that should run the daemon.  (A combined example entry is sketched below.)
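
As a purely illustrative sketch (the `frobnicate`/`frobnicator` subcommands, class names, and user name below are made up), a new entry in that case conditional might tie points (a) through (d) together like this:

{{{
  frobnicate)
    # (a) the Java class that implements the subcommand (made-up name)
    CLASS=org.example.FrobnicateClient
    # (b) interactive client, so fold the user's client settings into HADOOP_OPTS
    HADOOP_OPTS="${HADOOP_OPTS} ${HADOOP_CLIENT_OPTS}"
  ;;
  frobnicator)
    # a hypothetical daemon variant of the same feature
    CLASS=org.example.FrobnicateDaemon
    # (c) can run as a daemon, so enable --daemon handling
    daemon="true"
    # (d) HDFS-style secure daemon: runs as a dedicated user when security is on
    secure_service="true"
    secure_user="frob"
  ;;
}}}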
  
+ = Better Practices =
+ 
+  * Avoid adding more globals, project-specific globals, or entries in *-env.sh.  In a lot of cases, there is pre-existing functionality that already does what you need.  Additionally, every configuration option makes things that much harder for end users.  If you do need to add a new global variable for additional functionality, start it with HADOOP_ for common, HDFS_ for HDFS, YARN_ for YARN, and MAPRED_ for MapReduce.  It should be documented in either *-env.sh (for user-overridable parts) or hadoop-functions.sh (for internal-only globals). This helps prevent our variables from clobbering other people's.
+ 
+  * Remember that `abc_xyz_OPTS` can and should act as a catch-all for Java daemon options.  Custom heap environment variables add unnecessary complexity for both the user and us.  (A short example appears after this list.)
+ 
+  * Avoid multi-level `if`s where the comparisons are static strings.  Use case statements instead, as they are easier to read (see the case/if sketch after this list).
+ 
+  * Avoid BSDisms, GNUisms, or SysVisms.  Double-check esoteric commands and their parameters on multiple operating systems.  (Usually a quick Google search will pull up man pages for other OSes.)
+ 
+  * Avoid output to the screen, especially from daemons.  No one wants to see a multitude of messages during startup.  Errors should go to STDERR instead of STDOUT; use the `hadoop_error` function to make that clear in the code (the function sketch after this list shows it in use).
+ 
+  * If you need to add a new function, it should start with hadoop_ and declare any local variables with the `local` keyword.  This allows third parties to use our function libraries without worrying about conflicts (see the function sketch after this list).
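
As a concrete but hypothetical example of the `abc_xyz_OPTS` catch-all, a user who wants a bigger NameNode heap would simply extend the existing options variable in hadoop-env.sh rather than getting a new heap-specific knob (the variable name and -Xmx value are examples only):

{{{
# in hadoop-env.sh: reuse the existing catch-all _OPTS variable
export HADOOP_NAMENODE_OPTS="-Xmx4g ${HADOOP_NAMENODE_OPTS}"
}}}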
+ 
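
To illustrate the point about case statements, here is a purely hypothetical subcommand dispatch written both ways; the case form is the preferred style:

{{{
# harder to read: nested ifs comparing static strings
if [[ "${subcmd}" = "start" ]]; then
  daemon_action="start"
else
  if [[ "${subcmd}" = "stop" ]]; then
    daemon_action="stop"
  else
    daemon_action="status"
  fi
fi

# easier to read: a single case statement
case ${subcmd} in
  start) daemon_action="start" ;;
  stop)  daemon_action="stop" ;;
  *)     daemon_action="status" ;;
esac
}}}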
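
And as an illustration of the function conventions (the function name and its behavior are invented for the example), a library-style helper keeps its variables local and reports errors through `hadoop_error`:

{{{
# hypothetical library function: hadoop_ prefix, locals declared with 'local'
function hadoop_verify_dir
{
  # 'local' keeps these names from leaking into the caller's environment
  local dir=$1

  if [[ ! -d "${dir}" ]]; then
    # errors go to STDERR via hadoop_error, never to STDOUT
    hadoop_error "ERROR: ${dir} does not exist."
    return 1
  fi
  return 0
}
}}}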
