hadoop-common-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-9902) Shell script rewrite
Date Sat, 26 Jul 2014 03:16:39 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated HADOOP-9902:
-------------------------------------

    Release Note: 
The Hadoop shell scripts have been rewritten to fix many long-standing bugs and to include some
new features.  While an eye has been kept towards compatibility, some changes may break existing
installations.

INCOMPATIBLE CHANGES:

* The pid and out files for secure daemons have been renamed to include the appropriate ${HADOOP_IDENT_STR}.
With proper configuration in place, this should allow multiple versions of the same secure
daemon to run on a host.
* All Hadoop shell script subsystems now execute hadoop-env.sh, which allows for all of the
environment variables to be in one location.  This was not the case previously.
* The default content of *-env.sh has been significantly altered, with the majority of defaults
moved into more protected areas inside the code. Additionally, these files no longer auto-append:
a variable set on the command line before calling a shell command must contain its entire value,
not just the extra settings.  This brings Hadoop more in line with the vast majority of other
software packages.
* All YARN_* and MAPRED_* environment variables act as overrides to their equivalent HADOOP_*
environment variables when 'yarn', 'mapred' and related commands are executed. Previously,
these were kept separate, which meant a significant amount of duplication of common settings.
* hdfs-config.sh and hdfs-config.cmd were inadvertently duplicated into libexec and sbin.
The sbin versions have been removed.
* The log4j settings forcibly set by some *-daemon.sh commands have been removed.  These settings
are now configurable in the *-env.sh files, in particular via *_OPTS.
* Some formerly documented entries in yarn-env.sh are no longer documented, as a simple form
of deprecation, in order to greatly simplify configuration and reduce unnecessary duplication.
They will still work, but those variables will likely be removed in a future release.
* Support for various undocumented YARN log4j.properties files has been removed.
* Support for ${HADOOP_MASTER} and the related rsync code have been removed.
* The undocumented yarn.id.str has been removed.
* We now require bash v3 (released July 27, 2004) or better in order to take advantage of
better regex handling and ${BASH_SOURCE}.  POSIX sh will not work.
* Support for --script has been removed. We now use ${HADOOP_*_PATH} or ${HADOOP_PREFIX} to
find the necessary binaries.  (See other note regarding ${HADOOP_PREFIX} auto discovery.)
* Non-existent classpath entries, ld.so library paths, JNI library paths, etc., will be ignored and
stripped from their respective environment settings.
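The YARN_*/MAPRED_* override rule and the stripping of non-existent paths described above can be sketched in bash as follows. This is a minimal, hypothetical illustration, not the code shipped in hadoop-functions.sh: the function names example_yarn_override and example_add_libpath are invented for this sketch, and it assumes bash 3.1 or later for printf -v.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only; these are NOT functions from hadoop-functions.sh.
# Assumes bash >= 3.1 (printf -v) and uses ${!name} indirect expansion.

# example_yarn_override SUFFIX
# If YARN_<SUFFIX> is set and non-empty, copy its value over HADOOP_<SUFFIX>,
# mimicking "YARN_* acts as an override to the equivalent HADOOP_*".
example_yarn_override()
{
  local yvar="YARN_$1"
  local hvar="HADOOP_$1"
  if [[ -n "${!yvar}" ]]; then
    printf -v "${hvar}" '%s' "${!yvar}"
  fi
}

# example_add_libpath VARNAME DIR
# Append DIR to the colon-separated list stored in VARNAME, silently
# skipping directories that do not exist so they never reach the setting.
example_add_libpath()
{
  local var=$1
  local dir=$2
  [[ -d "${dir}" ]] || return 0
  if [[ -z "${!var}" ]]; then
    printf -v "${var}" '%s' "${dir}"
  else
    printf -v "${var}" '%s' "${!var}:${dir}"
  fi
}
```

For example, with HADOOP_LOG_DIR=/h and YARN_LOG_DIR=/y, calling example_yarn_override LOG_DIR leaves HADOOP_LOG_DIR set to /y, while an unset YARN_* variable leaves the HADOOP_* value alone. The real scripts may apply these rules differently.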

BUG FIXES:

* ${HADOOP_CONF_DIR} is now properly honored everywhere.
* ${HADOOP_CONF_DIR}/hadoop-layout.sh is now documented with a provided hadoop-layout.sh.example
file.
* Shell commands should now work properly when called as a relative path, without ${HADOOP_PREFIX}
being defined, and as the target of bash -x for debugging. If ${HADOOP_PREFIX} is not set,
it will be automatically determined based upon the current location of the shell library.
Note that other parts of the ecosystem may require this environment variable to be configured.
* Operations which trigger ssh will now limit the number of connections to run in parallel
to ${HADOOP_SSH_PARALLEL} to prevent memory and network exhaustion.  By default, this is set
to 10.
* ${HADOOP_CLIENT_OPTS} support has been added to a few more commands.
* Some subcommands were not listed in the usage output.
* Various options on hadoop command lines were supported inconsistently.  These have been
unified into hadoop-config.sh. --config still needs to come first, however.
* ulimit logging for secure daemons no longer assumes /bin/bash, but it does assume that bash
is on the command path.
* Removed references to some Yahoo! specific paths.
* Removed unused slaves.sh from YARN build tree.
* Many exit statuses have been changed to reflect what actually happened.
* Shell-level errors now go to STDERR.  Previously, many of them incorrectly went to STDOUT.
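Two of the fixes above, deriving ${HADOOP_PREFIX} from the library's own location and capping parallel ssh work at ${HADOOP_SSH_PARALLEL}, can be sketched in bash. This is a hypothetical illustration, not the shipped code: example_find_prefix and example_run_bounded are invented names, and the sketch assumes the shell library lives directly under ${HADOOP_PREFIX}/libexec.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only; not the actual Hadoop shell code.

# example_find_prefix
# If HADOOP_PREFIX is unset or empty, derive it from this file's location,
# ASSUMING the library sits in ${HADOOP_PREFIX}/libexec.
example_find_prefix()
{
  local this="${BASH_SOURCE[0]:-$0}"
  local libdir
  if [[ -z "${HADOOP_PREFIX}" ]]; then
    libdir=$(cd -P -- "$(dirname -- "${this}")" >/dev/null && pwd -P) || return 1
    HADOOP_PREFIX=$(cd -P -- "${libdir}/.." >/dev/null && pwd -P) || return 1
  fi
}

# example_run_bounded CMD HOST...
# Run "CMD HOST" in the background for every HOST, never more than
# ${HADOOP_SSH_PARALLEL:-10} at a time. Works in batches because bash 3
# has no "wait -n"; a batch must finish before the next one starts.
example_run_bounded()
{
  local cmd=$1
  shift
  local max=${HADOOP_SSH_PARALLEL:-10}
  local running=0
  local host
  for host in "$@"; do
    "${cmd}" "${host}" &
    running=$((running + 1))
    if (( running >= max )); then
      wait            # note: waits for ALL background children of this shell
      running=0
    fi
  done
  wait
}

# Illustrative use with a hypothetical ssh wrapper:
#   example_uptime() { ssh "$1" uptime; }
#   HADOOP_SSH_PARALLEL=10 example_run_bounded example_uptime host1 host2 host3
```

Batching is the simplest bounded scheme that still runs on bash 3; a sliding window that starts a new connection as soon as any one finishes would need wait -n, which only arrived in bash 4.3.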

IMPROVEMENTS:

* The scripts now share the same style and layout.  This includes a vast increase in code
comments.
* Significant amounts of redundant code have been moved into a new file called hadoop-functions.sh.
* The various *-env.sh files have been substantially rewritten to include documentation and
examples on what can be set, the ramifications of setting it, etc., for all variables that a
user is expected to set.
* There is now some trivial deduplication and sanitization of the classpath and JVM options.
This allows, amongst other things, custom settings in *_OPTS for Hadoop daemons to override
defaults and other generic settings (e.g., ${HADOOP_OPTS}).  This is particularly relevant
for Xmx settings, as one can now set them in *_OPTS and ignore the daemon heap options that
force the size in megabytes.
* Operations which trigger ssh connections can now use pdsh if installed.  ${HADOOP_SSH_OPTS}
is still applied.
* Subcommands have been alphabetized in both usage and in the code.
* All or most of the functionality provided by the sbin/* commands has either been moved into
their bin/ equivalents or turned into functions.  The rewritten sbin/* commands are now
wrappers that maintain backward compatibility.
* Daemonization has been moved from *-daemon.sh to the bin commands via the --daemon option.
Simply use --daemon start to start a daemon, --daemon stop to stop a daemon, and --daemon
status to set $? to the daemon's status.  
* It is now possible to override some of the shell code capabilities to provide site specific
functionality without replacing the shipped versions. 
* A new option called --buildpaths will attempt to add developer build directories to the
classpath to allow for in-source-tree testing.
* If a usage function is defined, any of the following options given to the shell script will
trigger a help message: --? -? ? --help -help -h help
* Several generic environment variables have been added to provide a common configuration
for pids, logs, and their security equivalents.  The older versions still act as overrides
to these generic versions.
* Groundwork has been laid to allow for custom secure daemon setup using something other than
jsvc.
* Added distch and jnipath subcommands to the hadoop command.
* By default, ${HADOOP_CONF_DIR} is now at the end of the CLASSPATH.
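The help triggers and the --daemon start|stop|status option described above can be illustrated with a small bash argument scan. This is a hypothetical sketch, not Hadoop's actual option parser: example_is_help, example_parse_daemon_opt and the EXAMPLE_DAEMON_MODE variable are invented for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only; names are invented for illustration.

# example_is_help TOKEN
# Succeed if TOKEN is one of the listed help triggers:
#   --? -? ? --help -help -h help
# The "?" is escaped because in a case pattern it would match any character.
example_is_help()
{
  case "$1" in
    --\?|-\?|\?|--help|-help|-h|help) return 0 ;;
    *) return 1 ;;
  esac
}

# example_parse_daemon_opt ARGS...
# Scan ARGS for "--daemon MODE", accept only start, stop, or status,
# and store MODE in EXAMPLE_DAEMON_MODE (empty when --daemon is absent).
# Errors go to STDERR and return a non-zero status.
example_parse_daemon_opt()
{
  EXAMPLE_DAEMON_MODE=""
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --daemon)
        case "$2" in
          start|stop|status) EXAMPLE_DAEMON_MODE=$2 ;;
          *) echo "ERROR: --daemon requires start, stop, or status" >&2; return 1 ;;
        esac
        shift 2
        ;;
      *)
        shift
        ;;
    esac
  done
}
```

Escaping the question marks is the main pitfall here: an unescaped ? pattern would treat any one-character argument as a help request.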

  was:
The Hadoop shell scripts have been rewritten to fix many long standing bugs and include some
new features.  While an eye has been kept towards compatibility, some changes may break existing
installations.

INCOMPATIBLE CHANGES:

* The pid, out, etc. files for secure daemons have been renamed to include the appropriate
${HADOOP_IDENT_STR}.  This should allow, with proper configurations in place, for multiple
versions of the same secure daemon to run on a host.
* All Hadoop shell script subsystems execute hadoop-env.sh, which allows for all of the environment
variables to be in one location.  This was not the case previously.
* The default content of *-env.sh has been significantly altered, with the majority of defaults
moved into more protected areas.
* All YARN_* and MAPRED_* environment variables act as overrides to their equivalent HADOOP_*
environment variables when 'yarn', 'mapred' and related commands are executed. Previously,
these were separated out which meant a significant amount of duplication of common settings.
* hdfs-config.sh and hdfs-config.cmd were inadvertently duplicated into libexec and sbin.
The sbin versions have been removed.
* The log4j settings forcibly set by some *-daemon.sh commands have been removed.  These settings
are now configurable in the *-env.sh files, in particular via *_OPTS.
* Support for various undocumented YARN log4j.properties files has been removed.
* Support for $HADOOP_MASTER and the related rsync code have been removed.
* yarn.id.str has been removed.
* We now require bash v3 (released July 27, 2004) or better in order to take advantage of
better regex handling and ${BASH_SOURCE}.  POSIX sh will not work.
* Support for --script has been removed. We now use ${HADOOP_*_PATH} or ${HADOOP_PREFIX} to
find the necessary binaries.  (See other note regarding ${HADOOP_PREFIX} auto discovery.)
* Non-existent classpaths, ld.so library paths, JNI library paths, etc, will be ignored and
stripped from their respective environment settings.

BUG FIXES:

* ${HADOOP_CONF_DIR} is now properly honored everywhere.
* Documented hadoop-layout.sh with a provided hadoop-layout.sh.example file.
* Shell commands should now work properly when called as a relative path and without HADOOP_PREFIX
being defined. If ${HADOOP_PREFIX} is not set, it will be automatically determined based upon
the current location of the shell library.  Note that other parts of the ecosystem may require
this environment variable to be configured.
* Operations which trigger ssh will now limit the number of connections to run in parallel
to ${HADOOP_SSH_PARALLEL} to prevent memory and network exhaustion.  By default, this is set
to 10.
* ${HADOOP_CLIENT_OPTS} support has been added to a few more commands.
* Various options on hadoop command lines were supported inconsistently.  These have been
unified into hadoop-config.sh. --config still needs to come first, however.
* ulimit logging for secure daemons no longer assumes /bin/bash but does assume bash on the
command line path.
* Removed references to some Yahoo! specific paths.
* Removed unused slaves.sh from YARN build tree.

IMPROVEMENTS:

* Significant amounts of redundant code have been moved into a new file called hadoop-functions.sh.
* Improved information in the default *-env.sh on what can be set, ramifications of setting,
etc.
* There is an attempt to do some trivial deduplication and sanitization of the classpath and
JVM options.  This allows, amongst other things, for custom settings in *_OPTS for Hadoop
daemons to override defaults and other generic settings (e.g., $HADOOP_OPTS).  This is particularly
relevant for Xmx settings, as one can now set them in _OPTS and ignore the heap specific options
for daemons which force the size in megabytes.
* Operations which trigger ssh connections can now use pdsh if installed.  $HADOOP_SSH_OPTS
still gets applied. 
* Subcommands have been alphabetized in both usage and in the code.
* All/most of the functionality provided by the sbin/* commands has been moved to either their
bin/ equivalents or made into functions.  The rewritten versions of these commands are now
wrappers to maintain backward compatibility. Of particular note is the new --daemon option
present in some bin/ commands, which allows certain subcommands to be daemonized.
* It is now possible to override some of the shell code capabilities to provide site specific
functionality. 
* A new option called --buildpaths will attempt to add developer build directories to the
classpath to allow for in source tree testing.
* If a usage function is defined, any of the following options given to the shell script will
trigger a help message: --? -? ? --help -help -h help
* Several generic environment variables have been added to provide a common configuration
for pids, logs, and their security equivalents.  The older versions still act as overrides
to these generic versions.
* Groundwork has been laid to allow for custom secure daemon setup using something other than
jsvc.
* Added distch and jnipath subcommands to hadoop command.


> Shell script rewrite
> --------------------
>
>                 Key: HADOOP-9902
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9902
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 3.0.0
>            Reporter: Allen Wittenauer
>            Assignee: Allen Wittenauer
>              Labels: releasenotes
>         Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902-4.patch, HADOOP-9902-5.patch,
>                      HADOOP-9902-6.patch, HADOOP-9902-7.patch, HADOOP-9902-8.patch, HADOOP-9902.patch, HADOOP-9902.txt,
>                      hadoop-9902-1.patch, more-info.txt
>
>
> Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
