hadoop-common-dev mailing list archives

From Alejandro Abdelnur <t...@cloudera.com>
Subject Re: [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
Date Wed, 21 Nov 2012 19:25:04 GMT
Hey Matt,

We already require java/mvn/protoc/cmake/forrest (forrest is hopefully on
its way out with the move of docs to APT).

Why not do a maven-plugin to do that?

Colin already has something to simplify all the cmake calls from the builds
using a maven-plugin (https://issues.apache.org/jira/browse/HADOOP-8887).

We could do the same with protoc, thus simplifying the POMs.

The saveVersion.sh seems like another prime candidate for a maven plugin,
and in this case it would not require external tools.

Does this make sense?


On Wed, Nov 21, 2012 at 11:15 AM, Matt Foley <mattf@apache.org> wrote:

> This discussion started in HADOOP-8924
> <https://issues.apache.org/jira/browse/HADOOP-8924>, where it was proposed
> to replace the build-time utility "saveVersion.sh" with a python script.
> This would require Python as a build-time dependency.  Here's the background:
> Those of us involved in the branch-1-win port of Hadoop to Windows without
> use of Cygwin have faced the issue of frequent use of shell scripts
> throughout the system, both at build time (e.g., the utility
> "saveVersion.sh") and at run time (config files like "hadoop-env.sh" and
> the start/stop scripts in "bin/*").  Similar usages exist throughout the
> Hadoop stack, in all projects.
> The vast majority of these shell scripts do not do anything platform
> specific; they can be expressed in a POSIX-conforming way.  Therefore, it
> seems to us that it makes sense to start using a cross-platform scripting
> language, such as Python, in place of shell for these purposes.  For those
> rare occasions where platform-specific functionality really is needed,
> Python also supports quite a lot of platform-specific functionality on both
> Linux and Windows; but where that is inadequate, one could still
> conditionally invoke a platform-specific module written in shell (for
> Linux/*nix) or PowerShell or bat (for Windows).
> The primary motive for moving to a cross-platform scripting language is
> maintainability.  The alternative would be to maintain two complete suites
> of scripts, one for Linux and one for Windows (and perhaps others in the
> future).  We want to avoid the need to update dual modules in two different
> languages when functionality changes, especially given that many Linux
> developers are not familiar with powershell or bat, and many Windows
> developers are not familiar with shell or bash.
> Regarding the choice of python:
>    - There are already a few instances of python usage in Hadoop, such as
>    the utility (currently broken) "relnotes.py", and massive usage of python
>    in the examples/ and contrib/ directories.
>    - Python is also used in Bigtop build-time.
>    - The Python language is available for free on essentially all
>    platforms, under an Apache-compatible license
>    <http://www.apache.org/legal/resolved.html>.
>    - It is supported in Eclipse and similar IDEs.
>    - Most importantly, it is widely accepted as a reasonably good OO
>    scripting language, and it is easily learned by anyone who already knows
>    shell or perl, or other common scripting languages.
>    - On the Tiobe index of programming language popularity
>    <http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html>,
>    which seeks to measure the relative number of software engineers who know
>    and use each language, Python far exceeds Perl and Ruby.  The only more
>    well-known scripting languages are PHP and Visual Basic, neither of which
>    seems a prime candidate for this use.
> For build-time usage, I think we should immediately approve python as a
> build-time dependency, and allow people who are motivated to do so to open
> jiras for migrating existing build-time shell scripts to python.
> For run-time, there is likely to be a lot more discussion.  Lots of folks,
> including me, aren't real happy with use of active scripts for
> configuration, and various others, including I believe some of the Bigtop
> folks, have issues with the way the start/stop scripts work.  Nevertheless,
> all those scripts exist today and are widely used.  And they present an
> impediment to porting to Windows-without-cygwin.
> Nothing about run-time use of scripts has changed significantly over the
> past three years, and I don't think we should hold up the Windows port
> while we have a huge discussion about issues that veer dangerously into
> religious/aesthetic domains. It would be fun to have that discussion, but I
> don't want this decision to be dependent on it!
> So I propose that we go ahead and also approve python as a run-time
> dependency, and allow the inclusion of python scripts in place of current
> shell-based functionality.  The unpleasant alternative is to spawn a bunch
> of powershell scripts in parallel to the current shell scripts, with a very
> negative impact on maintainability.  The Windows port must, after all, be
> allowed to proceed.
> Let's have a discussion, and then I'll put both issues, separately, to a
> vote (unless we miraculously achieve consensus without a vote :-)
> I also encourage members of the other Hadoop-related projects to carry
> this discussion into those forums.  It would be very cool to agree on a
> whole-stack solution for the scripting problem.
> Best regards,
> --Matt
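
To illustrate the fallback Matt describes above (a cross-platform Python
script that conditionally invokes a platform-specific module only where
needed), here is a minimal sketch. The helper name "native-setup" and the
functions platform_command and run_helper are hypothetical and not part of
Hadoop; the sketch also assumes "sh" or "powershell" is on the PATH.

```python
import platform
import subprocess


def platform_command(name, system=None):
    """Build the argv for a platform-specific helper script.

    `name` is a hypothetical helper base name; `system` defaults to the
    value reported by platform.system() ("Windows", "Linux", "Darwin", ...).
    """
    system = system or platform.system()
    if system == "Windows":
        # Assumption: the Windows variant is a PowerShell script.
        return ["powershell", "-File", name + ".ps1"]
    # Assumption: every other platform has a POSIX shell variant.
    return ["sh", name + ".sh"]


def run_helper(name):
    """Run the helper for the current platform and return its exit code."""
    return subprocess.call(platform_command(name))
```

Only the dispatch step differs per platform; everything else in the calling
script stays shared, which is the maintainability argument made above.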

