hadoop-common-dev mailing list archives

From Konstantin Boudnik <...@apache.org>
Subject Re: [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
Date Wed, 21 Nov 2012 20:00:06 GMT
I like Alejandro's idea about Maven for a few reasons:
  - bringing in a scripting environment which is known for its inter-version
    idiosyncrasies just because Windows can't handle trivial shell scripting
    looks like overkill to me
  - related to the above, there's a chance that the Python prerequisites used
    in Hadoop might conflict with some other components in the stack.
    This would be a nightmare for integrator projects, e.g. Bigtop
  - Maven is the de facto standard for Java stacks
  - Maven has a scripting language available (Groovy, via plugin) if existing
    plugins aren't sufficient for achieving whatever goals

Addressing Matt's later point about the non-Mavenized Hadoop-1 line: it uses
Maven stuff such as deploy/install via custom ant tasks. The same approach
would work for saveVersion.sh and others, I am sure.

Cos

On Wed, Nov 21, 2012 at 11:25AM, Alejandro Abdelnur wrote:
> Hey Matt,
> 
> We already require java/mvn/protoc/cmake/forrest (forrest is hopefully on
> its way out with the move of docs to APT)
> 
> Why not do a maven-plugin to do that?
> 
> Colin already has something to simplify all the cmake calls from the builds
> using a maven-plugin (https://issues.apache.org/jira/browse/HADOOP-8887)
> 
> We could do the same with protoc, thus simplifying the POMs.
> 
> The saveVersion.sh seems like another prime candidate for a maven plugin,
> and in this case it would not require external tools.
> 
> Does this make sense?
> 
> Thx
> 
> On Wed, Nov 21, 2012 at 11:15 AM, Matt Foley <mattf@apache.org> wrote:
> 
> > This discussion started in HADOOP-8924
> > <https://issues.apache.org/jira/browse/HADOOP-8924>, where it was proposed
> > to replace the build-time utility "saveVersion.sh" with a python script.
> > This would require Python as a build-time dependency.  Here's the
> > background:
> >
> > Those of us involved in the branch-1-win port of Hadoop to Windows without
> > use of Cygwin have faced the issue of frequent use of shell scripts
> > throughout the system, both at build time (e.g., the utility
> > "saveVersion.sh") and at run time (config files like "hadoop-env.sh" and
> > the start/stop scripts in "bin/*").  Similar usages exist throughout the
> > Hadoop stack, in all projects.
> >
> > The vast majority of these shell scripts do not do anything platform
> > specific; they can be expressed in a posix-conforming way.  Therefore, it
> > seems to us that it makes sense to start using a cross-platform scripting
> > language, such as python, in place of shell for these purposes.  For those
> > rare occasions where platform-specific functionality really is needed,
> > python also supports quite a lot of platform-specific functionality on both
> > Linux and Windows; but where that is inadequate, one could still
> > conditionally invoke a platform-specific module written in shell (for
> > Linux/*nix) or powershell or bat (for Windows).
> >
> > The primary motive for moving to a cross-platform scripting language is
> > maintainability.  The alternative would be to maintain two complete suites
> > of scripts, one for Linux and one for Windows (and perhaps others in the
> > future).  We want to avoid the need to update dual modules in two different
> > languages when functionality changes, especially given that many Linux
> > developers are not familiar with powershell or bat, and many Windows
> > developers are not familiar with shell or bash.
> >
> > Regarding the choice of python:
> >
> >    - There are already a few instances of python usage in Hadoop, such as
> >    the utility (currently broken) "relnotes.py", and massive usage of
> >    python in the examples/ and contrib/ directories.
> >    - Python is also used in Bigtop build-time.
> >    - The Python language is available for free on essentially all
> >    platforms, under an Apache-compatible license
> >    <http://www.apache.org/legal/resolved.html>.
> >    - It is supported in Eclipse and similar IDEs.
> >    - Most importantly, it is widely accepted as a reasonably good OO
> >    scripting language, and it is easily learned by anyone who already knows
> >    shell or perl, or other common scripting languages.
> >    - On the Tiobe index of programming language popularity
> >    <http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html>,
> >    which seeks to measure the relative number of software engineers who
> >    know and use each language, Python far exceeds Perl and Ruby.  The only
> >    more well-known scripting languages are PHP and Visual Basic, neither
> >    of which seems a prime candidate for this use.
> >
> > For build-time usage, I think we should immediately approve python as a
> > build-time dependency, and allow people who are motivated to do so, to open
> > jiras for migrating existing build-time shell scripts to python.
> >
> > For run-time, there is likely to be a lot more discussion.  Lots of folks,
> > including me, aren't real happy with use of active scripts for
> > configuration, and various others, including I believe some of the Bigtop
> > folks, have issues with the way the start/stop scripts work.  Nevertheless,
> > all those scripts exist today and are widely used.  And they present an
> > impediment to porting to Windows-without-cygwin.
> >
> > Nothing about run-time use of scripts has changed significantly over the
> > past three years, and I don't think we should hold up the Windows port
> > while we have a huge discussion about issues that veer dangerously into
> > religious/aesthetic domains. It would be fun to have that discussion, but I
> > don't want this decision to be dependent on it!
> >
> > So I propose that we go ahead and also approve python as a run-time
> > dependency, and allow the inclusion of python scripts in place of current
> > shell-based functionality.  The unpleasant alternative is to spawn a bunch
> > of powershell scripts in parallel to the current shell scripts, with a very
> > negative impact on maintainability.  The Windows port must, after all, be
> > allowed to proceed.
> >
> > Let's have a discussion, and then I'll put both issues, separately, to a
> > vote (unless we miraculously achieve consensus without a vote :-)
> >
> > I also encourage members of the other Hadoop-related projects, to carry
> > this discussion into those forums.  It would be very cool to agree on a
> > whole-stack solution for the scripting problem.
> >
> > Best regards,
> > --Matt
> >
> 
> 
> 
> -- 
> Alejandro
