hadoop-common-dev mailing list archives

From Bikas Saha <bi...@hortonworks.com>
Subject Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
Date Fri, 30 Nov 2012 04:27:22 GMT
+1, +1, +1 (non-binding)

We have had promising results with items 1 and 2 when porting to Windows. Item 3
would allow us to remove platform dependencies from test code. I agree that some
nuanced operations might require OS-specific environments, but this would help
keep them to a minimum.

Bikas

On 11/29/12 7:22 PM, "Chuan Liu" <chuanliu@microsoft.com> wrote:

>+1 +1 +1
>
>Agree with Matt on the code maintainability.
>
>I think on one side we have shell, which is a scripting language and OS
>dependent, e.g. bash vs. PowerShell;
>on the other side we have Java, which is not a scripting language and is OS
>independent.
>I would accept any scripting language that can fill the gap as an OS
>independent scripting language.
>Personally, I also prefer Python over Ruby.
>
>Thanks,
>Chuan
>
>________________________________________
>From: mfoley@hortonworks.com on behalf of Matt Foley
>Sent: Thursday, November 29, 2012 6:26 PM
>To: common-dev@hadoop.apache.org
>Subject: Re: [VOTE] introduce Python as build-time and run-time
>dependency for Hadoop and throughout Hadoop stack
>
>Hello again.  Crossed in the mail.
>
>* What kind of tasks do you envision Python scripts will enable that are
>> not possible today?
>
>
>The point isn't to open brave new worlds.  The point is to avoid the
>nightmare of having to maintain multiple "parallel" scripts doing the SAME
>THING in multiple scripting languages.  I know from experience that they
>never get maintained right.  It's just a huge source of bugs, because when
>they are in different languages, it can be quite difficult to determine
>that they are *really* doing the same thing.  And in a case like shell vs
>powershell, it will be very common to have contributors who are not experts
>in both.
>
>I care deeply about having a high-quality release in both Linux and
>Windows.  And having a cross-platform scripting language will make it much
>easier to maintain that quality over time, without "slip" between the two
>platforms.
>
>* Will the requirement of Python be pushed to clients using the
>> hadoop script? If so, this would affect all downstream projects that use
>> hadoop script in one way or the other, right?
>
>
>If question #3 passes, then Python will become a run-time dependency for
>Hadoop.  That means it would need to be installed as part of the Hadoop
>install preparation, just like all the other Hadoop run-time dependencies.
>
>Is the main motivation of the proposal to make things easier for Windows,
>> so there is no need for cygwin? If that is the case, have you considered
>> writing BAT scripts directly? If you take Tomcat for example, they have BAT
>> scripts and SH scripts and things work quite nicely.
>
>
>Of course it is sufficient, from the simple implementation perspective, to
>translate all the shell scripts into bat or (better) powershell scripts.
> That is, in fact, the most evident alternative to my proposals #1 and #3.
>
>However, I ask -- beg! -- the community to consider it from the software
>engineering perspective.  We aren't here to just implement something once
>and be done.  It has to be maintained, as most of you on this list are well
>aware, for years and years, across multiple generations.  And trying to
>maintain parallel scripts in multiple languages, when not necessitated by
>genuine platform-specific requirements, is just creating bug generators in
>the system.
>
>Personally, I wouldn't be thrilled to see the logic in the scripts
>> get more complex, but rather the opposite; IMO, scripts should be
>> trimmed to set env vars (with no voodoo logic), build the classpath (with
>> no voodoo logic, just from a set of dirs) and call Java.
>
>
>See the first item above.  The point is to enable cross-platform scripting
>of the things we already have to script.  IMO, scripts should get out of
>the env var business entirely, but that's unrelated to this question :-)
>
>Finally, this is a code change, so I'm not sure why we are doing a vote.
>
>
>I view this as a tools issue, that affects questions that go beyond the
>one-time choice of how to write (or re-write) saveVersion.sh.  Also Aaron
>(atm) recommended that I bring it to the list.  So here we are :-)
>
>Cheers,
>--Matt
>
>On Thu, Nov 29, 2012 at 5:25 PM, Alejandro Abdelnur
><tucu@cloudera.com> wrote:
>
>> Matt,
>>
>> Let me repost my previous questions and a few more. I'd appreciate your
>> answers, as it will help me understand the full impact this would have in
>> Hadoop and related projects.
>>
>> * Python as runtime requirement. Are you planning to migrate all BASH
>> scripts provided by Hadoop (or dynamically created, i.e. launcher scripts)
>> to Python?
>> * What else in the current build, besides saveVersion.sh, do you see as a
>> candidate to be migrated to Python?
>> * How are you planning to define which Python modules can be used? Will
>> developers have to install them manually?
>> * What kind of tasks do you envision Python scripts will enable that are not
>> possible today?
>> * Will the requirement of Python be pushed to clients using the hadoop
>> script? If so, this would affect all downstream projects that use the hadoop
>> script in one way or the other, right?
>>
>> Is the main motivation of the proposal to make things easier for Windows, so
>> there is no need for cygwin? If that is the case, have you considered writing
>> BAT scripts directly? If you take Tomcat for example, they have BAT scripts
>> and SH scripts and things work quite nicely.
>>
>> Personally, I wouldn't be thrilled to see the logic in the scripts get
>> more complex, but rather the opposite; IMO, scripts should be trimmed
>> to set env vars (with no voodoo logic), build the classpath (with no voodoo
>> logic, just from a set of dirs) and call Java.
>>
>> Finally, this is a code change, so I'm not sure why we are doing a vote.
>>
>> Thx.
>>
>> On Thu, Nov 29, 2012 at 3:26 PM, Alejandro Abdelnur <tucu@cloudera.com>
>> wrote:
>>
>> > Matt, thanks for the clarification.
>> >
>> > I may have missed the main point of the PROPOSAL thread then. I personally
>> > want to continue the discussion before voting.
>> >
>> > * Python as runtime requirement. Are you planning to migrate all BASH
>> > scripts provided by Hadoop (or dynamically created, i.e. launcher scripts)
>> > to Python?
>> > * What else in the current build, besides saveVersion.sh, do you see as a
>> > candidate to be migrated to Python?
>> > * How are you planning to define which Python modules can be used? Will
>> > developers have to install them manually?
>> >
>> > Cheers
>> >
>> >
>> > On Thu, Nov 29, 2012 at 2:39 PM, Matt Foley <mfoley@hortonworks.com>
>> > wrote:
>> >
>> >> Hi Alejandro,
>> >> Please see in-line below.
>> >>
>> >> On Mon, Nov 26, 2012 at 1:52 PM, Alejandro Abdelnur <tucu@cloudera.com>
>> >> wrote:
>> >>
>> >> > Matt,
>> >> >
>> >> > The scope of this vote seems different from what was discussed in the
>> >> > PROPOSAL thread.
>> >> > In the PROPOSAL thread you indicated this was for Hadoop1 because it is
>> >> > ANT based. And the main reason was to remove saveVersion.sh.
>> >> > Your #3 was not discussed in the proposal, was it?
>> >> >
>> >>
>> >> The item #3 was in my original statement of the problem, with which I
>> >> started the proposal thread.  In fact, the thread title was "[PROPOSAL]
>> >> introduce Python as build-time and run-time dependency for Hadoop and
>> >> throughout Hadoop stack".  It is true that only one or two people chose to
>> >> discuss #3 further in that thread.
>> >>
>> >> The point is not just to replace a single script, but to provide a means to
>> >> do cross-platform scripts, which will over time replace many
>> >> non-platform-specific scripts written in platform-specific languages.
>> >>
>> >>
>> >> >
>> >> > It seems this vote is dragging in much more than was originally discussed.
>> >> > I think you should suspend the vote, recap the motivation and then restart
>> >> > the vote.
>> >> >
>> >>
>> >> I respectfully disagree.  I believe a careful reading of the cited
>> >> discussion thread, plus my own statement of the vote, provides sufficient
>> >> background for a thoughtful decision on the subject.  Presumably so do the
>> >> ten other people who had already voted before you made that comment.
>> >>
>> >> If several other people want more discussion first, please speak up.
>> >> Thanks,
>> >> --Matt
>> >>
>> >> > As things are laid out at the moment my vote is:
>> >> >
>> >> > -1 (It still seems overkill to introduce a new runtime requirement for
>> >> > building to replace a script.)
>> >> > +1 (I think this is the right way to simplify the build)
>> >> > -1 (AFAIK there is no such requirement at the moment, and if it comes it
>> >> > would be in the form of an AM, which I'd argue should live outside of
>> >> > Hadoop)
>> >> >
>> >> > Thx
>> >> >
>> >> >
>> >> > On Mon, Nov 26, 2012 at 1:16 PM, Giridharan Kesavan <
>> >> > gkesavan@hortonworks.com> wrote:
>> >> >
>> >> > > +1, +1, +1
>> >> > >
>> >> > > -Giri
>> >> > >
>> >> > >
>> >> > > On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <mattf@apache.org>
>> >> > > wrote:
>> >> > >
>> >> > > > For discussion, please see previous thread "[PROPOSAL] introduce Python as
>> >> > > > build-time and run-time dependency for Hadoop and throughout Hadoop
>> >> > > > stack".
>> >> > > >
>> >> > > > This vote consists of three separate items:
>> >> > > >
>> >> > > > 1. Contributors shall be allowed to use Python as a platform-independent
>> >> > > > scripting language for build-time tasks, and add Python as a build-time
>> >> > > > dependency.
>> >> > > > Please vote +1, 0, -1.
>> >> > > >
>> >> > > > 2. Contributors shall be encouraged to use Maven tasks in combination with
>> >> > > > either plug-ins or Groovy scripts to do cross-platform build-time tasks,
>> >> > > > even under ant in Hadoop-1.
>> >> > > > Please vote +1, 0, -1.
>> >> > > >
>> >> > > > 3. Contributors shall be allowed to use Python as a platform-independent
>> >> > > > scripting language for run-time tasks, and add Python as a run-time
>> >> > > > dependency.
>> >> > > > Please vote +1, 0, -1.
>> >> > > >
>> >> > > > Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
>> >> > > > use Maven plug-ins or Groovy as the only means of cross-platform build-time
>> >> > > > tasks, or to simply continue using platform-dependent scripts as is being
>> >> > > > done today.
>> >> > > >
>> >> > > > Vote closes at 12:30pm PST on Saturday 1 December.
>> >> > > > ---------
>> >> > > > Personally, my vote is +1, +1, +1.
>> >> > > > I think #2 is preferable to #1, but still has many unknowns in it, and
>> >> > > > until those are worked out I don't want to delay moving to cross-platform
>> >> > > > scripts for build-time tasks.
>> >> > > >
>> >> > > > Best regards,
>> >> > > > --Matt
>> >> > > >
>> >> > >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Alejandro
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Alejandro
>> >
>>
>>
>>
>> --
>> Alejandro
>>
>


