hadoop-common-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-9902) Shell script rewrite
Date Thu, 14 Aug 2014 05:39:15 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096596#comment-14096596 ]

Allen Wittenauer edited comment on HADOOP-9902 at 8/14/14 5:37 AM:
-------------------------------------------------------------------

Thanks [~rvs]!

bq. It could have been done in multiple increments in a separate jiras to help review better.

Not really.  As has been pointed out before, once you touch hadoop-config.sh or *-env.sh in
any sort of major way, you are pretty much touching everything since all the pieces are so
tightly interlocked.  As a result, you'd be reviewing the entire code base almost every time.


Additionally, the whole point of my posting test code, changes, random discussion points,
etc. as I went along was so that the 30+ people who have been watching this JIRA for almost
a year now could point out dumb things I did and make suggestions.  Some took advantage of
it and helped me get rid of some stupidity on my part, either here in the JIRA or elsewhere.
I owe much gratitude to them. :D

It's probably worth pointing out that a good chunk of the new features have been floating
around in JIRAs in patch available status since pre-v1 days.  We clearly never cared enough
to review these features when they were already separate and the patches viable. This was
an opportunity to bring these good ideas and operational fixes forward. 

bq. Is there any concerns you see with the existing environment in mandating bash v3?

Nope.  Bash v3 shipped with, for example, Fedora 3 and Red Hat 8.x.  These are operating systems
that existed before Hadoop did.  FreeBSD and NetBSD didn't, and may still not, ship bash at
all as part of their core distribution. (It is, of course, in pkgsrc.) So we've been broken
on them forever anyway.  (Hai FreeBSD people who beat me up at conferences year after year...)
That release note is specifically for them since they always have to install bash anyway.
I suppose we could always move to something like zsh, which has a BSD-compatible license,
and then ship it with Hadoop too.  (POSIX sh, BTW, is a significant performance drop.)

(Some of the bash folks were completely surprised I made the requirement *so low* given that
all modern OSes ship with v4.)
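
For readers wondering what enforcing a "bash v3 or later" floor might look like, here is a minimal sketch. It is not taken from the patch: the function name `require_bash_major` is hypothetical, and the only thing it relies on is bash's own `BASH_VERSINFO` array.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the HADOOP-9902 patch: succeeds only when
# the running bash is at least the requested major version.
require_bash_major() {
  local required="$1"
  # BASH_VERSINFO[0] holds the major version of the running bash.
  if [ -z "${BASH_VERSINFO:-}" ] || [ "${BASH_VERSINFO[0]}" -lt "${required}" ]; then
    echo "ERROR: bash ${required} or later is required" >&2
    return 1
  fi
  return 0
}

# Bail out early if the interpreter is older than the assumed v3 floor.
require_bash_major 3 || exit 1
```

A check like this would presumably sit near the top of whichever script gets sourced first, so that an unsupported interpreter fails with a clear message instead of a syntax error further down.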

bq.  My concern is about the possible incompatibilities and breaking existing set of tools.

Again, as discussed before, this is exactly why this is going into trunk and not branch-2.
I'm treating this as an incompatible change even though I suspect that the vast majority of
stuff will "just work".  This comes from having used a form of this code for over a year now,
both secure and insecure, multiple operating systems, multiple configs, multiple ways of
configuring, Hadoop v2.0.x through trunk, single hosts and multiple hosts, talking
about config with folks at conferences, running through shellcheck, etc, etc.

To me, the biggest, most potentially breaking change is really going to be the dropping of
append in *-env.sh.  We've only gotten away with it because we've depended upon undocumented
JVM behavior. But we can't dedupe JVM flags and support append in any sort of reliable manner.
Given the number of complaints, questions, and even JIRAs around "why so many Xmx's?", it's
clear that append had to go.
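
To make the "why so many Xmx's?" complaint concrete, here is a small illustrative sketch. The variable `DEMO_OPTS` and both helper functions are hypothetical names invented for this example and are not code from the patch. It contrasts append-style option handling with an add-once helper that skips a flag already present.

```shell
#!/usr/bin/env bash
# Illustrative only: DEMO_OPTS, append_opt and add_opt_once are
# hypothetical names, not part of the Hadoop scripts.

# Append style: every call (think: every time an env file is sourced)
# adds the flag again.
append_opt() {
  DEMO_OPTS="${DEMO_OPTS:+${DEMO_OPTS} }$1"
}

# Add-once style: skip the flag when the exact same token is already present.
add_opt_once() {
  case " ${DEMO_OPTS} " in
    *" $1 "*) ;;                                    # already there, do nothing
    *) DEMO_OPTS="${DEMO_OPTS:+${DEMO_OPTS} }$1" ;;
  esac
}

DEMO_OPTS=""
append_opt "-Xmx1g"
append_opt "-Xmx1g"
echo "append:   ${DEMO_OPTS}"

DEMO_OPTS=""
add_opt_once "-Xmx1g"
add_opt_once "-Xmx1g"
echo "add once: ${DEMO_OPTS}"
```

Note that this sketch only catches exact duplicates. Two different values of the same flag (say `-Xmx1g` and `-Xmx2g`) would both survive, and which one takes effect is exactly the kind of undocumented JVM behavior mentioned above.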

But to restate, yet again, this is going into trunk.  Stuff may break. Hopefully not, but
if we can't put incompatible changes there, we've got bigger problems.


was (Author: aw):
Thanks  [~rvs]!

bq. It could have been done in multiple increments in a separate jiras to help review better.

Not really.  As has been pointed out before, once you touch hadoop-config.sh or *-env.sh in
any sort of major way, you are pretty much touching everything since all the pieces are so
tightly interlocked.  As a result, you'd be reviewing the entire code base almost every time.


Additionally, the whole point of my posting of test code, changes, random discussion points,
etc as I went along was so that the 30+ people who have been watching this JIRA for almost
a year now could point out dumb things I did and make suggestions.  Some took advantage of
it and helped me get rid of some stupidity on my part, either here in the JIRA or elsewhere.
 I owe much gratitude to them. :D

It's probably worth pointing out that a good chunk of the new features have been floating
around in JIRAs in patch available status since pre-v1 days.  We clearly never cared enough
to review these features when they were already separate and the patches viable. This was
an opportunity to bring these good ideas and operational fixes forward. 

bq. Is there any concerns you see with the existing environment in mandating bash v3?

Nope.  Bash v3 shipped with, for example, Fedora 3 and Red Hat 8.x.  These are operating systems
that existed before Hadoop did.  FreeBSD and NetBSD didn't, and may still not, ship bash at
all as part of their core distribution. (It is, of course, in pkgsrc.) So we've been broken
on them forever anyway.  (Hai FreeBSD people who beat me up at conferences year after year...)
That release note is specifically for them since they always have to install bash anyway.
I suppose we could always move to something like zsh which has a BSD-compatible license
and then ship it with Hadoop too.  (POSIX sh, BTW, is a significant performance drop.)

(Some of the bash folks were completely surprised I made the requirement *so low* given that
all modern OSes ship with v4.)

bq.  My concern is about the possible incompatibilities and breaking existing set of tools.

Again, as discussed before, this is exactly why this is going into trunk and not branch-2.
I'm treating this as an incompatible change even though I suspect that the vast majority of
stuff will "just work".  This comes from having used a form of this code for over a year now,
both secure and insecure, multiple operating systems, multiple configs, multiple types of
different ways to config, Hadoop v2.0.x through trunk, single hosts and multiple hosts, talking
about config with folks at conferences, running through shellcheck, etc, etc. 

To me, the biggest, most potentially breaking change is really going to be the dropping of
append in *-env.sh.   We've only gotten away with it because we've depended upon undocumented
JVM behavior. But we can't dedupe JVM flags *and* support append in any sort of reliable manner.
 Given the  number of complaints, questions, and even JIRAs around "why so many Xmx's?", it's
clear that append had to go. 

But to restate, yet again, this is going into trunk.  Stuff may break. Hopefully not, but
if we can't put incompatible changes there, we've got bigger problems.

> Shell script rewrite
> --------------------
>
>                 Key: HADOOP-9902
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9902
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 3.0.0
>            Reporter: Allen Wittenauer
>            Assignee: Allen Wittenauer
>              Labels: releasenotes
>             Fix For: 3.0.0
>
>         Attachments: HADOOP-9902-10.patch, HADOOP-9902-11.patch, HADOOP-9902-12.patch,
>                      HADOOP-9902-13-branch-2.patch, HADOOP-9902-13.patch, HADOOP-9902-14.patch,
>                      HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902-4.patch,
>                      HADOOP-9902-5.patch, HADOOP-9902-6.patch, HADOOP-9902-7.patch,
>                      HADOOP-9902-8.patch, HADOOP-9902-9.patch, HADOOP-9902.patch,
>                      HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt
>
>
> Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
