hadoop-common-dev mailing list archives

From Chris Douglas <cdoug...@apache.org>
Subject Re: When are incompatible changes acceptable (HDFS-12990)
Date Thu, 11 Jan 2018 03:36:11 GMT
Isn't this limited to reverting the 8020 -> 9820 change? -C

On Wed, Jan 10, 2018 at 6:13 PM Eric Yang <eyang@hortonworks.com> wrote:

> The fix in HDFS-9427 can potentially bring in new customers because there
> is less chance of a newcomer encountering the “port already in use”
> problem.  If we make the change according to HDFS-12990, this incompatible
> change does not make the earlier incompatible change compatible.  Other
> ports are not reverted by HDFS-12990.  Users will encounter the bad taste
> in the mouth that HDFS-9427 attempted to solve.  Please do consider the
> negative side effects of both reverting and making an incompatible change
> in a minor release.  Thanks
>
> Regards,
> Eric
>
> From: larry mccay <lmccay@apache.org>
> Date: Wednesday, January 10, 2018 at 10:53 AM
> To: Daryn Sharp <daryn@oath.com>
> Cc: "Aaron T. Myers" <atm@apache.org>, Eric Yang <eyang@hortonworks.com>,
> Chris Douglas <cdouglas@apache.org>, Hadoop Common <
> common-dev@hadoop.apache.org>
> Subject: Re: When are incompatible changes acceptable (HDFS-12990)
>
> On Wed, Jan 10, 2018 at 1:34 PM, Daryn Sharp <daryn@oath.com> wrote:
>
> I fully agree the port changes should be reverted.  Although
> "incompatible", the potential impact to existing 2.x deploys is huge.  I'd
> rather inconvenience 3.0 deploys that comprise <1% of customers.  An
> incompatible change to revert an incompatible change is called
> compatibility.
>
> +1
>
>
>
>
> Most importantly, consider that there is no good upgrade path for existing
> deploys, esp. large and/or multi-cluster environments.  It’s only feasible
> for first-time deploys or simple single-cluster upgrades willing to take
> downtime.  Let's consider a few reasons why:
>
>
>
> 1. RU is completely broken.  Running jobs will fail.  If MR on hdfs
> bundles the configs, there's no way to transparently coordinate the switch
> to the new bundle with the port changed.  Job submissions will fail.
>
>
>
> 2. Users generally do not add the rpc port number to uris so unless their
> configs are updated they will contact the wrong port.  Seamlessly
> coordinating the conf change without massive failures is impossible.
>
>
>
> 3. Even if client confs are updated, they will break in a multi-cluster
> env with NNs using different ports.  Users/services will be forced to add
> the port.  The cited hive "issue" is not a bug since it's the only way to
> work in a multi-port env.
>
>
>
> 4. Coordinating the port add/change of uris in systems everywhere (you
> know something will be missed), updating of confs, restarting all services,
> requiring customers to redeploy their workflows in sync with the NN
> upgrade, will cause mass disruption and downtime that will be unacceptable
> for production environments.
>
>
>
> This is a solution to a non-existent problem.  Ports can be bound by
> multiple processes but only 1 can listen.  Maybe multiple listeners is an
> issue for compute nodes but not responsibly managed service nodes.  I.e., who
> runs arbitrary services on the NNs that bind to random ports?  Besides, the
> default port is and was ephemeral so it solved nothing.
>
>
>
> This either standardizes ports to a particular customer's ports or is a
> poorly thought out whim.  In either case, the needs of the many outweigh
> the needs of the few/none (3.0 users).  The only logical conclusion is
> revert.  If a particular site wants to change default ports and deal with
> the massive fallout, they can explicitly change the ports themselves.
>
>
>
> Daryn
>
> On Tue, Jan 9, 2018 at 11:22 PM, Aaron T. Myers <atm@apache.org> wrote:
> On Tue, Jan 9, 2018 at 3:15 PM, Eric Yang <eyang@hortonworks.com> wrote:
>
> > While I agree the original port change was unnecessary, I don’t think
> > the Hadoop NN port change is a bad thing.
> >
> > I worked for a Hadoop distro whose NN RPC port defaulted to 9000.
> > When we migrated from BigInsights to IOP and now to HDP, we had to move
> > customer Hive metadata to the new NN RPC port.  It only took one developer
> > (myself) to write the tool for the migration.  The incurred workload was
> > not as bad as most people anticipated because Hadoop depends on
> > configuration files for referencing the namenode.  Most of the code can
> > work transparently.  It helped to harden the downstream testing tools to
> > be more robust.
> >
>
> While there are of course ways to deal with this, the question really
> should be whether or not it's a desirable thing to do to our users.
>
>
> >
> > We will never know how many people are actively working on Hadoop 3.0.0.
> > Perhaps a couple hundred developers, perhaps thousands.
>
>
> You're right that we can't know for sure, but I strongly suspect that this
> is a substantial overestimate. Given how conservative Hadoop operators tend
> to be, I view it as exceptionally unlikely that many deployments have been
> created on or upgraded to Hadoop 3.0.0 since it was released less than a
> month ago.
>
> Further, I hope you'll agree that the number of
> users/developers/deployments/applications which are currently on Hadoop 2.x
> is *vastly* greater than anyone who might have jumped on Hadoop 3.0.0 so
> quickly. When all of those users upgrade to any 3.x version, they will
> encounter this needless incompatible change and be forced to work around
> it.
>
>
> > I think the switch back may save a few developers some work, but more
> > people could be impacted by an unexpected change in a minor release in
> > the future.  I recommend keeping the current values to avoid rule bending
> > and future frustrations.
> >
>
> That we allow this incompatible change now does not mean that we are
> categorically allowing more incompatible changes in the future. My point is
> that we should in all instances evaluate the merit of any incompatible
> change on a case-by-case basis. This is not an exceptional circumstance -
> we've made incompatible changes in the past when appropriate, e.g. breaking
> some clients to address a security issue. I and others believe that in this
> case the benefits greatly outweigh the downsides of changing this back to
> what it has always been.
>
> Best,
> Aaron
>
>
> >
> > Regards,
> > Eric
> >
> > On 1/9/18, 11:21 AM, "Chris Douglas" <cdouglas@apache.org> wrote:
> >
> >     Particularly since 9820 isn't in the contiguous range of ports in
> >     HDFS-9427, is there any value in this change?
> >
> >     Let's change it back to prevent the disruption to users, but
> >     downstream projects should treat this as a bug in their tests. Please
> >     open JIRAs in affected projects. -C
> >
> >
> >     On Tue, Jan 9, 2018 at 5:18 AM, larry mccay <lmccay@apache.org> wrote:
> >     > On Mon, Jan 8, 2018 at 11:28 PM, Aaron T. Myers <atm@apache.org> wrote:
> >     >
> >     >> Thanks a lot for the response, Larry. Comments inline.
> >     >>
> >     >> On Mon, Jan 8, 2018 at 6:44 PM, larry mccay <lmccay@apache.org> wrote:
> >     >>
> >     >>> Question...
> >     >>>
> >     >>> Can this be addressed in some way during or before upgrade that
> >     >>> allows it to only affect new installs?
> >     >>> Even a config based workaround prior to upgrade might make this
> >     >>> a less disruptive change.
> >     >>>
> >     >>> If part of the upgrade process includes a step (maybe even a
> >     >>> script) to set the NN RPC port explicitly beforehand then it would
> >     >>> allow existing deployments and related clients to remain whole -
> >     >>> otherwise it will uptake the new default port.
> >     >>>
> >     >>
> >     >> Perhaps something like this could be done, but I think there are
> >     >> downsides to anything like this. For example, I'm sure there are
> >     >> plenty of applications written on top of Hadoop that have tests
> >     >> which hard-code the port number. Nothing we do in a setup script
> >     >> will help here. If we don't change the default port back to what it
> >     >> was, these tests will likely all have to be updated.
> >     >>
> >     >>
> >     >
> >     > I may not have made my point clear enough.
> >     > What I meant to say is to fix the default port but direct folks to
> >     > explicitly set the port they are using in a deployment (the current
> >     > default) so that it doesn't change out from under them - unless they
> >     > are fine with it changing.
> >     >
> >     >
> >     >>
> >     >>> Meta note: we shouldn't be so pedantic about policy that we can't
> >     >>> back out something that is considered a bug or even mistake.
> >     >>>
> >     >>
> >     >> This is my bigger point. Rigidly adhering to the compat guidelines
> >     >> in this instance helps almost no one, while hurting many folks.
> >     >>
> >     >> We basically made a mistake when we decided to change the default
> >     >> NN port with little upside, even between major versions. We
> >     >> discovered this very quickly, and we have an opportunity to fix it
> >     >> now and in so doing likely disrupt very, very few users and
> >     >> downstream applications. If we don't change it, we'll be causing
> >     >> difficulty for our users, downstream developers, and ourselves,
> >     >> potentially for years.
> >     >>
> >     >
> >     > Agreed.
> >     >
> >     >
> >     >>
> >     >> Best,
> >     >> Aaron
> >     >>
> >
> >
> >
> >
> >
>
>
>
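
Editor's illustration of point 2 in Daryn's message above (clients that omit the RPC port from their URIs silently follow whatever the default is). This is only a minimal sketch, not Hadoop code: the class name, the helper effectivePort, and the host nn1.example.com are hypothetical, and the two port values are the 8020 and 9820 mentioned in the thread. It uses nothing but java.net.URI from the JDK.

```java
import java.net.URI;

public class DefaultPortDemo {
    // Port values taken from the thread (8020 -> 9820); the names are hypothetical.
    static final int OLD_DEFAULT_PORT = 8020;
    static final int NEW_DEFAULT_PORT = 9820;

    /** Returns the port a client would contact: the explicit one if present, else the default. */
    static int effectivePort(String uri, int defaultPort) {
        int port = URI.create(uri).getPort(); // getPort() is -1 when the URI carries no port
        return port == -1 ? defaultPort : port;
    }

    public static void main(String[] args) {
        String implicit = "hdfs://nn1.example.com/user/data";      // no port, as most users write it
        String explicit = "hdfs://nn1.example.com:8020/user/data"; // port pinned by the user

        // The same port-less URI resolves differently once the default changes.
        System.out.println(effectivePort(implicit, OLD_DEFAULT_PORT));
        System.out.println(effectivePort(implicit, NEW_DEFAULT_PORT));

        // A URI with an explicit port is unaffected by the default.
        System.out.println(effectivePort(explicit, NEW_DEFAULT_PORT));
    }
}
```

Under these assumptions, only the port-less URI changes meaning when the default moves, which is why both the "pin the port explicitly before upgrading" suggestion and the "clients will contact the wrong port" concern come up in the thread.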
