hadoop-common-dev mailing list archives

From Chris Douglas <cdoug...@apache.org>
Subject Re: When are incompatible changes acceptable (HDFS-12990)
Date Fri, 12 Jan 2018 04:26:29 GMT
On Thu, Jan 11, 2018 at 6:34 PM Tsz Wo Sze <szetszwo@yahoo.com> wrote:

> The question is: how are we going to fix it?
>

What do you propose? -C

> No incompatible changes are allowed between 3.0.0 and 3.0.1. Dot releases
> only allow bug fixes.
>
> We may not like the statement above but it is our compatibility policy.
> We should either follow the policy or revise it.
>
> Some more questions:
>
>    - What if someone is already using 3.0.0 and has changed all the
>    scripts to 9820?  Just let them fail?
>    - Compared to 2.x, 3.0.0 has many incompatible changes. Are we going
>    to have other incompatible changes in future minor and dot releases?
>    What are the criteria for deciding which incompatible changes are allowed?
>    - I hate that we prematurely released 3.0.0 and are now making 3.0.1
>    incompatible with 3.0.0. If the "bug" is that serious, why not fix it in
>    4.0.0 and declare 3.x dead?
>    - It seems obvious that no one seriously tested it, so the problem
>    was not uncovered until now. Are there bugs in our current release
>    procedure?
>
>
> Thanks
> Tsz-Wo
>
>
>
> On Thursday, January 11, 2018, 11:36:33 AM GMT+8, Chris Douglas <
> cdouglas@apache.org> wrote:
>
>
> Isn't this limited to reverting the 8020 -> 9820 change? -C
>
> On Wed, Jan 10, 2018 at 6:13 PM Eric Yang <eyang@hortonworks.com> wrote:
>
> > The fix in HDFS-9427 can potentially bring in new customers because
> > newcomers are less likely to encounter the “port already in use” problem.
> > If we make the change in HDFS-12990, this incompatible change does not
> > make the earlier incompatible change compatible.  The other ports are not
> > reverted by HDFS-12990.  Users will be left with the same bad taste in the
> > mouth that HDFS-9427 attempted to address.  Please consider the negative
> > side effects of reverting as well as of an incompatible change in a minor
> > release.  Thanks
> >
> > Regards,
> > Eric
> >
> > From: larry mccay <lmccay@apache.org>
> > Date: Wednesday, January 10, 2018 at 10:53 AM
> > To: Daryn Sharp <daryn@oath.com>
> > Cc: "Aaron T. Myers" <atm@apache.org>, Eric Yang <eyang@hortonworks.com>,
> > Chris Douglas <cdouglas@apache.org>, Hadoop Common <
> > common-dev@hadoop.apache.org>
> > Subject: Re: When are incompatible changes acceptable (HDFS-12990)
> >
> > On Wed, Jan 10, 2018 at 1:34 PM, Daryn Sharp <daryn@oath.com> wrote:
> >
> > I fully agree the port changes should be reverted.  Although
> > "incompatible", the potential impact to existing 2.x deploys is huge.
> > I'd rather inconvenience 3.0 deploys that comprise <1% of customers.  An
> > incompatible change to revert an incompatible change is called
> > compatibility.
> >
> > +1
> >
> >
> >
> >
> > Most importantly, consider that there is no good upgrade path for existing
> > deploys, esp. large and/or multi-cluster environments.  It’s only feasible
> > for first-time deploys or simple single-cluster upgrades willing to take
> > downtime.  Let's consider a few reasons why:
> >
> >
> >
> > 1. RU is completely broken.  Running jobs will fail.  If MR on hdfs
> > bundles the configs, there's no way to transparently coordinate the
> > switch to the new bundle with the port changed.  Job submissions will fail.
> >
> >
> >
> > 2. Users generally do not add the rpc port number to uris so unless their
> > configs are updated they will contact the wrong port.  Seamlessly
> > coordinating the conf change without massive failures is impossible.
> >
> >
> >
> > 3. Even if client confs are updated, they will break in a multi-cluster
> > env with NNs using different ports.  Users/services will be forced to add
> > the port.  The cited hive "issue" is not a bug since it's the only way to
> > work in a multi-port env.
> >
> >
> >
> > 4. Coordinating the port add/change of uris in systems everywhere (you
> > know something will be missed), updating of confs, restarting all
> > services, requiring customers to redeploy their workflows in sync with
> > the NN upgrade, will cause mass disruption and downtime that will be
> > unacceptable for production environments.
> >
> >
> >
> > This is a solution to a non-existent problem.  Ports can be bound by
> > multiple processes but only 1 can listen.  Maybe multiple listeners is an
> > issue for compute nodes but not responsibly managed service nodes.  I.e.,
> > who runs arbitrary services on the NNs that bind to random ports?  Besides,
> > the default port is and was ephemeral so it solved nothing.
> >
> >
> >
> > This either standardizes ports to a particular customer's ports or is a
> > poorly thought out whim.  In either case, the needs of the many outweigh
> > the needs of the few/none (3.0 users).  The only logical conclusion is
> > revert.  If a particular site wants to change default ports and deal with
> > the massive fallout, they can explicitly change the ports themselves.
> >
> >
> >
> > Daryn
> >
> > On Tue, Jan 9, 2018 at 11:22 PM, Aaron T. Myers <atm@apache.org> wrote:
> > On Tue, Jan 9, 2018 at 3:15 PM, Eric Yang <eyang@hortonworks.com> wrote:
> >
> > > While I agree the original port change was unnecessary, I don’t think
> > > a Hadoop NN port change is a bad thing.
> > >
> > > I worked for a Hadoop distro whose NN RPC port defaulted to 9000.
> > > When we migrated from BigInsights to IOP and now to HDP, we had to move
> > > customer Hive metadata to the new NN RPC port.  It only took one developer
> > > (myself) to write the tool for the migration.  The workload incurred was
> > > not as bad as most people anticipated because Hadoop depends on
> > > configuration files for referencing the namenode.  Most of the code can
> > > work transparently.  It helped to harden the downstream testing tools to
> > > be more robust.
> > >
> >
> > While there are of course ways to deal with this, the question really
> > should be whether or not it's a desirable thing to do to our users.
> >
> >
> > >
> > > We will never know how many people are actively working on Hadoop 3.0.0.
> > > Perhaps a couple hundred developers, or thousands.
> >
> >
> > You're right that we can't know for sure, but I strongly suspect that
> > this is a substantial overestimate. Given how conservative Hadoop
> > operators tend to be, I view it as exceptionally unlikely that many
> > deployments have been created on or upgraded to Hadoop 3.0.0 since it
> > was released less than a month ago.
> >
> > Further, I hope you'll agree that the number of
> > users/developers/deployments/applications which are currently on Hadoop
> > 2.x is *vastly* greater than anyone who might have jumped on Hadoop 3.0.0
> > so quickly. When all of those users upgrade to any 3.x version, they will
> > encounter this needless incompatible change and be forced to work around
> > it.
> >
> >
> > > I think the switch back may save a few developers some work, but more
> > > people could be impacted by an unexpected change in a minor release in
> > > the future.  I recommend keeping the current values to avoid rule
> > > bending and future frustrations.
> > >
> >
> > That we allow this incompatible change now does not mean that we are
> > categorically allowing more incompatible changes in the future. My point
> > is that we should in all instances evaluate the merit of any incompatible
> > change on a case-by-case basis. This is not an exceptional circumstance -
> > we've made incompatible changes in the past when appropriate, e.g.
> > breaking some clients to address a security issue. I and others believe
> > that in this case the benefits greatly outweigh the downsides of changing
> > this back to what it has always been.
> >
> > Best,
> > Aaron
> >
> >
> > >
> > > Regards,
> > > Eric
> > >
> > > On 1/9/18, 11:21 AM, "Chris Douglas" <cdouglas@apache.org> wrote:
> > >
> > >    Particularly since 9820 isn't in the contiguous range of ports in
> > >    HDFS-9427, is there any value in this change?
> > >
> > >    Let's change it back to prevent the disruption to users, but
> > >    downstream projects should treat this as a bug in their tests.
> > >    Please open JIRAs in affected projects. -C
> > >
> > >
> > >    On Tue, Jan 9, 2018 at 5:18 AM, larry mccay <lmccay@apache.org> wrote:
> > >    > On Mon, Jan 8, 2018 at 11:28 PM, Aaron T. Myers <atm@apache.org> wrote:
> > >    >
> > >    >> Thanks a lot for the response, Larry. Comments inline.
> > >    >>
> > >    >> On Mon, Jan 8, 2018 at 6:44 PM, larry mccay <lmccay@apache.org> wrote:
> > >    >>
> > >    >>> Question...
> > >    >>>
> > >    >>> Can this be addressed in some way during or before upgrade that
> > >    >>> allows it to only affect new installs?
> > >    >>> Even a config based workaround prior to upgrade might make this a
> > >    >>> less disruptive change.
> > >    >>>
> > >    >>> If part of the upgrade process includes a step (maybe even a
> > >    >>> script) to set the NN RPC port explicitly beforehand then it would
> > >    >>> allow existing deployments and related clients to remain whole -
> > >    >>> otherwise it will uptake the new default port.
> > >    >>>
> > >    >>
> > >    >> Perhaps something like this could be done, but I think there are
> > >    >> downsides to anything like this. For example, I'm sure there are
> > >    >> plenty of applications written on top of Hadoop that have tests
> > >    >> which hard-code the port number. Nothing we do in a setup script
> > >    >> will help here. If we don't change the default port back to what it
> > >    >> was, these tests will likely all have to be updated.
> > >    >>
> > >    >>
> > >    >
> > >    > I may not have made my point clear enough.
> > >    > What I meant to say is to fix the default port but direct folks to
> > >    > explicitly set the port they are using in a deployment (the
> > >    > current default) so that it doesn't change out from under them -
> > >    > unless they are fine with it changing.
> > >    >
> > >    >
> > >    >>
> > >    >>> Meta note: we shouldn't be so pedantic about policy that we
> > >    >>> can't back out something that is considered a bug or even a
> > >    >>> mistake.
> > >    >>>
> > >    >>
> > >    >> This is my bigger point. Rigidly adhering to the compat guidelines
> > >    >> in this instance helps almost no one, while hurting many folks.
> > >    >>
> > >    >> We basically made a mistake when we decided to change the default
> > >    >> NN port with little upside, even between major versions. We
> > >    >> discovered this very quickly, and we have an opportunity to fix it
> > >    >> now and in so doing likely disrupt very, very few users and
> > >    >> downstream applications. If we don't change it, we'll be causing
> > >    >> difficulty for our users, downstream developers, and ourselves,
> > >    >> potentially for years.
> > >    >>
> > >    >
> > >    > Agreed.
> > >    >
> > >    >
> > >    >>
> > >    >> Best,
> > >    >> Aaron
> > >    >>
> > >
> > >
> > >    ---------------------------------------------------------------------
> > >    To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> > >    For additional commands, e-mail: common-dev-help@hadoop.apache.org
> > >
> > >
> > >
> > >
> >
> >
> >
>
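
For readers following the thread, here is a minimal sketch of the port
behaviour under discussion: a client URI with no explicit port falls back to
whatever the default NN RPC port is, while a URI that pins the port is
unaffected by a change of default. The helper name effective_port and the
host nn1.example.com are hypothetical and exist only for illustration; this
is not Hadoop's own resolution code. The 8020 and 9820 values are the ports
named in the thread.

```python
from urllib.parse import urlsplit

# NameNode RPC default ports named in the thread.
DEFAULT_PORT_2X = 8020   # long-standing default, restored by HDFS-12990
DEFAULT_PORT_300 = 9820  # default shipped in 3.0.0


def effective_port(uri: str, default_port: int) -> int:
    """Return the port a client would contact for an hdfs:// URI.

    Hypothetical helper: falls back to default_port when the URI
    carries no explicit port.
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an hdfs URI: {uri}")
    return parts.port if parts.port is not None else default_port


# A URI without a port follows the default, so it moves when the default moves.
unpinned = "hdfs://nn1.example.com/user/data"
# A URI with the port set explicitly keeps working across a default change.
pinned = "hdfs://nn1.example.com:8020/user/data"

for default in (DEFAULT_PORT_2X, DEFAULT_PORT_300):
    print(default, effective_port(unpinned, default), effective_port(pinned, default))
```

Under these assumptions, only the unpinned URI changes target when the
default changes, which is why setting the port explicitly before an upgrade
shields a deployment but does nothing for clients and tests that rely on the
default or hard-code a port elsewhere.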
