hadoop-common-dev mailing list archives

From Junping Du <...@hortonworks.com>
Subject Re: [DISCUSS] A final minor release off branch-2?
Date Tue, 21 Nov 2017 21:41:16 GMT
Hi Andrew,

bq. Source and binary compatibility are not required for 3.0.0. It's a new major release,
and there are known, documented incompatibilities in this regard.

Technically, that is true. However, in practice, we should retain compatibility as much as
we can. Otherwise, we could unintentionally break downstream projects, third-party libraries and
existing users' applications. A quick example is a blocker issue I just reported
in HADOOP-15059, which breaks old (2.x) MR applications on a 3.0 deployment due to a token format
incompatibility.


bq. To follow up on my earlier email, I don't think there's need for a bridge release given
that we've successfully tested rolling upgrade from 2.x to 3.0.0.

Did we find the same issue as HADOOP-15059? If so, I'm just curious what rolling upgrade means
here. IMO, an upgrade that breaks running applications shouldn't be considered "rolling".
Am I missing anything?



Thanks,


Junping


________________________________
From: Andrew Wang <andrew.wang@cloudera.com>
Sent: Wednesday, November 15, 2017 10:34 AM
To: Junping Du
Cc: Wangda Tan; Steve Loughran; Vinod Kumar Vavilapalli; Kai Zheng; Arun Suresh; common-dev@hadoop.apache.org;
yarn-dev@hadoop.apache.org; Hdfs-dev; mapreduce-dev@hadoop.apache.org
Subject: Re: [DISCUSS] A final minor release off branch-2?

Hi Junping,

On Wed, Nov 15, 2017 at 1:37 AM, Junping Du <jdu@hortonworks.com> wrote:
Thanks Vinod for bringing up this discussion, which is just in time.

I agree with most responses that option C is not a good choice, as our community bandwidth
is precious and we should focus on a very limited set of mainstream branches to develop, test and deploy.
Of course, we should still follow the Apache way and allow any interested committer to roll
his/her own release, given specific requirements beyond the mainstream releases.

I am not biased toward option A or B (I will discuss this later), but I think a bridge release
for upgrading to and back from 3.x is very necessary.
The reasons are obvious:
1. Given the lessons learned from the previous migration from 1.x to 2.x, no matter how
careful we try to be, there is still a chance that some level of compatibility (source, binary,
configuration, etc.) gets broken in the migration to a new major release. Some of these incompatibilities
can only be identified at runtime, after the GA release is widely deployed in production clusters:
we have tons of downstream projects and numerous configurations, and we cannot cover them
all with in-house deployment and testing.

Source and binary compatibility are not required for 3.0.0. It's a new major release, and
there are known, documented incompatibilities in this regard.

That said, we've done far, far more in this regard compared to previous major or minor releases.
We've compiled all of CDH against Hadoop 3 and run our suite of system tests for the platform.
We've been testing in this way since 3.0.0-alpha1 and found and fixed plenty of source and
binary compatibility issues during the alpha and beta process. Many of these fixes trickled
down into 2.8 and 2.9.

2. From the recent classpath isolation work, I was surprised to find that many of our downstream
projects (HBase, Tez, etc.) are still consuming many non-public, server-side APIs of Hadoop,
not to mention the projects/products outside of the Hadoop ecosystem. Our API compatibility tests do
not (and should not) cover these cases. We can claim that a new major release
shouldn't be responsible for these private API changes. But given the possibility of breaking
existing applications in some way, users could be very hesitant to migrate to a 3.x release
if there is no safe way to roll back.

This is true for 2.x releases as well. Similar to the previous answer, we've compiled all
of CDH against Hadoop 3, providing a much higher level of assurance even compared to 2.x releases.

3. Besides incompatibilities, performance regressions (lower
throughput, higher latency, slower jobs, a bigger memory footprint or even memory leaks,
etc.) are also possible in new Hadoop releases. While the performance impact of migration (if any) could be
negligible to some users, other users could be very sensitive to it and wish to roll back if it
happens on their production cluster.

Yes, bugs exist. I won't claim that 3.0.0 is bug-free. All new releases can potentially introduce
new bugs.

However, I don't think rollback is the solution. In my experience, users rarely roll back since
it's so disruptive and causes data loss. It's much more common that they patch and upgrade.
With that in mind, I'd rather we spend our effort on making 3.0.x high-quality vs. making
it easier to roll back.

The root of my concern in announcing a "bridge release" is that it discourages users from
upgrading to 3.0.0 until a bridge release is out. I strongly believe the level of quality
provided by 3.0.0 is at least equal to new 2.x minor releases, given our extended testing
and integration process, and we don't have bridge releases for 2.x.

This is why I asked for a list of known issues with 2.x -> 3.0 upgrades that would necessitate
a bridge release. Arun raised a concern about NM rollback. Are there any other *known* issues?

Best,
Andrew