hadoop-hdfs-dev mailing list archives

From Brahma Reddy Battula <bra...@apache.org>
Subject Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts
Date Tue, 07 Jan 2020 15:52:55 GMT
Hi Sree Vaddi, Owen, Stack, Duo Zhang,

We can move forward based on your comments; we are just waiting for your
replies. I hope all of your comments have been answered. (As Vinay mentioned,
the unification effort can be taken up in a parallel thread.)



On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B <vinayakumarb@apache.org>
wrote:

> Hi Sree,
>
> > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > Project ? Or as a TLP ?
> > Or as a new project definition ?
> As already mentioned by Ayush, this will be a subproject of Hadoop.
> Releases will be voted by Hadoop PMC as per ASF process.
>
>
> > The effort to streamline and put in an accepted standard for the
> > dependencies that require shading,
> > seems beyond the siloed efforts of hadoop, hbase, etc....
> >
> > I propose, we bring all the decision makers from all these artifacts in
> > one room and decide best course of action.
> > I am looking at, no projects should ever had to shade any artifacts
> > except as an absolute necessary alternative.
>
> This is the ideal proposal for any project. But unfortunately some projects
> take their own course based on their needs.
>
> In the current case of protobuf in Hadoop:
>     The protobuf upgrade from 2.5.0 (which is already EOL) was not taken up,
> to avoid downstream failures. Since Hadoop is a platform, its dependencies
> get added to downstream projects' classpaths, so any change in Hadoop's
> dependencies directly affects downstreams. Hadoop strictly follows
> backward compatibility as far as possible.
>     Though protobuf provides wire compatibility between versions, it doesn't
> provide compatibility for generated sources.
>     Now, to support ARM, a protobuf upgrade is mandatory. Using the shading
> technique, Hadoop can internally upgrade to a shaded protobuf 3.x and
> still keep protobuf 2.5.0 (deprecated) for downstreams.
>
> This shading is necessary to have both versions of protobuf supported:
> 2.5.0 (non-shaded) for the downstream classpath and 3.x (shaded) for
> Hadoop's internal usage.
> And this entire work needs to be done before the 3.3.0 release.
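For illustration, a minimal maven-shade-plugin relocation of the kind this
shading implies. The actual configuration lives in the hadoop-thirdparty PR
mentioned later in this thread; the plugin version and the surrounding pom
here are assumptions, not the real build file:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.1</version> <!-- illustrative version -->
      <executions>
        <execution>
          <!-- shading runs only in the package phase -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <relocation>
                <!-- move protobuf 3.x classes under the Hadoop thirdparty prefix -->
                <pattern>com.google.protobuf</pattern>
                <shadedPattern>org.apache.hadoop.thirdparty.com.google.protobuf</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>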
>
> So, though it is ideal to have a common approach for all projects, I suggest
> that for Hadoop we go ahead with the current approach.
> We can also start a parallel effort to address these problems in a
> separate discussion/proposal. Once a solution is available we can revisit
> and adopt the new solution accordingly in all such projects (e.g. HBase,
> Hadoop, Ratis).
>
> -Vinay
>
> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ayushtkn@gmail.com> wrote:
>
> > Hey Sree
> >
> > > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > > Project ? Or as a TLP ?
> > > Or as a new project definition ?
> > >
> > A sub-project of Apache Hadoop, having its own independent release cycles.
> > Maybe you can put it in the same column as Ozone, or as Submarine (a couple
> > of months ago).
> >
> > Unifying for all seems interesting, but each project is independent and has
> > its own limitations and way of thinking; I don't think it would be an easy
> > task to bring everyone to the same table and get them to agree on a common
> > approach.
> >
> > I guess this has been under discussion for quite a while, and no other
> > alternative has been suggested. Still, we can hold off for a week; if
> > someone comes up with a better solution we can take that, else we can
> > continue in the present direction.
> >
> > -Ayush
> >
> >
> >
> > On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sree_at_chess@yahoo.com.invalid>
> > wrote:
> >
> > > apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> > > Project ? Or as a TLP ?
> > > Or as a new project definition ?
> > >
> > > The effort to streamline and put in place an accepted standard for the
> > > dependencies that require shading seems beyond the siloed efforts of
> > > hadoop, hbase, etc.
> > >
> > > I propose we bring all the decision makers from all these projects into
> > > one room and decide the best course of action. The way I look at it, no
> > > project should ever have to shade any artifacts except as an absolutely
> > > necessary alternative.
> > >
> > > Thank you.
> > > /Sree
> > >
> > >
> > >
> > >     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> > > vinayakumarb@apache.org> wrote:
> > >
> > >  Hi,
> > > Sorry for the late reply.
> > > >>> To be exact, how can we better use the thirdparty repo? Looking at
> > > HBase as an example, it looks like everything that is known to break a lot
> > > after an update gets shaded into the hbase-thirdparty artifact: guava,
> > > netty, ... etc.
> > > Is it the purpose to isolate these naughty dependencies?
> > > Yes, shading is to isolate these naughty dependencies from the downstream
> > > classpath and to have independent control over their upgrades without
> > > breaking downstreams.
> > >
> > > The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, to create
> > > the protobuf shaded jar is ready to merge.
> > >
> > > Please take a look if anyone is interested; it will be merged maybe after
> > > two days if there are no objections.
> > >
> > > -Vinay
> > >
> > >
> > > On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <weichiu@apache.org>
> > > wrote:
> > >
> > > > Hi, I am late to this but I am keen to understand more.
> > > >
> > > > To be exact, how can we better use the thirdparty repo? Looking at HBase
> > > > as an example, it looks like everything that is known to break a lot
> > > > after an update gets shaded into the hbase-thirdparty artifact: guava,
> > > > netty, ... etc.
> > > > Is it the purpose to isolate these naughty dependencies?
> > > >
> > > > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vinayakumarb@apache.org>
> > > > wrote:
> > > >
> > > >> Hi All,
> > > >>
> > > >> I have updated the PR as per @Owen O'Malley <owen.omalley@gmail.com>'s
> > > >> suggestions:
> > > >>
> > > >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> > > >>    ii. Kept the shaded package as 'o.a.h.thirdparty.protobuf37'
> > > >>
> > > >> Please review!!
> > > >>
> > > >> Thanks,
> > > >> -Vinay
> > > >>
> > > >>
> > > >> On Sat, Sep 28, 2019 at 10:29 AM 张铎 (Duo Zhang) <palomino219@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > For HBase we have a separate repo for hbase-thirdparty:
> > > >> >
> > > >> > https://github.com/apache/hbase-thirdparty
> > > >> >
> > > >> > We publish the artifacts to Nexus, so we do not need to include
> > > >> > binaries in our git repo; we just add a dependency in the pom:
> > > >> >
> > > >> > https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> > > >> >
> > > >> > And it has its own release cycle; we release only when there are
> > > >> > special requirements or we want to upgrade some of the dependencies.
> > > >> > This is the vote thread for the newest release, where we want to
> > > >> > provide a shaded gson for jdk7:
> > > >> >
> > > >> > https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> > > >> >
> > > >> >
> > > >> > Thanks.
> > > >> >
> > > >> > Vinayakumar B <vinayakumarb@apache.org> wrote on Sat, Sep 28, 2019 at
> > > >> > 1:28 AM:
> > > >> >
> > > >> > > Please find replies inline.
> > > >> > >
> > > >> > > -Vinay
> > > >> > >
> > > >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley
> > > >> > > <owen.omalley@gmail.com> wrote:
> > > >> > >
> > > >> > > > I'm very unhappy with this direction. In particular, I don't think
> > > >> > > > git is a good place for distribution of binary artifacts.
> > > >> > > > Furthermore, the PMC shouldn't be releasing anything without a
> > > >> > > > release vote.
> > > >> > > >
> > > >> > > >
> > > >> > > The proposed solution doesn't release any binaries in git. It is
> > > >> > > actually a complete sub-project which follows the entire release
> > > >> > > process, including a public VOTE. I have already mentioned that the
> > > >> > > release process is similar to Hadoop's.
> > > >> > > To be specific, it uses (almost) the same script used in Hadoop to
> > > >> > > generate artifacts, sign them and deploy them to the staging
> > > >> > > repository. Please let me know if I am conveying anything wrong.
> > > >> > >
> > > >> > >
> > > >> > > > I'd propose that we make a third party module that contains the
> > > >> > > > *source* of the pom files to build the relocated jars. This should
> > > >> > > > absolutely be treated as a last resort for the mostly Google
> > > >> > > > projects that regularly break binary compatibility (e.g. Protobuf
> > > >> > > > & Guava).
> > > >> > > >
> > > >> > > >
> > > >> > > The same has been implemented in the PR
> > > >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and
> > > >> > > let me know if I misunderstood. Yes, this is the last option we have,
> > > >> > > AFAIK.
> > > >> > >
> > > >> > >
> > > >> > > > In terms of naming, I'd propose something like:
> > > >> > > >
> > > >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> > > >> > > > org.apache.hadoop.thirdparty.guava28
> > > >> > > >
> > > >> > > > In particular, I think we absolutely need to include the version of
> > > >> > > > the underlying project. On the other hand, since we should not be
> > > >> > > > shading *everything*, we can drop the leading com.google.
> > > >> > > >
> > > >> > > >
> > > >> > > IMO, this naming convention makes it easy to identify the underlying
> > > >> > > project, but it will be difficult to maintain going forward if the
> > > >> > > underlying project's version changes. Since the thirdparty module has
> > > >> > > its own releases, each of those releases can be mapped to a specific
> > > >> > > version of the underlying project. The binary artifact can even
> > > >> > > include a MANIFEST with the underlying project's details, as per
> > > >> > > Steve's suggestion on HADOOP-13363.
> > > >> > > That said, if you still prefer to have the project version in the
> > > >> > > artifact id, it can be done.
> > > >> > >
> > > >> > > > The Hadoop project can make releases of the thirdparty module:
> > > >> > > >
> > > >> > > > <dependency>
> > > >> > > >  <groupId>org.apache.hadoop</groupId>
> > > >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > > >> > > >  <version>1.0</version>
> > > >> > > > </dependency>
> > > >> > > >
> > > >> > > >
> > > >> > > > Note that the version has to be the hadoop thirdparty release
> > > >> > > > number, which is part of why you need to have the underlying
> > > >> > > > version in the artifact name. These we can push to maven central
> > > >> > > > as new releases from Hadoop.
> > > >> > > >
> > > >> > > >
> > > >> > > Exactly, the same has been implemented in the PR. The
> > > >> > > hadoop-thirdparty module has its own releases, and in the HADOOP
> > > >> > > Jira, thirdparty versions can be differentiated using the prefix
> > > >> > > "thirdparty-".
> > > >> > >
> > > >> > > The same solution is being followed in HBase. Maybe people involved
> > > >> > > in HBase can add some points here.
> > > >> > >
> > > >> > > Thoughts?
> > > >> > > >
> > > >> > > > .. Owen
> > > >> > > >
> > > >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B
> > > >> > > > <vinayakumarb@apache.org> wrote:
> > > >> > > >
> > > >> > > >> Hi All,
> > > >> > > >>
> > > >> > > >>    I wanted to discuss the separate repo for thirdparty
> > > >> > > >> dependencies which we need to shade and include in Hadoop
> > > >> > > >> components' jars.
> > > >> > > >>
> > > >> > > >>    Apologies for the big text ahead, but this needs a clear
> > > >> > > >> explanation!!
> > > >> > > >>
> > > >> > > >>    Right now the most needed such dependency is protobuf. The
> > > >> > > >> protobuf dependency was not upgraded from 2.5.0 onwards for fear
> > > >> > > >> that downstream builds, which depend on the transitive protobuf
> > > >> > > >> dependency coming from hadoop's jars, may fail with the upgrade.
> > > >> > > >> Apparently protobuf does not guarantee source compatibility,
> > > >> > > >> though it guarantees wire compatibility between versions. Because
> > > >> > > >> of this behavior, a version upgrade may cause breakage in known
> > > >> > > >> and unknown (private?) downstreams.
> > > >> > > >>
> > > >> > > >>    So to tackle this, we came up with the following proposal in
> > > >> > > >> HADOOP-13363.
> > > >> > > >>
> > > >> > > >>    Luckily, as far as I know, no API, either public to users or
> > > >> > > >> between Hadoop processes, directly uses protobuf classes in its
> > > >> > > >> signature. (If any exist, please let us know.)
> > > >> > > >>
> > > >> > > >>    Proposal:
> > > >> > > >>    ------------
> > > >> > > >>
> > > >> > > >>    1. Create artifact(s) which contain the shaded dependencies.
> > > >> > > >> All such shading/relocation will use the known prefix
> > > >> > > >> **org.apache.hadoop.thirdparty.**.
> > > >> > > >>    2. To start with, a protobuf jar (ex:
> > > >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf) in which all
> > > >> > > >> **com.google.protobuf** classes are relocated to
> > > >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > > >> > > >>    3. Hadoop modules which need protobuf as a dependency will add
> > > >> > > >> this shaded artifact as a dependency (ex:
> > > >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> > > >> > > >>    4. All previous usages of "com.google.protobuf" will be
> > > >> > > >> relocated to "org.apache.hadoop.thirdparty.com.google.protobuf"
> > > >> > > >> in the code and committed. Please note, this replacement is done
> > > >> > > >> one-time, directly in the source code, NOT during compile and
> > > >> > > >> package.
> > > >> > > >>    5. Once all usages of "com.google.protobuf" are relocated,
> > > >> > > >> Hadoop no longer cares which version of the original
> > > >> > > >> "protobuf-java" is in the dependency tree.
> > > >> > > >>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so
> > > >> > > >> as not to break downstreams, while Hadoop itself uses the latest
> > > >> > > >> protobuf present in "o.a.h.thirdparty:hadoop-shaded-protobuf"
> > > >> > > >> (see the dependency sketch after this list).
> > > >> > > >>
> > > >> > > >>    7. Coming back to the separate repo, the following are the
> > > >> > > >> main reasons for keeping the shaded dependency artifact in a
> > > >> > > >> separate repo instead of a submodule.
> > > >> > > >>
> > > >> > > >>      7a. These artifacts need not be built all the time. They
> > > >> > > >> need to be built only when there is a change in the dependency
> > > >> > > >> version or the build process.
> > > >> > > >>      7b. If added as a submodule in the Hadoop repo,
> > > >> > > >> maven-shade-plugin:shade executes only in the package phase. That
> > > >> > > >> means "mvn compile" or "mvn test-compile" would not produce the
> > > >> > > >> relocated classes in this artifact; it would still contain the
> > > >> > > >> original classes, resulting in compilation failure elsewhere. The
> > > >> > > >> workaround would be to build the thirdparty submodule first and
> > > >> > > >> exclude the "thirdparty" submodule in other executions. This
> > > >> > > >> would be a complex process compared to keeping it in a separate
> > > >> > > >> repo.
> > > >> > > >>
> > > >> > > >>      7c. The separate repo will be a subproject of Hadoop, using
> > > >> > > >> the same HADOOP jira project, with different versioning prefixed
> > > >> > > >> with "thirdparty-" (ex: thirdparty-1.0.0).
> > > >> > > >>      7d. The separate repo will have the same release process as
> > > >> > > >> Hadoop.
> > > >> > > >>
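For illustration, a minimal sketch of what a Hadoop module's dependencies could
look like under points 3 and 6 above. The groupId and artifactId follow the
wording of the proposal ("o.a.h.thirdparty:hadoop-shaded-protobuf"); the version
number is an assumption, and the final naming was still under discussion in this
thread (e.g. "hadoop-shaded-protobuf37"):

    <dependencies>
      <!-- Shaded, relocated protobuf 3.x used by Hadoop internally; its classes
           live under org.apache.hadoop.thirdparty.com.google.protobuf -->
      <dependency>
        <groupId>org.apache.hadoop.thirdparty</groupId>
        <artifactId>hadoop-shaded-protobuf</artifactId>
        <version>1.0.0</version> <!-- thirdparty release number, illustrative -->
      </dependency>
      <!-- Original protobuf 2.5.0 kept only so that downstream classpaths do
           not change -->
      <dependency>
        <groupId>com.google.protobuf</groupId>
        <artifactId>protobuf-java</artifactId>
        <version>2.5.0</version>
      </dependency>
    </dependencies>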
> > > >> > > >>    HADOOP-13363
> > > >> > > >> (https://issues.apache.org/jira/browse/HADOOP-13363) is an
> > > >> > > >> umbrella jira tracking the changes for the protobuf upgrade.
> > > >> > > >>
> > > >> > > >>    A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has
> > > >> > > >> been raised for the separate repo creation in HADOOP-16595
> > > >> > > >> (https://issues.apache.org/jira/browse/HADOOP-16595).
> > > >> > > >>
> > > >> > > >>    Please provide your inputs on the proposal and review the PR
> > > >> > > >> so that we can proceed.
> > > >> > > >>
> > > >> > > >>
> > > >> > > >>    -Thanks,
> > > >> > > >>    Vinay
> > > >> > > >>
> > > >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli
> > > >> > > >> <vinodkv@apache.org> wrote:
> > > >> > > >>
> > > >> > > >> > Moving the thread to the dev lists.
> > > >> > > >> >
> > > >> > > >> > Thanks
> > > >> > > >> > +Vinod
> > > >> > > >> >
> > > >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B
> > > >> > > >> > > <vinayakumarb@apache.org> wrote:
> > > >> > > >> > >
> > > >> > > >> > > Thanks Marton,
> > > >> > > >> > >
> > > >> > > >> > > The newly created 'hadoop-thirdparty' repo is empty right now.
> > > >> > > >> > > Whether to use that repo for the shaded artifact or not will
> > > >> > > >> > > be tracked in the HADOOP-13363 umbrella jira. Please feel free
> > > >> > > >> > > to join the discussion.
> > > >> > > >> > >
> > > >> > > >> > > No existing codebase is being moved out of the hadoop repo,
> > > >> > > >> > > so I think right now we are good to go.
> > > >> > > >> > >
> > > >> > > >> > > -Vinay
> > > >> > > >> > >
> > > >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <elek@apache.org>
> > > >> > > >> > > wrote:
> > > >> > > >> > >
> > > >> > > >> > >>
> > > >> > > >> > >> I am not sure it is defined when a vote is required.
> > > >> > > >> > >>
> > > >> > > >> > >> https://www.apache.org/foundation/voting.html
> > > >> > > >> > >>
> > > >> > > >> > >> Personally I think it's a big enough change to warrant a
> > > >> > > >> > >> notification to the dev lists with a 'lazy consensus'
> > > >> > > >> > >> closure.
> > > >> > > >> > >>
> > > >> > > >> > >> Marton
> > > >> > > >> > >>
> > > >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B
> > > >> > > >> > >> <vinayakumarb@apache.org> wrote:
> > > >> > > >> > >>> Hi,
> > > >> > > >> > >>>
> > > >> > > >> > >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and
> > > >> > > >> > >>> maybe more in future) will be kept as a shaded artifact in
> > > >> > > >> > >>> a separate repo, which will be referred to as a dependency
> > > >> > > >> > >>> in hadoop modules. This approach avoids shading every
> > > >> > > >> > >>> submodule during the build.
> > > >> > > >> > >>>
> > > >> > > >> > >>> So the question is: is any VOTE required before asking to
> > > >> > > >> > >>> create a git repo?
> > > >> > > >> > >>>
> > > >> > > >> > >>> On the self-serve platform
> > > >> > > >> > >>> https://gitbox.apache.org/setup/newrepo.html I can see that
> > > >> > > >> > >>> the requester should be a PMC member.
> > > >> > > >> > >>>
> > > >> > > >> > >>> Wanted to confirm here first.
> > > >> > > >> > >>>
> > > >> > > >> > >>> -Vinay
> > > >> > > >> > >>>
> > > >> > > >> > >>
> > > >> > > >> > >>
> > > >> > >
> > > >> > > >> > >>
> > > >> > > >> > >>
> > > >> > > >> >
> > > >> > > >> >
> > > >> > > >>
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> >
>
--
--Brahma Reddy Battula
