hadoop-hdfs-dev mailing list archives

From Brahma Reddy Battula <bra...@apache.org>
Subject Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts
Date Sun, 05 Jan 2020 17:17:59 GMT
I just went through the previous discussions in the JIRA (HADOOP-13363) and this
thread. As Stack and Duo Zhang mentioned, this artifact (could we call it
"shaded" instead of "thirdparty"?) will be voted on by the PMC, as in the thread
below. Wouldn't that be fair?

https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E

One thought here:
Maybe we can unify this (perhaps as an incubator project?), so that all
projects can use the same git repo for shaded artifacts.


I wanted to join the discussion, so please let me know.


On Sun, 5 Jan 2020 at 7:33 AM, Sree Vaddi <sree_at_chess@yahoo.com.invalid>
wrote:

> apache/hadoop-thirdparty: how would it fit into the ASF? As an Incubating
> Project? Or as a TLP?
> Or as a new project definition?
>
> The effort to streamline and establish an accepted standard for the
> dependencies that require shading seems to go beyond the siloed efforts of
> Hadoop, HBase, etc.
>
> I propose that we bring the decision makers from all these projects into
> one room and decide the best course of action. My view is that no project
> should ever have to shade any artifacts except when it is absolutely
> necessary.
>
>
> Thank you./Sree
>
>
>
>     On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
> vinayakumarb@apache.org> wrote:
>
> Hi,
> Sorry for the late reply.
> >>> To be exact, how can we better use the thirdparty repo? Looking at
> >>> HBase as an example, it looks like everything that is known to break a
> >>> lot after an update gets shaded into the hbase-thirdparty artifact:
> >>> guava, netty, etc.
> >>> Is the purpose to isolate these naughty dependencies?
> Yes, shading isolates these naughty dependencies from the downstream
> classpath and gives us independent control over these upgrades without
> breaking downstreams.
>
> The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, which
> creates the protobuf shaded jar, is ready to merge.
>
> Please take a look if you are interested. It will be merged after about two
> days if there are no objections.
>
> -Vinay
>
>
> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <weichiu@apache.org>
> wrote:
>
> > Hi, I am late to this, but I am keen to understand more.
> >
> > To be exact, how can we better use the thirdparty repo? Looking at HBase
> > as an example, it looks like everything that is known to break a lot after
> > an update gets shaded into the hbase-thirdparty artifact: guava, netty,
> > etc.
> > Is the purpose to isolate these naughty dependencies?
> >
> > On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vinayakumarb@apache.org>
> > wrote:
> >
> >> Hi All,
> >>
> >> I have updated the PR as per the suggestions from @Owen O'Malley
> >> <owen.omalley@gmail.com>.
> >>
> >>    i. Renamed the module to 'hadoop-shaded-protobuf37'
> >>    ii. Kept the shaded package as 'o.a.h.thirdparty.protobuf37'
> >>
> >> Please review!
> >>
> >> Thanks,
> >> -Vinay
> >>
> >>
> >> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <palomino219@gmail.com>
> >> wrote:
> >>
> >> > For HBase we have a separate repo for hbase-thirdparty:
> >> >
> >> > https://github.com/apache/hbase-thirdparty
> >> >
> >> > We publish the artifacts to Nexus, so we do not need to include
> >> > binaries in our git repo; we just add a dependency in the pom.
> >> >
> >> > https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >> >
> >> >
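
For illustration, consuming such a published artifact from a downstream pom
might look like the sketch below. The groupId and artifactId are taken from the
mvnrepository link above; the `hbase-thirdparty.version` property is a
hypothetical placeholder, not a recommendation of any particular release.

```xml
<!-- Sketch only: the version property below is a hypothetical placeholder. -->
<dependency>
  <groupId>org.apache.hbase.thirdparty</groupId>
  <artifactId>hbase-shaded-protobuf</artifactId>
  <version>${hbase-thirdparty.version}</version>
</dependency>
```
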
> >> > It has its own release cycle; we release only when there are special
> >> > requirements or we want to upgrade some of the dependencies. This is the
> >> > vote thread for the newest release, where we want to provide a shaded
> >> > gson for jdk7.
> >> >
> >> > https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >> >
> >> >
> >> > Thanks.
> >> >
> >> > Vinayakumar B <vinayakumarb@apache.org> wrote on Sat, Sep 28, 2019 at 1:28 AM:
> >> >
> >> > > Please find replies inline.
> >> > >
> >> > > -Vinay
> >> > >
> >> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omalley@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > I'm very unhappy with this direction. In particular, I don't think git
> >> > > > is a good place for distribution of binary artifacts. Furthermore, the
> >> > > > PMC shouldn't be releasing anything without a release vote.
> >> > > >
> >> > > >
> >> > > The proposed solution doesn't release any binaries in git. It is actually
> >> > > a complete sub-project which follows the entire release process, including
> >> > > a public VOTE. I have already mentioned that the release process is
> >> > > similar to Hadoop's.
> >> > > To be specific, it uses (almost) the same script used in Hadoop to
> >> > > generate artifacts, sign them, and deploy them to the staging repository.
> >> > > Please let me know if I am conveying anything wrong.
> >> > >
> >> > >
> >> > > > I'd propose that we make a third party module that contains the
> >> > > > *source* of the pom files to build the relocated jars. This should
> >> > > > absolutely be treated as a last resort for the mostly Google projects
> >> > > > that regularly break binary compatibility (e.g. Protobuf & Guava).
> >> > > >
> >> > > >
> >> > > The same has been implemented in the PR
> >> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let
> >> > > me know if I misunderstood. Yes, this is the last option we have, AFAIK.
> >> > >
> >> > >
> >> > > > In terms of naming, I'd propose something like:
> >> > > >
> >> > > > org.apache.hadoop.thirdparty.protobuf2_5
> >> > > > org.apache.hadoop.thirdparty.guava28
> >> > > >
> >> > > > In particular, I think we absolutely need to include the version of the
> >> > > > underlying project. On the other hand, since we should not be shading
> >> > > > *everything* we can drop the leading com.google.
> >> > > >
> >> > > >
> >> > > IMO, this naming convention makes it easy to identify the underlying
> >> > > project, but it will be difficult to maintain going forward if the
> >> > > underlying project's version changes. Since the thirdparty module has its
> >> > > own releases, each of those releases can be mapped to a specific version
> >> > > of the underlying project. Even the binary artifact can include a
> >> > > MANIFEST with the underlying project's details, as per Steve's suggestion
> >> > > on HADOOP-13363.
> >> > > That said, if you still prefer to have the version number in the artifact
> >> > > id, it can be done.
> >> > >
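
As a rough sketch of the MANIFEST idea, the maven-shade-plugin's
ManifestResourceTransformer can add custom entries to the shaded jar's manifest.
The entry name `Shaded-Protobuf-Version` and the `protobuf.version` property are
hypothetical names chosen for illustration; this is not taken from the PR.

```xml
<!-- Sketch only: the manifest entry name and the protobuf.version
     property are hypothetical. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <transformers>
      <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
        <manifestEntries>
          <!-- records which upstream protobuf version was shaded -->
          <Shaded-Protobuf-Version>${protobuf.version}</Shaded-Protobuf-Version>
        </manifestEntries>
      </transformer>
    </transformers>
  </configuration>
</plugin>
```

With something like this, the underlying version could be read from the jar's
META-INF/MANIFEST.MF without encoding it in the artifact id.
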
> >> > > > The Hadoop project can make releases of the thirdparty module:
> >> > > >
> >> > > > <dependency>
> >> > > >  <groupId>org.apache.hadoop</groupId>
> >> > > >  <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >> > > >  <version>1.0</version>
> >> > > > </dependency>
> >> > > >
> >> > > >
> >> > > > Note that the version has to be the hadoop thirdparty release number,
> >> > > > which is part of why you need to have the underlying version in the
> >> > > > artifact name. These we can push to maven central as new releases from
> >> > > > Hadoop.
> >> > > >
> >> > > >
> >> > > Exactly, the same has been implemented in the PR. The hadoop-thirdparty
> >> > > module has its own releases. In the HADOOP Jira, thirdparty versions can
> >> > > be differentiated using the prefix "thirdparty-".
> >> > >
> >> > > The same solution is being followed in HBase. Maybe people involved in
> >> > > HBase can add some points here.
> >> > >
> >> > > > Thoughts?
> >> > > >
> >> > > > .. Owen
> >> > > >
> >> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakumarb@apache.org>
> >> > > > wrote:
> >> > > >
> >> > > >> Hi All,
> >> > > >>
> >> > > >>    I wanted to discuss a separate repo for the thirdparty dependencies
> >> > > >> which we need to shade and include in Hadoop components' jars.
> >> > > >>
> >> > > >>    Apologies for the big text ahead, but this needs a clear explanation!
> >> > > >>
> >> > > >>    Right now the most needed such dependency is protobuf. The protobuf
> >> > > >> dependency has not been upgraded beyond 2.5.0 for fear that downstream
> >> > > >> builds, which depend on the transitive protobuf dependency coming from
> >> > > >> Hadoop's jars, may fail with the upgrade. Apparently protobuf does not
> >> > > >> guarantee source compatibility, though it guarantees wire compatibility
> >> > > >> between versions. Because of this behavior, a version upgrade may cause
> >> > > >> breakage in known and unknown (private?) downstreams.
> >> > > >>
> >> > > >>    So to tackle this, we came up with the following proposal in
> >> > > >> HADOOP-13363.
> >> > > >>
> >> > > >>    Luckily, as far as I know, no API, either public to users or between
> >> > > >> Hadoop processes, directly uses protobuf classes in its signatures.
> >> > > >> (If any exist, please let us know.)
> >> > > >>
> >> > > >>    Proposal:
> >> > > >>    ------------
> >> > > >>
> >> > > >>    1. Create artifact(s) containing the shaded dependencies. All such
> >> > > >> shading/relocation will use the known prefix
> >> > > >> **org.apache.hadoop.thirdparty.**.
> >> > > >>    2. Start with the protobuf jar (ex:
> >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf), in which all
> >> > > >> **com.google.protobuf** classes will be relocated to
> >> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >> > > >>    3. Hadoop modules which need protobuf as a dependency will add this
> >> > > >> shaded artifact as a dependency (ex:
> >> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
> >> > > >>    4. All previous usages of "com.google.protobuf" will be changed to
> >> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
> >> > > >> committed. Please note, this replacement is done one time, directly in
> >> > > >> the source code, NOT during compile and package.
> >> > > >>    5. Once all usages of "com.google.protobuf" are relocated, Hadoop
> >> > > >> no longer cares which version of the original "protobuf-java" is in the
> >> > > >> dependency tree.
> >> > > >>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as not to
> >> > > >> break the downstreams. Hadoop itself will actually be using the latest
> >> > > >> protobuf present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >> > > >>
> >> > > >>    7. Coming back to the separate repo, the following are the main
> >> > > >> reasons for keeping the shaded dependency artifact in a separate repo
> >> > > >> instead of a submodule.
> >> > > >>
> >> > > >>      7a. These artifacts need not be built all the time. They need to be
> >> > > >> built only when there is a change in the dependency version or the build
> >> > > >> process.
> >> > > >>      7b. If added as a "submodule in the Hadoop repo",
> >> > > >> maven-shade-plugin:shade will execute only in the package phase. That
> >> > > >> means "mvn compile" or "mvn test-compile" will fail, because this
> >> > > >> artifact will not have the relocated classes; instead it will have the
> >> > > >> original classes, resulting in compilation failure. The workaround is to
> >> > > >> build the thirdparty submodule first and exclude the "thirdparty"
> >> > > >> submodule in other executions. This would be a complex process compared
> >> > > >> to keeping it in a separate repo.
> >> > > >>
> >> > > >>      7c. The separate repo will be a subproject of Hadoop, using the same
> >> > > >> HADOOP jira project, with different versioning prefixed with
> >> > > >> "thirdparty-" (ex: thirdparty-1.0.0).
> >> > > >>      7d. The separate repo will have the same release process as Hadoop.
> >> > > >>
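
The relocation described in steps 1 and 2 can be sketched with the
maven-shade-plugin's `relocations` configuration. This is a minimal illustrative
fragment of what a pom for the shaded protobuf artifact might contain, not the
content of the PR; the `protobuf.version` property is a hypothetical placeholder.

```xml
<!-- Sketch only: illustrative fragment of a pom for a shaded protobuf
     artifact; the protobuf.version property is a hypothetical placeholder. -->
<dependencies>
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>${protobuf.version}</version>
  </dependency>
</dependencies>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <!-- shade is bound to the package phase, which is the point of 7b -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <relocation>
                <!-- original package prefix inside protobuf-java -->
                <pattern>com.google.protobuf</pattern>
                <!-- prefix the classes are moved to in the shaded jar -->
                <shadedPattern>org.apache.hadoop.thirdparty.com.google.protobuf</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Because the relocated classes exist only in the packaged jar, modules that import
the relocated package names need that jar to be built (or published) before they
compile, which is why a separately released artifact is simpler than a submodule.
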
> >> > > >>    HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363) is
> >> > > >> an umbrella jira tracking the changes for the protobuf upgrade.
> >> > > >>
> >> > > >>    A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
> >> > > >> raised for the separate repo creation in HADOOP-16595
> >> > > >> (https://issues.apache.org/jira/browse/HADOOP-16595).
> >> > > >>
> >> > > >>    Please provide your inputs on the proposal and review the PR so that
> >> > > >> we can proceed with the proposal.
> >> > > >>
> >> > > >>
> >> > > >>    -Thanks,
> >> > > >>    Vinay
> >> > > >>
> >> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli
> >> > > >> <vinodkv@apache.org> wrote:
> >> > > >>
> >> > > >> > Moving the thread to the dev lists.
> >> > > >> >
> >> > > >> > Thanks
> >> > > >> > +Vinod
> >> > > >> >
> >> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <vinayakumarb@apache.org>
> >> > > >> > > wrote:
> >> > > >> > >
> >> > > >> > > Thanks Marton,
> >> > > >> > >
> >> > > >> > > The newly created 'hadoop-thirdparty' repo is empty right now.
> >> > > >> > > Whether to use that repo for the shaded artifact or not will be
> >> > > >> > > tracked in the HADOOP-13363 umbrella jira. Please feel free to join
> >> > > >> > > the discussion.
> >> > > >> > >
> >> > > >> > > No existing codebase is being moved out of the hadoop repo, so I
> >> > > >> > > think right now we are good to go.
> >> > > >> > >
> >> > > >> > > -Vinay
> >> > > >> > >
> >> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <elek@apache.org>
> >> > > >> > > wrote:
> >> > > >> > >
> >> > > >> > >>
> >> > > >> > >> I am not sure whether it's defined when a vote is required.
> >> > > >> > >>
> >> > > >> > >> https://www.apache.org/foundation/voting.html
> >> > > >> > >>
> >> > > >> > >> Personally, I think it's a big enough change to send a notification
> >> > > >> > >> to the dev lists with a 'lazy consensus' closure.
> >> > > >> > >>
> >> > > >> > >> Marton
> >> > > >> > >>
> >> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vinayakumarb@apache.org>
> >> > > >> > >> wrote:
> >> > > >> > >>> Hi,
> >> > > >> > >>>
> >> > > >> > >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe more
> >> > > >> > >>> in the future) will be kept as a shaded artifact in a separate repo,
> >> > > >> > >>> which will be referenced as a dependency in hadoop modules. This
> >> > > >> > >>> approach avoids shading every submodule during the build.
> >> > > >> > >>>
> >> > > >> > >>> So the question is: is any VOTE required before asking to create a
> >> > > >> > >>> git repo?
> >> > > >> > >>>
> >> > > >> > >>> On the self-serve platform
> >> > > >> > >>> https://gitbox.apache.org/setup/newrepo.html
> >> > > >> > >>> I can see that the requester should be a PMC member.
> >> > > >> > >>>
> >> > > >> > >>> Wanted to confirm here first.
> >> > > >> > >>>
> >> > > >> > >>> -Vinay
> >> > > >> > >>>
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> > >>
> >> > > >> >
> >> > > >> >
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >

-- 



--Brahma Reddy Battula
