hadoop-hdfs-dev mailing list archives

From Vinayakumar B <vinayakum...@apache.org>
Subject Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts
Date Sat, 04 Jan 2020 15:49:04 GMT
Hi,
Sorry for the late reply.
> To be exact, how can we better use the thirdparty repo? Looking at
> HBase as an example, it looks like everything that is known to break a lot
> after an update gets shaded into the hbase-thirdparty artifact: guava,
> netty, ... etc.
> Is it the purpose to isolate these naughty dependencies?
Yes, the purpose of shading is to isolate these naughty dependencies from the
downstream classpath, so that we have independent control over their upgrades
without breaking downstreams.
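
For illustration, relocation of this kind is typically configured with the
maven-shade-plugin. The fragment below is a minimal sketch and not the
configuration from the PR; the shaded package name is an assumption, expanded
from the 'o.a.h.thirdparty.protobuf37' name mentioned further down in this
thread.

```xml
<!-- Sketch only: relocate protobuf classes into a Hadoop-owned package. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shade runs in the package phase -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- original package of the third-party classes -->
            <pattern>com.google.protobuf</pattern>
            <!-- assumed target package, for illustration -->
            <shadedPattern>org.apache.hadoop.thirdparty.protobuf37</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a relocation like this, downstream projects can keep whatever
com.google.protobuf version they need on their own classpath, because the
shaded jar no longer contains classes under that package name.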

First PR https://github.com/apache/hadoop-thirdparty/pull/1 to create the
protobuf shaded jar is ready to merge.

Please take a look if you are interested. It will be merged in about two
days if there are no objections.

-Vinay


On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <weichiu@apache.org> wrote:

> Hi, I am late to this, but I am keen to understand more.
>
> To be exact, how can we better use the thirdparty repo? Looking at HBase
> as an example, it looks like everything that is known to break a lot after
> an update gets shaded into the hbase-thirdparty artifact: guava, netty, ...
> etc.
> Is it the purpose to isolate these naughty dependencies?
>
> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vinayakumarb@apache.org>
> wrote:
>
>> Hi All,
>>
>> I have updated the PR as per @Owen O'Malley <owen.omalley@gmail.com>
>> 's suggestions.
>>
>>     i. Renamed the module to 'hadoop-shaded-protobuf37'
>>     ii. Kept the shaded package as 'o.a.h.thirdparty.protobuf37'
>>
>> Please review!!
>>
>> Thanks,
>> -Vinay
>>
>>
>> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <palomino219@gmail.com>
>> wrote:
>>
>> > For HBase we have a separate repo for hbase-thirdparty
>> >
>> > https://github.com/apache/hbase-thirdparty
>> >
>> > We will publish the artifacts to nexus so we do not need to include
>> > binaries in our git repo, just add a dependency in the pom.
>> >
>> >
>> >
>> > https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
>> >
>> >
>> > And it has its own release cycle; we release only when there are special
>> > requirements or we want to upgrade some of the dependencies. This is the
>> > vote thread for the newest release, where we want to provide a shaded
>> > gson for jdk7.
>> >
>> >
>> >
>> > https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
>> >
>> >
>> > Thanks.
>> >
>> > Vinayakumar B <vinayakumarb@apache.org> wrote on Sat, Sep 28, 2019 at 1:28 AM:
>> >
>> > > Please find replies inline.
>> > >
>> > > -Vinay
>> > >
>> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omalley@gmail.com>
>> > > wrote:
>> > >
>> > > > I'm very unhappy with this direction. In particular, I don't think git
>> > > > is a good place for distribution of binary artifacts. Furthermore, the
>> > > > PMC shouldn't be releasing anything without a release vote.
>> > > >
>> > > >
>> > > The proposed solution doesn't release any binaries in git. It's actually
>> > > a complete sub-project which follows the entire release process,
>> > > including a public VOTE. I have already mentioned that the release
>> > > process is similar to Hadoop's.
>> > > To be specific, it uses (almost) the same script used in Hadoop to
>> > > generate artifacts, sign them, and deploy them to the staging repository.
>> > > Please let me know if I am conveying anything wrong.
>> > >
>> > >
>> > > > I'd propose that we make a third party module that contains the
>> > > > *source* of the pom files to build the relocated jars. This should
>> > > > absolutely be treated as a last resort for the mostly Google projects
>> > > > that regularly break binary compatibility (eg. Protobuf & Guava).
>> > > >
>> > > >
>> > > The same has been implemented in the PR
>> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let
>> > > me know if I misunderstood. Yes, this is the last option we have, AFAIK.
>> > >
>> > >
>> > > > In terms of naming, I'd propose something like:
>> > > >
>> > > > org.apache.hadoop.thirdparty.protobuf2_5
>> > > > org.apache.hadoop.thirdparty.guava28
>> > > >
>> > > > In particular, I think we absolutely need to include the version of the
>> > > > underlying project. On the other hand, since we should not be shading
>> > > > *everything* we can drop the leading com.google.
>> > > >
>> > > >
>> > > IMO, this naming convention makes it easy to identify the underlying
>> > > project, but it will be difficult to maintain going forward if the
>> > > underlying project's version changes. Since the thirdparty module has
>> > > its own releases, each of those releases can be mapped to a specific
>> > > version of the underlying project. The binary artifact can even include
>> > > a MANIFEST with the underlying project's details, as per Steve's
>> > > suggestion on HADOOP-13363.
>> > > That said, if you still prefer to have the version number in the
>> > > artifact id, it can be done.
>> > >
>> > > > The Hadoop project can make releases of the thirdparty module:
>> > > >
>> > > > <dependency>
>> > > >   <groupId>org.apache.hadoop</groupId>
>> > > >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
>> > > >   <version>1.0</version>
>> > > > </dependency>
>> > > >
>> > > >
>> > > > Note that the version has to be the hadoop thirdparty release number,
>> > > > which is part of why you need to have the underlying version in the
>> > > > artifact name. These we can push to maven central as new releases from
>> > > > Hadoop.
>> > > >
>> > > >
>> > > Exactly, the same has been implemented in the PR. The hadoop-thirdparty
>> > > module has its own releases. In the HADOOP Jira, thirdparty versions can
>> > > be differentiated using the prefix "thirdparty-".
>> > >
>> > > The same solution is being followed in HBase. Maybe people involved in
>> > > HBase can add some points here.
>> > >
>> > > > Thoughts?
>> > > >
>> > > > .. Owen
>> > > >
>> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakumarb@apache.org>
>> > > > wrote:
>> > > >
>> > > >> Hi All,
>> > > >>
>> > > >>    I wanted to discuss the separate repo for thirdparty dependencies
>> > > >> which we need to shade and include in Hadoop components' jars.
>> > > >>
>> > > >>    Apologies for the big text ahead, but this needs a clear
>> > > >> explanation!!
>> > > >>
>> > > >>    Right now the most needed such dependency is protobuf. The protobuf
>> > > >> dependency has not been upgraded beyond 2.5.0, for fear that
>> > > >> downstream builds, which depend on the transitive protobuf dependency
>> > > >> coming from hadoop's jars, may fail with the upgrade. Apparently
>> > > >> protobuf does not guarantee source compatibility, though it guarantees
>> > > >> wire compatibility between versions. Because of this behavior, a
>> > > >> version upgrade may cause breakage in known and unknown (private?)
>> > > >> downstreams.
>> > > >>
>> > > >>    So to tackle this, we came up with the following proposal in
>> > > >> HADOOP-13363.
>> > > >>
>> > > >>    Luckily, as far as I know, no APIs, either public to users or
>> > > >> between Hadoop processes, directly use protobuf classes in their
>> > > >> signatures. (If any exist, please let us know.)
>> > > >>
>> > > >>    Proposal:
>> > > >>    ------------
>> > > >>
>> > > >>    1. Create artifact(s) containing the shaded dependencies. All such
>> > > >> shading/relocation will use the known prefix
>> > > >> **org.apache.hadoop.thirdparty.**.
>> > > >>    2. To start with, there will be only the protobuf jar (ex:
>> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf), in which all
>> > > >> **com.google.protobuf** classes will be relocated to
>> > > >> **org.apache.hadoop.thirdparty.com.google.protobuf**.
>> > > >>    3. Hadoop modules which need protobuf as a dependency will add this
>> > > >> shaded artifact as a dependency (ex:
>> > > >> o.a.h.thirdparty:hadoop-shaded-protobuf).
>> > > >>    4. All previous usages of "com.google.protobuf" will be changed to
>> > > >> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
>> > > >> committed. Please note, this replacement is done one time, directly in
>> > > >> the source code, NOT during compile and package.
>> > > >>    5. Once all usages of "com.google.protobuf" are relocated, hadoop
>> > > >> doesn't care which version of the original "protobuf-java" is in the
>> > > >> dependency tree.
>> > > >>    6. Just keep "protobuf-java:2.5.0" in the dependency tree so as not
>> > > >> to break the downstreams. Hadoop itself will actually be using the
>> > > >> latest protobuf present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
>> > > >>
>> > > >>    7. Coming back to the separate repo, the following are the main
>> > > >> reasons for keeping the shaded dependency artifact in a separate repo
>> > > >> instead of a submodule.
>> > > >>
>> > > >>       7a. These artifacts need not be built all the time. They need to
>> > > >> be built only when there is a change in the dependency version or the
>> > > >> build process.
>> > > >>       7b. If added as a "submodule in Hadoop repo",
>> > > >> maven-shade-plugin:shade will execute only in the package phase. That
>> > > >> means "mvn compile" or "mvn test-compile" will fail, as this artifact
>> > > >> will not yet have the relocated classes; it will have the original
>> > > >> classes instead, resulting in compilation failure. The workaround is to
>> > > >> build the thirdparty submodule first and exclude the "thirdparty"
>> > > >> submodule in other executions. This will be a complex process compared
>> > > >> to keeping it in a separate repo.
>> > > >>
>> > > >>       7c. The separate repo will be a subproject of Hadoop, using the
>> > > >> same HADOOP jira project, with different versioning prefixed with
>> > > >> "thirdparty-" (ex: thirdparty-1.0.0).
>> > > >>       7d. The separate repo will have the same release process as
>> > > >> Hadoop.
>> > > >>
>> > > >>     HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363)
>> > > >> is an umbrella jira tracking the changes for the protobuf upgrade.
>> > > >>
>> > > >>     A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
>> > > >> raised for the separate repo creation in HADOOP-16595
>> > > >> (https://issues.apache.org/jira/browse/HADOOP-16595).
>> > > >>
>> > > >>     Please provide your inputs on the proposal and review the PR to
>> > > >> proceed with the proposal.
>> > > >>
>> > > >>
>> > > >>    -Thanks,
>> > > >>     Vinay
>> > > >>
>> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli
>> > > >> <vinodkv@apache.org> wrote:
>> > > >>
>> > > >> > Moving the thread to the dev lists.
>> > > >> >
>> > > >> > Thanks
>> > > >> > +Vinod
>> > > >> >
>> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B
>> > > >> > > <vinayakumarb@apache.org> wrote:
>> > > >> > >
>> > > >> > > Thanks Marton,
>> > > >> > >
>> > > >> > > The newly created 'hadoop-thirdparty' repo is empty right now.
>> > > >> > > Whether to use that repo for the shaded artifact or not will be
>> > > >> > > tracked in the HADOOP-13363 umbrella jira. Please feel free to join
>> > > >> > > the discussion.
>> > > >> > >
>> > > >> > > No existing codebase is being moved out of the hadoop repo, so I
>> > > >> > > think right now we are good to go.
>> > > >> > >
>> > > >> > > -Vinay
>> > > >> > >
>> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <elek@apache.org>
>> > > >> > > wrote:
>> > > >> > >
>> > > >> > >>
>> > > >> > >> I am not sure if it's defined when a vote is required.
>> > > >> > >>
>> > > >> > >> https://www.apache.org/foundation/voting.html
>> > > >> > >>
>> > > >> > >> Personally I think it's a big enough change to send a notification
>> > > >> > >> to the dev lists with a 'lazy consensus' closure.
>> > > >> > >>
>> > > >> > >> Marton
>> > > >> > >>
>> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vinayakumarb@apache.org>
>> > > >> > >> wrote:
>> > > >> > >>> Hi,
>> > > >> > >>>
>> > > >> > >>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe more
>> > > >> > >>> in the future) will be kept as a shaded artifact in a separate
>> > > >> > >>> repo, which will be referenced as a dependency in hadoop modules.
>> > > >> > >>> This approach avoids shading every submodule during the build.
>> > > >> > >>>
>> > > >> > >>> So the question is: is any VOTE required before asking to create a
>> > > >> > >>> git repo?
>> > > >> > >>>
>> > > >> > >>> On the self-serve platform
>> > > >> > >>> https://gitbox.apache.org/setup/newrepo.html
>> > > >> > >>> I can see that the requester should be a PMC member.
>> > > >> > >>>
>> > > >> > >>> Wanted to confirm here first.
>> > > >> > >>>
>> > > >> > >>> -Vinay
>> > > >> > >>>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> ---------------------------------------------------------------------
>> > > >> > >> To unsubscribe, e-mail: private-unsubscribe@hadoop.apache.org
>> > > >> > >> For additional commands, e-mail: private-help@hadoop.apache.org
>> > > >> > >>
>> > > >> > >>
>> > > >> >
>> > > >> >
>> > > >>
>> > > >
>> > >
>> >
>>
>
