hadoop-common-issues mailing list archives

From "Vinayakumar B (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-16621) [pb-upgrade] spark-hive doesn't compile against hadoop trunk because of Token's marshalling
Date Wed, 18 Dec 2019 19:01:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-16621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999436#comment-16999436 ]

Vinayakumar B commented on HADOOP-16621:

Sorry for being late.

The following public APIs were introduced by HADOOP-12563 in Hadoop 3.0.0, back in 2016:
public Token(TokenProto tokenPB);

public TokenProto toTokenProto();
Ideally, no API annotated {{@Public}} should have a protobuf type in its signature.
Right now, these two methods break binary compatibility for downstream projects after the
protobuf version upgrade, because the superclass of the generated proto classes changed from
{{GeneratedMessage}} in protobuf 2.5.0 to {{GeneratedMessageV3}} in protobuf 3.x.

So there are only two possible ways to proceed:
 # Remove all public methods with protobuf types in their signatures and replace them with
helper classes that do the same job, as is already done in HDFS' {{PBHelperClient.java}}. This
breaks compatibility if by any chance these methods are used outside the hadoop-common module
(and outside the Hadoop project as a whole, since all Hadoop components are upgraded together).
 # Mark the methods deprecated and keep the old {{TokenProto}} class, generated with protobuf
2.5.0, committed to the repo. Rename the current {{TokenProto}} to {{TokenProto3}}, along with
all its occurrences throughout the project (hopefully {{TokenProto}} is not used outside the
Hadoop project), and skip shading of the 2.5.0 {{TokenProto}}. The deprecated methods and the
committed {{TokenProto}} class can be removed later.


Approach #1 would be an easy and direct change, but it raises a compatibility issue if these
methods are used by other projects, which is most unlikely.

[~stevel@apache.org] / [~vinodkv] / [~raviprak], is it okay to remove the above-mentioned
methods and replace them with something similar to {{PBHelperClient#convert(Token<?> tok)}} and
{{PBHelperClient#convert(TokenProto tok)}}?
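
A minimal sketch of what Approach #1 could look like, assuming stand-in classes so that it compiles without a protobuf dependency. {{TokenStandIn}}, {{TokenProtoStandIn}} and {{TokenProtoHelper}} are hypothetical names used only for illustration; they are not the real Hadoop or generated classes. The point is that the token class itself no longer mentions a protobuf type, and the two conversions live in a separate helper.

```java
// Stand-in for the protobuf-generated TokenProto (hypothetical; the real class
// is generated by protoc and is not reproduced here).
final class TokenProtoStandIn {
    final byte[] identifier;
    final byte[] password;
    final String kind;
    final String service;

    TokenProtoStandIn(byte[] identifier, byte[] password, String kind, String service) {
        this.identifier = identifier.clone();
        this.password = password.clone();
        this.kind = kind;
        this.service = service;
    }
}

// Stand-in for Token (hypothetical): no protobuf type appears in any signature,
// so callers that only import this class never link against protobuf.
final class TokenStandIn {
    private final byte[] identifier;
    private final byte[] password;
    private final String kind;
    private final String service;

    TokenStandIn(byte[] identifier, byte[] password, String kind, String service) {
        this.identifier = identifier.clone();
        this.password = password.clone();
        this.kind = kind;
        this.service = service;
    }

    byte[] getIdentifier() { return identifier.clone(); }
    byte[] getPassword() { return password.clone(); }
    String getKind() { return kind; }
    String getService() { return service; }
}

// Hypothetical helper holding the marshalling code, in the spirit of the
// PBHelperClient#convert pair mentioned above.
final class TokenProtoHelper {
    private TokenProtoHelper() {}

    // Token -> proto, replacing the role of Token#toTokenProto().
    static TokenProtoStandIn convert(TokenStandIn tok) {
        return new TokenProtoStandIn(
            tok.getIdentifier(), tok.getPassword(), tok.getKind(), tok.getService());
    }

    // proto -> Token, replacing the role of the Token(TokenProto) constructor.
    static TokenStandIn convert(TokenProtoStandIn proto) {
        return new TokenStandIn(
            proto.identifier, proto.password, proto.kind, proto.service);
    }
}
```

Internal callers would switch from {{tok.toTokenProto()}} to {{TokenProtoHelper.convert(tok)}}; downstream code that only handles tokens would not need to change.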


Approach #2 is a workaround that keeps compatibility, but unnecessary (most likely unused)
code will remain in the repo.
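
Approach #2 can be sketched the same way, again with stand-in classes and hypothetical names ({{LegacyTokenProto}} plays the committed 2.5.0-generated {{TokenProto}}, {{TokenProto3StandIn}} plays the renamed current class, and {{CompatToken}} plays {{Token}}). The old signatures stay in place but are deprecated, while new code goes through the renamed type.

```java
// Stand-in for the TokenProto class generated with protobuf 2.5.0 and
// committed to the repo (hypothetical).
final class LegacyTokenProto {
    final String kind;
    final String service;

    LegacyTokenProto(String kind, String service) {
        this.kind = kind;
        this.service = service;
    }
}

// Stand-in for the currently generated class after renaming it to TokenProto3
// (hypothetical).
final class TokenProto3StandIn {
    final String kind;
    final String service;

    TokenProto3StandIn(String kind, String service) {
        this.kind = kind;
        this.service = service;
    }
}

// Stand-in for Token (hypothetical): the old protobuf-based signatures are
// kept, unchanged but deprecated, so existing callers keep linking.
final class CompatToken {
    private final String kind;
    private final String service;

    CompatToken(String kind, String service) {
        this.kind = kind;
        this.service = service;
    }

    /** @deprecated kept only for compatibility with existing callers. */
    @Deprecated
    CompatToken(LegacyTokenProto proto) {
        this(proto.kind, proto.service);
    }

    /** @deprecated kept only for compatibility with existing callers. */
    @Deprecated
    LegacyTokenProto toTokenProto() {
        return new LegacyTokenProto(kind, service);
    }

    // Path for new code, using the renamed type.
    TokenProto3StandIn toTokenProto3() {
        return new TokenProto3StandIn(kind, service);
    }

    String getKind() { return kind; }
    String getService() { return service; }
}
```

The cost is visible in the sketch: two near-identical proto types coexist until the deprecated members are dropped.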

 This change is mandatory to allow Spark (and other projects that just import Token classes)
to compile and run successfully without having to explicitly set the protobuf version to match Hadoop's.


Please let me know your opinions.

> [pb-upgrade] spark-hive doesn't compile against hadoop trunk because of Token's marshalling
> -------------------------------------------------------------------------------------------
>                 Key: HADOOP-16621
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16621
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: common
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Priority: Major
> The move to protobuf 3.x stops Spark building because Token has a method which returns a protobuf, and now it's returning some v3 types.
> If we want to isolate downstream code from protobuf changes, we need to move that marshalling method from Token and put it in a helper class.

