hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1871) We should eliminate writing *PBImpl code in YARN
Date Tue, 25 Mar 2014 12:41:15 GMT

    [ https://issues.apache.org/jira/browse/YARN-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946460#comment-13946460 ]

Wangda Tan commented on YARN-1871:
----------------------------------

Some possible ways to eliminate hand-written PBImpl source code that I have in mind:
1. Use a Java annotation processor (RetentionPolicy=SOURCE); the [google auto|https://github.com/google/auto]
project is an example. We can put an annotation on record classes, like
{code}
@GeneratePBImpl(protoclass="org.apache.hadoop.yarn.proto.YarnProtos.ApplicationIdProto")
public abstract class ApplicationId {
   ...
}
{code}
Then we can implement a GeneratePBImpl annotation processor to generate PBImpl code when compiling.
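
To make option 1 a bit more concrete, here is a rough sketch of what such a processor could look like. Everything in it is an assumption for illustration: the `GeneratePBImplProcessor` class, the `render` helper, and the shape of the generated class are hypothetical, and the generated skeleton leaves out the per-field getters/setters that a real generator would have to emit.

```java
import java.io.IOException;
import java.io.Writer;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;

// Hypothetical marker annotation. SOURCE retention: it is only visible to the
// compiler and is not written to the class file. In a real build it would be a
// public annotation in its own file and package.
@Retention(RetentionPolicy.SOURCE)
@Target(ElementType.TYPE)
@interface GeneratePBImpl {
  String protoclass();
}

// Hypothetical processor: writes one <Record>PBImpl source file per annotated class.
@SupportedAnnotationTypes("GeneratePBImpl")
public class GeneratePBImplProcessor extends AbstractProcessor {

  @Override
  public SourceVersion getSupportedSourceVersion() {
    return SourceVersion.latestSupported();
  }

  @Override
  public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
    for (Element e : roundEnv.getElementsAnnotatedWith(GeneratePBImpl.class)) {
      TypeElement record = (TypeElement) e; // safe: the annotation targets types only
      String proto = record.getAnnotation(GeneratePBImpl.class).protoclass();
      String pkg = processingEnv.getElementUtils()
          .getPackageOf(record).getQualifiedName().toString();
      String simpleName = record.getSimpleName().toString();
      String implName = simpleName + "PBImpl";
      String fqcn = pkg.isEmpty() ? implName : pkg + "." + implName;
      try (Writer w = processingEnv.getFiler().createSourceFile(fqcn, record).openWriter()) {
        w.write(render(pkg, simpleName, proto));
      } catch (IOException ex) {
        processingEnv.getMessager()
            .printMessage(Diagnostic.Kind.ERROR, ex.toString(), record);
      }
    }
    return true;
  }

  // Builds the source text of the generated class (skeleton only).
  public static String render(String pkg, String recordName, String protoClass) {
    StringBuilder sb = new StringBuilder();
    if (!pkg.isEmpty()) {
      sb.append("package ").append(pkg).append(";\n\n");
    }
    sb.append("public class ").append(recordName).append("PBImpl extends ")
      .append(recordName).append(" {\n")
      .append("  private ").append(protoClass).append(" proto = ")
      .append(protoClass).append(".getDefaultInstance();\n")
      .append("  // getters/setters for each proto field would be emitted here\n")
      .append("}\n");
    return sb.toString();
  }
}
```

Such a processor would be enabled with javac's -processor option or through a META-INF/services/javax.annotation.processing.Processor entry.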

2. Use a Protocol Buffers parser to parse the .proto files directly and generate PBImpl code.
With a PB parser we can read the message descriptions, fields and field types from the .proto files
and generate code from them. Unfortunately, PB doesn't provide a Java-based parser, so we would need
to write a C/C++-based program using its parser (see [issue-263|https://code.google.com/p/protobuf/issues/detail?id=263])
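
To illustrate what information option 2 needs from a .proto file, here is a toy sketch that pulls the label, type, name and tag of each field out of a flat proto2 message with a regular expression. This is deliberately not a real .proto parser (no nested messages, options, comments or imports), and `ProtoFieldSketch`, `Field` and `parseFields` are hypothetical names; a real generator would rely on the PB parser instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProtoFieldSketch {

  /** One message field: label (optional/required/repeated), type, name and tag number. */
  public static final class Field {
    public final String label;
    public final String type;
    public final String name;
    public final int tag;

    Field(String label, String type, String name, int tag) {
      this.label = label;
      this.type = type;
      this.name = name;
      this.tag = tag;
    }
  }

  // Matches proto2 field declarations such as "optional int32 id = 1".
  private static final Pattern FIELD = Pattern.compile(
      "\\b(optional|required|repeated)\\s+([\\w.]+)\\s+(\\w+)\\s*=\\s*(\\d+)");

  /** Returns the fields found in the given .proto text, in declaration order. */
  public static List<Field> parseFields(String protoText) {
    List<Field> fields = new ArrayList<>();
    Matcher m = FIELD.matcher(protoText);
    while (m.find()) {
      fields.add(new Field(m.group(1), m.group(2), m.group(3),
          Integer.parseInt(m.group(4))));
    }
    return fields;
  }
}
```

For a made-up message with the fields "optional int32 id = 1;" and "repeated string tags = 2;", parseFields returns two entries. The label of the second one ("repeated") is what would tell a generator to emit the List<?> pattern rather than the simple-type pattern.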

3. Similar to the @AtMostOnce annotation, make the ser-de a runtime behavior.
With this method we don't need to generate PBImpl source code or classes. We can create a RetentionPolicy=RUNTIME
annotation and mark record classes with it, such as,

{code}
@RecordClass(protoclass="org.apache.hadoop.yarn.proto.YarnProtos.ApplicationIdProto")
public abstract class ApplicationId {
   ...
}
{code} 
As with the @AtMostOnce annotation, when we need to serialize/deserialize an object of this class, we
check at runtime whether it is a "record class". If it is, we can simply use its getters/setters and
the PB-generated class (*Proto) to do the serialization/deserialization.
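
A minimal sketch of the runtime check in option 3, using only reflection from the JDK. The names `RecordClassSketch`, `isRecordClass` and `readFields` are hypothetical, and the nested `ApplicationId` is a stand-in for the real record class. The sketch stops at collecting getter values; copying them into the *Proto builder is left out because it depends on the PB-generated classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

public class RecordClassSketch {

  // Hypothetical marker annotation, kept in the class file and visible at runtime.
  // @Inherited lets concrete subclasses of a record class count as record classes.
  @Inherited
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface RecordClass {
    String protoclass();
  }

  // Stand-in for the real record class, for illustration only.
  @RecordClass(protoclass = "org.apache.hadoop.yarn.proto.YarnProtos.ApplicationIdProto")
  public abstract static class ApplicationId {
    public abstract int getId();
    public abstract long getClusterTimestamp();
  }

  /** True if the class, or one of its superclasses, is marked as a record class. */
  public static boolean isRecordClass(Class<?> c) {
    return c.isAnnotationPresent(RecordClass.class);
  }

  /** Reads every public no-arg getter declared on the annotated class. */
  public static Map<String, Object> readFields(Object record) {
    // Walk up to the class that actually carries the annotation.
    Class<?> c = record.getClass();
    while (c != null && c.getDeclaredAnnotation(RecordClass.class) == null) {
      c = c.getSuperclass();
    }
    if (c == null) {
      throw new IllegalArgumentException("not a record class: " + record.getClass());
    }
    Map<String, Object> values = new TreeMap<>();
    for (Method m : c.getDeclaredMethods()) {
      String name = m.getName();
      if (Modifier.isPublic(m.getModifiers()) && m.getParameterCount() == 0
          && name.startsWith("get") && name.length() > 3) {
        String field = Character.toLowerCase(name.charAt(3)) + name.substring(4);
        try {
          values.put(field, m.invoke(record));
        } catch (ReflectiveOperationException ex) {
          throw new IllegalStateException("cannot read " + name, ex);
        }
      }
    }
    return values;
  }
}
```

The trade-off compared with options 1 and 2 is that nothing is generated at build time, but every serialization pays for the reflective calls unless the getter lookups are cached per class.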

Any other thoughts on this? Hope to get your ideas.

> We should eliminate writing *PBImpl code in YARN
> ------------------------------------------------
>
>                 Key: YARN-1871
>                 URL: https://issues.apache.org/jira/browse/YARN-1871
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api
>    Affects Versions: 2.4.0
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>
> Currently, we need to write PBImpl classes one by one. Running "find . -name "*PBImpl*.java"
> | xargs wc -l" under the Hadoop source directory shows more than 25,000 LOC.
> I think we should improve this, which would make it much easier for YARN developers to change
> YARN protocols.
> There are only a few patterns in the current *PBImpl classes:
> * Simple types, like string, int32, float.
> * List<?> types
> * Map<?> types
> * Enum types
> Code generation should be enough to generate such PBImpl classes.
> Some other requirements are,
> * Leave other related code alone, like service implementations (e.g. ContainerManagerImpl).
> * (If possible) Forward compatibility: developers can write their own PBImpl classes or generate
> them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
