aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Sirois <>
Subject [PROPOSAL] Change java thrift code gen
Date Wed, 27 Jan 2016 03:47:11 GMT
Context: Aurora uses the official Apache Thrift compiler today plus a
home-grown python code generator [1] for immutable "entity" (I*) wrappers.

The proposal is to switch from using the Apache Thrift code generator to a
home grown generator.  The proposal comes with a concrete example in the
form of the actual RBs to effect this change:
1. A custom java thrift code generator:
2. A custom MyBatis binding code generator powered by 1 above:
3. Integration of 1 & 2 above into the Aurora codebase:

Since the RBs are large, I wanted to provide some extra context on the idea
at a higher level.  I provide rationale, pros and cons below for those
interested in the idea but wary of diving in on code review until the idea
itself passes a sniff test.
Thanks in advance for your feedback - and if we get there - for your review

In the course of an initial run at creating a first-class REST-like
scheduler interface [2] I came to the conclusion generating the json API
from the thrift one might be a good path.  That idea has been scrapped with
community feedback, but an initial experiment in custom thrift code-gen for
java that accompanied that idea seemed worth pursuing for its own
independent benefits, chief among these being 1st class immutable thrift
structs and the ability to leverage thrift annotations.

The benefits of having an immutable by default data model are the standard
ones; namely, its trivial to reason about safety of concurrent operations
on the data model, stability of collections containing data model entities
and it opens up straight-forward optimizations that are easy to reason
An example optimization is caching hashCodes for the immutable thrift
structs.  This was done after comparing jmh benchmarks run against master
and then again against the proposal branch.  Perf was comparable - within
10% plus and minus depending on the benchmark, but with the optimization
added many benchmarks showed pronounced improvement in the proposal branch
[3].  The optimization is clearly safe and was quick and easy to
implement.  Further optimizations can be experimented with in a
straightforward way.

Thrift Annotations:
The thrift IDL grammar has supported these for quite some time, but they
are not plumbed to the generated java code.  Uses are many and varied.  I
initially had my eye on annotation of thrift services with REST verbs,
routes, etc - but immediately we can leverage these annotations to kill
AnnotatedAuroraAdmin and reduce the amount of MyBatis binding code that
needs to be maintained.

There are a few downsides to switching to our own java thrift code gen:
1. We own more code to maintain:  Even though we have the custom python
"immutable" wrapper generator [1] today, this new generator - even with the
python generator removed - represents a 5-6x increase in line count of
custom code (~4.1k lines of code and tests in the new custom gen, ~700
lines in the existing python custom gen)
2. We conceptually fork from a sibling Apache project.

The fork could be mitigated by turning our real experience iterating the
custom code generator into a well-founded patch back into the Apache Thrift
project, but saying we'll do that is easier than following through and
actually doing it.

Review guide / details:

The technology stack:
The thrift IDL parsing and thrift wire parsing are both handled by the
Facebook swift project [4].  We only implement the middle bit that
generates java code stubs.  This gives higher confidence that the tricky
bits out at either edge are done right.
The thrift struct code generation is done using Square's javapoet [5] in
favor of templating for the purpose of easier to read generator code.  This
characterization is debatable though and template are certainly more
flexible the minute you need to gen a second language (say we like this and
want to do javascript codegen this way too for example).
The MyBatis codegen is forced by the thrift codegen for technical reasons.
In short, there is no simple way to teach MyBatis to read and write
immutable objects with builders.  So the MyBatis code is generated via an
annotation processor that runs after thrift code gen, but reading thrift
annotations that survive that codegen process.
The codegen unit testing is done with the help of Google's compile-tester
[6].  NB that this has an expected output comparison that checks the
generated AST and not the text, so its fairly lenient.  Whitepsace and
comments certainly don't matter.

Review strategy:
The code generator RBs (#1 & #2 in the 3 part series) are probably easier
to review looking at samples of the generated code.  Both the thrift
codegen and MyBatis codegen samples are conveniently contained in the
MyBatis codegen RB (#2:  The unit
test uses resource files that contain both the thrift codegen inputs the
annotation processor runs over and the annotation processor expected
outputs  - the MyBatis peer classes.  So have a look there if you need
something concrete and don't want to patch the RBs in and actually run the
codegen (`./gradlew api:compileJava`).
The conversion RB (#3) is large but the changes are mainly mechanical
conversions from the current mutable thrift + I* wrappers to pure immutable
thrift mutated via `.toBuilder` and `.with`'er methods.  The main changes
of note are to the portions of the codebase tightly tied to thrift as a
+ Gson/thrift converters
+ Shiro annotated auth param interception
+ Thrift/Servlet binding


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message