aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Sirois <>
Subject Re: [PROPOSAL] Change java thrift code gen
Date Wed, 27 Jan 2016 05:19:55 GMT
On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <> wrote:

> Context: Aurora uses the official Apache Thrift compiler today plus a
> home-grown python code generator [1] for immutable "entity" (I*) wrappers.
> The proposal is to switch from using the Apache Thrift code generator to a
> home grown generator.  The proposal comes with a concrete example in the
> form of the actual RBs to effect this change:
> 1. A custom java thrift code generator:
> 2. A custom MyBatis binding code generator powered by 1 above:
> 3. Integration of 1 & 2 above into the Aurora codebase:
> Since the RBs are large, I wanted to provide some extra context on the
> idea at a higher level.  I provide rationale, pros and cons below for those
> interested in the idea but wary of diving in on code review until the idea
> itself passes a sniff test.
> Thanks in advance for your feedback - and if we get there - for your
> review effort.

I just added wfarner and zmanji as reviewers for the 3 RBs above since
they've expressed direct interest.  Happy to add others, just speak up or
else just comment on the reviews as you see fit.
I'll formally only submit if 1st this email thread reaches consensus, and
second, reviews are approved.

> In the course of an initial run at creating a first-class REST-like
> scheduler interface [2] I came to the conclusion generating the json API
> from the thrift one might be a good path.  That idea has been scrapped with
> community feedback, but an initial experiment in custom thrift code-gen for
> java that accompanied that idea seemed worth pursuing for its own
> independent benefits, chief among these being 1st class immutable thrift
> structs and the ability to leverage thrift annotations.
> Immutability:
> The benefits of having an immutable by default data model are the standard
> ones; namely, its trivial to reason about safety of concurrent operations
> on the data model, stability of collections containing data model entities
> and it opens up straight-forward optimizations that are easy to reason
> about.
> An example optimization is caching hashCodes for the immutable thrift
> structs.  This was done after comparing jmh benchmarks run against master
> and then again against the proposal branch.  Perf was comparable - within
> 10% plus and minus depending on the benchmark, but with the optimization
> added many benchmarks showed pronounced improvement in the proposal branch
> [3].  The optimization is clearly safe and was quick and easy to
> implement.  Further optimizations can be experimented with in a
> straightforward way.
> Thrift Annotations:
> The thrift IDL grammar has supported these for quite some time, but they
> are not plumbed to the generated java code.  Uses are many and varied.  I
> initially had my eye on annotation of thrift services with REST verbs,
> routes, etc - but immediately we can leverage these annotations to kill
> AnnotatedAuroraAdmin and reduce the amount of MyBatis binding code that
> needs to be maintained.
> There are a few downsides to switching to our own java thrift code gen:
> 1. We own more code to maintain:  Even though we have the custom python
> "immutable" wrapper generator [1] today, this new generator - even with the
> python generator removed - represents a 5-6x increase in line count of
> custom code (~4.1k lines of code and tests in the new custom gen, ~700
> lines in the existing python custom gen)
> 2. We conceptually fork from a sibling Apache project.
> The fork could be mitigated by turning our real experience iterating the
> custom code generator into a well-founded patch back into the Apache Thrift
> project, but saying we'll do that is easier than following through and
> actually doing it.
> ==
> Review guide / details:
> The technology stack:
> The thrift IDL parsing and thrift wire parsing are both handled by the
> Facebook swift project [4].  We only implement the middle bit that
> generates java code stubs.  This gives higher confidence that the tricky
> bits out at either edge are done right.
> The thrift struct code generation is done using Square's javapoet [5] in
> favor of templating for the purpose of easier to read generator code.  This
> characterization is debatable though and template are certainly more
> flexible the minute you need to gen a second language (say we like this and
> want to do javascript codegen this way too for example).
> The MyBatis codegen is forced by the thrift codegen for technical
> reasons.  In short, there is no simple way to teach MyBatis to read and
> write immutable objects with builders.  So the MyBatis code is generated
> via an annotation processor that runs after thrift code gen, but reading
> thrift annotations that survive that codegen process.
> The codegen unit testing is done with the help of Google's compile-tester
> [6].  NB that this has an expected output comparison that checks the
> generated AST and not the text, so its fairly lenient.  Whitepsace and
> comments certainly don't matter.
> Review strategy:
> The code generator RBs (#1 & #2 in the 3 part series) are probably easier
> to review looking at samples of the generated code.  Both the thrift
> codegen and MyBatis codegen samples are conveniently contained in the
> MyBatis codegen RB (#2:  The unit
> test uses resource files that contain both the thrift codegen inputs the
> annotation processor runs over and the annotation processor expected
> outputs  - the MyBatis peer classes.  So have a look there if you need
> something concrete and don't want to patch the RBs in and actually run the
> codegen (`./gradlew api:compileJava`).
> The conversion RB (#3) is large but the changes are mainly mechanical
> conversions from the current mutable thrift + I* wrappers to pure immutable
> thrift mutated via `.toBuilder` and `.with`'er methods.  The main changes
> of note are to the portions of the codebase tightly tied to thrift as a
> technology:
> + Gson/thrift converters
> + Shiro annotated auth param interception
> + Thrift/Servlet binding
> [1]
> [2]
> [3]
> [4]
> [5]
> [6]

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message