aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bill Farner <wfar...@apache.org>
Subject Re: [PROPOSAL] Change java thrift code gen
Date Thu, 28 Jan 2016 00:51:26 GMT
Firstly - thanks for the clean organization and delineation of steps in
this change.  Top notch work!

Some of the performance improvements are very nice; and in a particularly
hot code path.  I will wager a guess that the majority of the savings is in
avoiding what amounts to copy constructors between mutable and immutable
types.  I further wager there are alternative approaches we could weigh to
achieve those performance improvements.  As an example - you note above
that we could provide a patch to Apache Thrift.  Depending how much
performance inspires our decision here, it will be prudent to evaluate
alternatives.

I think there are (at least) two major issues worth discussing - code
volume (which you note) and an increase in logical complexity.  This will
leave us with a bifurcation in code generation tooling (custom+swift for
Java, Apache Thrift for python and js).  It's difficult to quantify the
downside of that, but it seems like an unfortunate state with potential for
compatibility risks.


On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zmanji@apache.org> wrote:

> Some high level comments without looking at the code.
>
> I'm in favor from abandoning the thrift generated java code in favor of
> immutable objects. I think it is easier to reason about and will ensure we
> have less errors in our code. If I understand correctly, the ProtoBuf
> format does this by default, so there some precedent for this style of code
> generation already.
>
> I think using Facebook's swift is the best approach here. I would be
> hesitant to accept any custom code generation that involved us parsing
> thrift IDL files or thrift formats over the wire because I poses a very
> high maintenance burden.
>
> I also think generating the MyBatis mutable classes is superior to our
> current strategy of manually creating them.
>
> Finally, the performance improvements look fantastic. As an operator of a
> large cluster I am excited to see wholesale performance improvements as I
> am always concerned that my cluster is approaching the limits of what
> Aurora can handle safely.
>
> Overall, I think this change merits a serious discussion from all
> contributors.
>
> On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <jsirois@apache.org> wrote:
>
> > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <jsirois@apache.org> wrote:
> >
> > > Context: Aurora uses the official Apache Thrift compiler today plus a
> > > home-grown python code generator [1] for immutable "entity" (I*)
> > wrappers.
> > >
> > > The proposal is to switch from using the Apache Thrift code generator
> to
> > a
> > > home grown generator.  The proposal comes with a concrete example in
> the
> > > form of the actual RBs to effect this change:
> > > 1. A custom java thrift code generator:
> > > https://reviews.apache.org/r/42748/
> > > 2. A custom MyBatis binding code generator powered by 1 above:
> > > https://reviews.apache.org/r/42749/
> > > 3. Integration of 1 & 2 above into the Aurora codebase:
> > > https://reviews.apache.org/r/42756/
> > >
> > > Since the RBs are large, I wanted to provide some extra context on the
> > > idea at a higher level.  I provide rationale, pros and cons below for
> > those
> > > interested in the idea but wary of diving in on code review until the
> > idea
> > > itself passes a sniff test.
> > > Thanks in advance for your feedback - and if we get there - for your
> > > review effort.
> > >
> >
> > I just added wfarner and zmanji as reviewers for the 3 RBs above since
> > they've expressed direct interest.  Happy to add others, just speak up or
> > else just comment on the reviews as you see fit.
> > I'll formally only submit if 1st this email thread reaches consensus, and
> > second, reviews are approved.
> >
> > ==
> > >
> > > In the course of an initial run at creating a first-class REST-like
> > > scheduler interface [2] I came to the conclusion generating the json
> API
> > > from the thrift one might be a good path.  That idea has been scrapped
> > with
> > > community feedback, but an initial experiment in custom thrift code-gen
> > for
> > > java that accompanied that idea seemed worth pursuing for its own
> > > independent benefits, chief among these being 1st class immutable
> thrift
> > > structs and the ability to leverage thrift annotations.
> > >
> > > Immutability:
> > > The benefits of having an immutable by default data model are the
> > standard
> > > ones; namely, its trivial to reason about safety of concurrent
> operations
> > > on the data model, stability of collections containing data model
> > entities
> > > and it opens up straight-forward optimizations that are easy to reason
> > > about.
> > > An example optimization is caching hashCodes for the immutable thrift
> > > structs.  This was done after comparing jmh benchmarks run against
> master
> > > and then again against the proposal branch.  Perf was comparable -
> within
> > > 10% plus and minus depending on the benchmark, but with the
> optimization
> > > added many benchmarks showed pronounced improvement in the proposal
> > branch
> > > [3].  The optimization is clearly safe and was quick and easy to
> > > implement.  Further optimizations can be experimented with in a
> > > straightforward way.
> > >
> > > Thrift Annotations:
> > > The thrift IDL grammar has supported these for quite some time, but
> they
> > > are not plumbed to the generated java code.  Uses are many and
> varied.  I
> > > initially had my eye on annotation of thrift services with REST verbs,
> > > routes, etc - but immediately we can leverage these annotations to kill
> > > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding code that
> > > needs to be maintained.
> > >
> > > There are a few downsides to switching to our own java thrift code gen:
> > > 1. We own more code to maintain:  Even though we have the custom python
> > > "immutable" wrapper generator [1] today, this new generator - even with
> > the
> > > python generator removed - represents a 5-6x increase in line count of
> > > custom code (~4.1k lines of code and tests in the new custom gen, ~700
> > > lines in the existing python custom gen)
> > > 2. We conceptually fork from a sibling Apache project.
> > >
> > > The fork could be mitigated by turning our real experience iterating
> the
> > > custom code generator into a well-founded patch back into the Apache
> > Thrift
> > > project, but saying we'll do that is easier than following through and
> > > actually doing it.
> > >
> > > ==
> > > Review guide / details:
> > >
> > > The technology stack:
> > > The thrift IDL parsing and thrift wire parsing are both handled by the
> > > Facebook swift project [4].  We only implement the middle bit that
> > > generates java code stubs.  This gives higher confidence that the
> tricky
> > > bits out at either edge are done right.
> > > The thrift struct code generation is done using Square's javapoet [5]
> in
> > > favor of templating for the purpose of easier to read generator code.
> > This
> > > characterization is debatable though and template are certainly more
> > > flexible the minute you need to gen a second language (say we like this
> > and
> > > want to do javascript codegen this way too for example).
> > > The MyBatis codegen is forced by the thrift codegen for technical
> > > reasons.  In short, there is no simple way to teach MyBatis to read and
> > > write immutable objects with builders.  So the MyBatis code is
> generated
> > > via an annotation processor that runs after thrift code gen, but
> reading
> > > thrift annotations that survive that codegen process.
> > > The codegen unit testing is done with the help of Google's
> compile-tester
> > > [6].  NB that this has an expected output comparison that checks the
> > > generated AST and not the text, so its fairly lenient.  Whitepsace and
> > > comments certainly don't matter.
> > >
> > > Review strategy:
> > > The code generator RBs (#1 & #2 in the 3 part series) are probably
> easier
> > > to review looking at samples of the generated code.  Both the thrift
> > > codegen and MyBatis codegen samples are conveniently contained in the
> > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).  The
> unit
> > > test uses resource files that contain both the thrift codegen inputs
> the
> > > annotation processor runs over and the annotation processor expected
> > > outputs  - the MyBatis peer classes.  So have a look there if you need
> > > something concrete and don't want to patch the RBs in and actually run
> > the
> > > codegen (`./gradlew api:compileJava`).
> > > The conversion RB (#3) is large but the changes are mainly mechanical
> > > conversions from the current mutable thrift + I* wrappers to pure
> > immutable
> > > thrift mutated via `.toBuilder` and `.with`'er methods.  The main
> changes
> > > of note are to the portions of the codebase tightly tied to thrift as a
> > > technology:
> > > + Gson/thrift converters
> > > + Shiro annotated auth param interception
> > > + Thrift/Servlet binding
> > >
> > > [1]
> > >
> >
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> > > [2] https://issues.apache.org/jira/browse/AURORA-987
> > > [3]
> > >
> >
> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> > > [4] https://github.com/facebook/swift
> > > [5] https://github.com/square/javapoet
> > > [6] https://github.com/google/compile-testing
> > >
> >
> > --
> > Zameer Manji
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message