aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tony Dong <td...@twitter.com.INVALID>
Subject Re: [PROPOSAL] Change java thrift code gen
Date Thu, 28 Jan 2016 00:58:51 GMT
Awesome performance numbers! I don't particularly know the logistics of
upstreaming a change like this, but optimistically I would suggest
upstreaming it to Apache Thrift if possible.

As someone that maintains a fork of a thrift compiler(fork of scrooge), I
have to say that it's not that fun. There's a lot of custom code that needs
to be maintained and a bunch of work to rebase the code periodically.



On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wfarner@apache.org> wrote:

> Firstly - thanks for the clean organization and delineation of steps in
> this change.  Top notch work!
>
> Some of the performance improvements are very nice; and in a particularly
> hot code path.  I will wager a guess that the majority of the savings is in
> avoiding what amounts to copy constructors between mutable and immutable
> types.  I further wager there are alternative approaches we could weigh to
> achieve those performance improvements.  As an example - you note above
> that we could provide a patch to Apache Thrift.  Depending how much
> performance inspires our decision here, it will be prudent to evaluate
> alternatives.
>
> I think there are (at least) two major issues worth discussing - code
> volume (which you note) and an increase in logical complexity.  This will
> leave us with a bifurcation in code generation tooling (custom+swift for
> Java, Apache Thrift for python and js).  It's difficult to quantify the
> downside of that, but it seems like an unfortunate state with potential for
> compatibility risks.
>
>
> On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zmanji@apache.org> wrote:
>
> > Some high level comments without looking at the code.
> >
> > I'm in favor from abandoning the thrift generated java code in favor of
> > immutable objects. I think it is easier to reason about and will ensure
> we
> > have less errors in our code. If I understand correctly, the ProtoBuf
> > format does this by default, so there some precedent for this style of
> code
> > generation already.
> >
> > I think using Facebook's swift is the best approach here. I would be
> > hesitant to accept any custom code generation that involved us parsing
> > thrift IDL files or thrift formats over the wire because I poses a very
> > high maintenance burden.
> >
> > I also think generating the MyBatis mutable classes is superior to our
> > current strategy of manually creating them.
> >
> > Finally, the performance improvements look fantastic. As an operator of a
> > large cluster I am excited to see wholesale performance improvements as I
> > am always concerned that my cluster is approaching the limits of what
> > Aurora can handle safely.
> >
> > Overall, I think this change merits a serious discussion from all
> > contributors.
> >
> > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <jsirois@apache.org> wrote:
> >
> > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <jsirois@apache.org>
> wrote:
> > >
> > > > Context: Aurora uses the official Apache Thrift compiler today plus a
> > > > home-grown python code generator [1] for immutable "entity" (I*)
> > > wrappers.
> > > >
> > > > The proposal is to switch from using the Apache Thrift code generator
> > to
> > > a
> > > > home grown generator.  The proposal comes with a concrete example in
> > the
> > > > form of the actual RBs to effect this change:
> > > > 1. A custom java thrift code generator:
> > > > https://reviews.apache.org/r/42748/
> > > > 2. A custom MyBatis binding code generator powered by 1 above:
> > > > https://reviews.apache.org/r/42749/
> > > > 3. Integration of 1 & 2 above into the Aurora codebase:
> > > > https://reviews.apache.org/r/42756/
> > > >
> > > > Since the RBs are large, I wanted to provide some extra context on
> the
> > > > idea at a higher level.  I provide rationale, pros and cons below for
> > > those
> > > > interested in the idea but wary of diving in on code review until the
> > > idea
> > > > itself passes a sniff test.
> > > > Thanks in advance for your feedback - and if we get there - for your
> > > > review effort.
> > > >
> > >
> > > I just added wfarner and zmanji as reviewers for the 3 RBs above since
> > > they've expressed direct interest.  Happy to add others, just speak up
> or
> > > else just comment on the reviews as you see fit.
> > > I'll formally only submit if 1st this email thread reaches consensus,
> and
> > > second, reviews are approved.
> > >
> > > ==
> > > >
> > > > In the course of an initial run at creating a first-class REST-like
> > > > scheduler interface [2] I came to the conclusion generating the json
> > API
> > > > from the thrift one might be a good path.  That idea has been
> scrapped
> > > with
> > > > community feedback, but an initial experiment in custom thrift
> code-gen
> > > for
> > > > java that accompanied that idea seemed worth pursuing for its own
> > > > independent benefits, chief among these being 1st class immutable
> > thrift
> > > > structs and the ability to leverage thrift annotations.
> > > >
> > > > Immutability:
> > > > The benefits of having an immutable by default data model are the
> > > standard
> > > > ones; namely, its trivial to reason about safety of concurrent
> > operations
> > > > on the data model, stability of collections containing data model
> > > entities
> > > > and it opens up straight-forward optimizations that are easy to
> reason
> > > > about.
> > > > An example optimization is caching hashCodes for the immutable thrift
> > > > structs.  This was done after comparing jmh benchmarks run against
> > master
> > > > and then again against the proposal branch.  Perf was comparable -
> > within
> > > > 10% plus and minus depending on the benchmark, but with the
> > optimization
> > > > added many benchmarks showed pronounced improvement in the proposal
> > > branch
> > > > [3].  The optimization is clearly safe and was quick and easy to
> > > > implement.  Further optimizations can be experimented with in a
> > > > straightforward way.
> > > >
> > > > Thrift Annotations:
> > > > The thrift IDL grammar has supported these for quite some time, but
> > they
> > > > are not plumbed to the generated java code.  Uses are many and
> > varied.  I
> > > > initially had my eye on annotation of thrift services with REST
> verbs,
> > > > routes, etc - but immediately we can leverage these annotations to
> kill
> > > > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding code
> that
> > > > needs to be maintained.
> > > >
> > > > There are a few downsides to switching to our own java thrift code
> gen:
> > > > 1. We own more code to maintain:  Even though we have the custom
> python
> > > > "immutable" wrapper generator [1] today, this new generator - even
> with
> > > the
> > > > python generator removed - represents a 5-6x increase in line count
> of
> > > > custom code (~4.1k lines of code and tests in the new custom gen,
> ~700
> > > > lines in the existing python custom gen)
> > > > 2. We conceptually fork from a sibling Apache project.
> > > >
> > > > The fork could be mitigated by turning our real experience iterating
> > the
> > > > custom code generator into a well-founded patch back into the Apache
> > > Thrift
> > > > project, but saying we'll do that is easier than following through
> and
> > > > actually doing it.
> > > >
> > > > ==
> > > > Review guide / details:
> > > >
> > > > The technology stack:
> > > > The thrift IDL parsing and thrift wire parsing are both handled by
> the
> > > > Facebook swift project [4].  We only implement the middle bit that
> > > > generates java code stubs.  This gives higher confidence that the
> > tricky
> > > > bits out at either edge are done right.
> > > > The thrift struct code generation is done using Square's javapoet [5]
> > in
> > > > favor of templating for the purpose of easier to read generator code.
> > > This
> > > > characterization is debatable though and template are certainly more
> > > > flexible the minute you need to gen a second language (say we like
> this
> > > and
> > > > want to do javascript codegen this way too for example).
> > > > The MyBatis codegen is forced by the thrift codegen for technical
> > > > reasons.  In short, there is no simple way to teach MyBatis to read
> and
> > > > write immutable objects with builders.  So the MyBatis code is
> > generated
> > > > via an annotation processor that runs after thrift code gen, but
> > reading
> > > > thrift annotations that survive that codegen process.
> > > > The codegen unit testing is done with the help of Google's
> > compile-tester
> > > > [6].  NB that this has an expected output comparison that checks the
> > > > generated AST and not the text, so its fairly lenient.  Whitepsace
> and
> > > > comments certainly don't matter.
> > > >
> > > > Review strategy:
> > > > The code generator RBs (#1 & #2 in the 3 part series) are probably
> > easier
> > > > to review looking at samples of the generated code.  Both the thrift
> > > > codegen and MyBatis codegen samples are conveniently contained in the
> > > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).  The
> > unit
> > > > test uses resource files that contain both the thrift codegen inputs
> > the
> > > > annotation processor runs over and the annotation processor expected
> > > > outputs  - the MyBatis peer classes.  So have a look there if you
> need
> > > > something concrete and don't want to patch the RBs in and actually
> run
> > > the
> > > > codegen (`./gradlew api:compileJava`).
> > > > The conversion RB (#3) is large but the changes are mainly mechanical
> > > > conversions from the current mutable thrift + I* wrappers to pure
> > > immutable
> > > > thrift mutated via `.toBuilder` and `.with`'er methods.  The main
> > changes
> > > > of note are to the portions of the codebase tightly tied to thrift
> as a
> > > > technology:
> > > > + Gson/thrift converters
> > > > + Shiro annotated auth param interception
> > > > + Thrift/Servlet binding
> > > >
> > > > [1]
> > > >
> > >
> >
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> > > > [2] https://issues.apache.org/jira/browse/AURORA-987
> > > > [3]
> > > >
> > >
> >
> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> > > > [4] https://github.com/facebook/swift
> > > > [5] https://github.com/square/javapoet
> > > > [6] https://github.com/google/compile-testing
> > > >
> > >
> > > --
> > > Zameer Manji
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message