aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bill Farner <wfar...@apache.org>
Subject Re: [PROPOSAL] Change java thrift code gen
Date Thu, 28 Jan 2016 01:00:44 GMT
Tony - this would not be a technical fork so much as a spiritual fork.
While we would have our own bugs to work out, the only upstream exposure
would be IDL or wire format changes.

On Wed, Jan 27, 2016 at 4:58 PM, Tony Dong <tdong@twitter.com.invalid>
wrote:

> Awesome performance numbers! I don't particularly know the logistics of
> upstreaming a change like this, but optimistically I would suggest
> upstreaming it to Apache Thrift if possible.
>
> As someone that maintains a fork of a thrift compiler(fork of scrooge), I
> have to say that it's not that fun. There's a lot of custom code that needs
> to be maintained and a bunch of work to rebase the code periodically.
>
>
>
> On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wfarner@apache.org> wrote:
>
> > Firstly - thanks for the clean organization and delineation of steps in
> > this change.  Top notch work!
> >
> > Some of the performance improvements are very nice; and in a particularly
> > hot code path.  I will wager a guess that the majority of the savings is
> in
> > avoiding what amounts to copy constructors between mutable and immutable
> > types.  I further wager there are alternative approaches we could weigh
> to
> > achieve those performance improvements.  As an example - you note above
> > that we could provide a patch to Apache Thrift.  Depending how much
> > performance inspires our decision here, it will be prudent to evaluate
> > alternatives.
> >
> > I think there are (at least) two major issues worth discussing - code
> > volume (which you note) and an increase in logical complexity.  This will
> > leave us with a bifurcation in code generation tooling (custom+swift for
> > Java, Apache Thrift for python and js).  It's difficult to quantify the
> > downside of that, but it seems like an unfortunate state with potential
> for
> > compatibility risks.
> >
> >
> > On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zmanji@apache.org> wrote:
> >
> > > Some high level comments without looking at the code.
> > >
> > > I'm in favor from abandoning the thrift generated java code in favor of
> > > immutable objects. I think it is easier to reason about and will ensure
> > we
> > > have less errors in our code. If I understand correctly, the ProtoBuf
> > > format does this by default, so there some precedent for this style of
> > code
> > > generation already.
> > >
> > > I think using Facebook's swift is the best approach here. I would be
> > > hesitant to accept any custom code generation that involved us parsing
> > > thrift IDL files or thrift formats over the wire because I poses a very
> > > high maintenance burden.
> > >
> > > I also think generating the MyBatis mutable classes is superior to our
> > > current strategy of manually creating them.
> > >
> > > Finally, the performance improvements look fantastic. As an operator
> of a
> > > large cluster I am excited to see wholesale performance improvements
> as I
> > > am always concerned that my cluster is approaching the limits of what
> > > Aurora can handle safely.
> > >
> > > Overall, I think this change merits a serious discussion from all
> > > contributors.
> > >
> > > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <jsirois@apache.org>
> wrote:
> > >
> > > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <jsirois@apache.org>
> > wrote:
> > > >
> > > > > Context: Aurora uses the official Apache Thrift compiler today
> plus a
> > > > > home-grown python code generator [1] for immutable "entity" (I*)
> > > > wrappers.
> > > > >
> > > > > The proposal is to switch from using the Apache Thrift code
> generator
> > > to
> > > > a
> > > > > home grown generator.  The proposal comes with a concrete example
> in
> > > the
> > > > > form of the actual RBs to effect this change:
> > > > > 1. A custom java thrift code generator:
> > > > > https://reviews.apache.org/r/42748/
> > > > > 2. A custom MyBatis binding code generator powered by 1 above:
> > > > > https://reviews.apache.org/r/42749/
> > > > > 3. Integration of 1 & 2 above into the Aurora codebase:
> > > > > https://reviews.apache.org/r/42756/
> > > > >
> > > > > Since the RBs are large, I wanted to provide some extra context on
> > the
> > > > > idea at a higher level.  I provide rationale, pros and cons below
> for
> > > > those
> > > > > interested in the idea but wary of diving in on code review until
> the
> > > > idea
> > > > > itself passes a sniff test.
> > > > > Thanks in advance for your feedback - and if we get there - for
> your
> > > > > review effort.
> > > > >
> > > >
> > > > I just added wfarner and zmanji as reviewers for the 3 RBs above
> since
> > > > they've expressed direct interest.  Happy to add others, just speak
> up
> > or
> > > > else just comment on the reviews as you see fit.
> > > > I'll formally only submit if 1st this email thread reaches consensus,
> > and
> > > > second, reviews are approved.
> > > >
> > > > ==
> > > > >
> > > > > In the course of an initial run at creating a first-class REST-like
> > > > > scheduler interface [2] I came to the conclusion generating the
> json
> > > API
> > > > > from the thrift one might be a good path.  That idea has been
> > scrapped
> > > > with
> > > > > community feedback, but an initial experiment in custom thrift
> > code-gen
> > > > for
> > > > > java that accompanied that idea seemed worth pursuing for its own
> > > > > independent benefits, chief among these being 1st class immutable
> > > thrift
> > > > > structs and the ability to leverage thrift annotations.
> > > > >
> > > > > Immutability:
> > > > > The benefits of having an immutable by default data model are the
> > > > standard
> > > > > ones; namely, its trivial to reason about safety of concurrent
> > > operations
> > > > > on the data model, stability of collections containing data model
> > > > entities
> > > > > and it opens up straight-forward optimizations that are easy to
> > reason
> > > > > about.
> > > > > An example optimization is caching hashCodes for the immutable
> thrift
> > > > > structs.  This was done after comparing jmh benchmarks run against
> > > master
> > > > > and then again against the proposal branch.  Perf was comparable
-
> > > within
> > > > > 10% plus and minus depending on the benchmark, but with the
> > > optimization
> > > > > added many benchmarks showed pronounced improvement in the proposal
> > > > branch
> > > > > [3].  The optimization is clearly safe and was quick and easy to
> > > > > implement.  Further optimizations can be experimented with in a
> > > > > straightforward way.
> > > > >
> > > > > Thrift Annotations:
> > > > > The thrift IDL grammar has supported these for quite some time, but
> > > they
> > > > > are not plumbed to the generated java code.  Uses are many and
> > > varied.  I
> > > > > initially had my eye on annotation of thrift services with REST
> > verbs,
> > > > > routes, etc - but immediately we can leverage these annotations to
> > kill
> > > > > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding code
> > that
> > > > > needs to be maintained.
> > > > >
> > > > > There are a few downsides to switching to our own java thrift code
> > gen:
> > > > > 1. We own more code to maintain:  Even though we have the custom
> > python
> > > > > "immutable" wrapper generator [1] today, this new generator - even
> > with
> > > > the
> > > > > python generator removed - represents a 5-6x increase in line count
> > of
> > > > > custom code (~4.1k lines of code and tests in the new custom gen,
> > ~700
> > > > > lines in the existing python custom gen)
> > > > > 2. We conceptually fork from a sibling Apache project.
> > > > >
> > > > > The fork could be mitigated by turning our real experience
> iterating
> > > the
> > > > > custom code generator into a well-founded patch back into the
> Apache
> > > > Thrift
> > > > > project, but saying we'll do that is easier than following through
> > and
> > > > > actually doing it.
> > > > >
> > > > > ==
> > > > > Review guide / details:
> > > > >
> > > > > The technology stack:
> > > > > The thrift IDL parsing and thrift wire parsing are both handled by
> > the
> > > > > Facebook swift project [4].  We only implement the middle bit that
> > > > > generates java code stubs.  This gives higher confidence that the
> > > tricky
> > > > > bits out at either edge are done right.
> > > > > The thrift struct code generation is done using Square's javapoet
> [5]
> > > in
> > > > > favor of templating for the purpose of easier to read generator
> code.
> > > > This
> > > > > characterization is debatable though and template are certainly
> more
> > > > > flexible the minute you need to gen a second language (say we like
> > this
> > > > and
> > > > > want to do javascript codegen this way too for example).
> > > > > The MyBatis codegen is forced by the thrift codegen for technical
> > > > > reasons.  In short, there is no simple way to teach MyBatis to read
> > and
> > > > > write immutable objects with builders.  So the MyBatis code is
> > > generated
> > > > > via an annotation processor that runs after thrift code gen, but
> > > reading
> > > > > thrift annotations that survive that codegen process.
> > > > > The codegen unit testing is done with the help of Google's
> > > compile-tester
> > > > > [6].  NB that this has an expected output comparison that checks
> the
> > > > > generated AST and not the text, so its fairly lenient.  Whitepsace
> > and
> > > > > comments certainly don't matter.
> > > > >
> > > > > Review strategy:
> > > > > The code generator RBs (#1 & #2 in the 3 part series) are probably
> > > easier
> > > > > to review looking at samples of the generated code.  Both the
> thrift
> > > > > codegen and MyBatis codegen samples are conveniently contained in
> the
> > > > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).  The
> > > unit
> > > > > test uses resource files that contain both the thrift codegen
> inputs
> > > the
> > > > > annotation processor runs over and the annotation processor
> expected
> > > > > outputs  - the MyBatis peer classes.  So have a look there if you
> > need
> > > > > something concrete and don't want to patch the RBs in and actually
> > run
> > > > the
> > > > > codegen (`./gradlew api:compileJava`).
> > > > > The conversion RB (#3) is large but the changes are mainly
> mechanical
> > > > > conversions from the current mutable thrift + I* wrappers to pure
> > > > immutable
> > > > > thrift mutated via `.toBuilder` and `.with`'er methods.  The main
> > > changes
> > > > > of note are to the portions of the codebase tightly tied to thrift
> > as a
> > > > > technology:
> > > > > + Gson/thrift converters
> > > > > + Shiro annotated auth param interception
> > > > > + Thrift/Servlet binding
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> > > > > [2] https://issues.apache.org/jira/browse/AURORA-987
> > > > > [3]
> > > > >
> > > >
> > >
> >
> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> > > > > [4] https://github.com/facebook/swift
> > > > > [5] https://github.com/square/javapoet
> > > > > [6] https://github.com/google/compile-testing
> > > > >
> > > >
> > > > --
> > > > Zameer Manji
> > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message