aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maxim Khutornenko <ma...@apache.org>
Subject Re: [PROPOSAL] Change java thrift code gen
Date Thu, 28 Jan 2016 01:08:26 GMT
I am +1 to making immutable thrift objects solely based on perf numbers.

My biggest concern though is maintenance of a pretty intricate codebase,
especially when it comes to upgrading any of the frameworks involved.
Bill's suggestion to explore paths to make this a part of Apache Thrift
sounds great to me.

On Wed, Jan 27, 2016 at 5:00 PM, Bill Farner <wfarner@apache.org> wrote:

> Tony - this would not be a technical fork so much as a spiritual fork.
> While we would have our own bugs to work out, the only upstream exposure
> would be IDL or wire format changes.
>
> On Wed, Jan 27, 2016 at 4:58 PM, Tony Dong <tdong@twitter.com.invalid>
> wrote:
>
> > Awesome performance numbers! I don't particularly know the logistics of
> > upstreaming a change like this, but optimistically I would suggest
> > upstreaming it to Apache Thrift if possible.
> >
> > As someone that maintains a fork of a thrift compiler(fork of scrooge), I
> > have to say that it's not that fun. There's a lot of custom code that
> needs
> > to be maintained and a bunch of work to rebase the code periodically.
> >
> >
> >
> > On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wfarner@apache.org> wrote:
> >
> > > Firstly - thanks for the clean organization and delineation of steps in
> > > this change.  Top notch work!
> > >
> > > Some of the performance improvements are very nice; and in a
> particularly
> > > hot code path.  I will wager a guess that the majority of the savings
> is
> > in
> > > avoiding what amounts to copy constructors between mutable and
> immutable
> > > types.  I further wager there are alternative approaches we could weigh
> > to
> > > achieve those performance improvements.  As an example - you note above
> > > that we could provide a patch to Apache Thrift.  Depending how much
> > > performance inspires our decision here, it will be prudent to evaluate
> > > alternatives.
> > >
> > > I think there are (at least) two major issues worth discussing - code
> > > volume (which you note) and an increase in logical complexity.  This
> will
> > > leave us with a bifurcation in code generation tooling (custom+swift
> for
> > > Java, Apache Thrift for python and js).  It's difficult to quantify the
> > > downside of that, but it seems like an unfortunate state with potential
> > for
> > > compatibility risks.
> > >
> > >
> > > On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zmanji@apache.org>
> wrote:
> > >
> > > > Some high level comments without looking at the code.
> > > >
> > > > I'm in favor from abandoning the thrift generated java code in favor
> of
> > > > immutable objects. I think it is easier to reason about and will
> ensure
> > > we
> > > > have less errors in our code. If I understand correctly, the ProtoBuf
> > > > format does this by default, so there some precedent for this style
> of
> > > code
> > > > generation already.
> > > >
> > > > I think using Facebook's swift is the best approach here. I would be
> > > > hesitant to accept any custom code generation that involved us
> parsing
> > > > thrift IDL files or thrift formats over the wire because I poses a
> very
> > > > high maintenance burden.
> > > >
> > > > I also think generating the MyBatis mutable classes is superior to
> our
> > > > current strategy of manually creating them.
> > > >
> > > > Finally, the performance improvements look fantastic. As an operator
> > of a
> > > > large cluster I am excited to see wholesale performance improvements
> > as I
> > > > am always concerned that my cluster is approaching the limits of what
> > > > Aurora can handle safely.
> > > >
> > > > Overall, I think this change merits a serious discussion from all
> > > > contributors.
> > > >
> > > > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <jsirois@apache.org>
> > wrote:
> > > >
> > > > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <jsirois@apache.org>
> > > wrote:
> > > > >
> > > > > > Context: Aurora uses the official Apache Thrift compiler today
> > plus a
> > > > > > home-grown python code generator [1] for immutable "entity"
(I*)
> > > > > wrappers.
> > > > > >
> > > > > > The proposal is to switch from using the Apache Thrift code
> > generator
> > > > to
> > > > > a
> > > > > > home grown generator.  The proposal comes with a concrete example
> > in
> > > > the
> > > > > > form of the actual RBs to effect this change:
> > > > > > 1. A custom java thrift code generator:
> > > > > > https://reviews.apache.org/r/42748/
> > > > > > 2. A custom MyBatis binding code generator powered by 1 above:
> > > > > > https://reviews.apache.org/r/42749/
> > > > > > 3. Integration of 1 & 2 above into the Aurora codebase:
> > > > > > https://reviews.apache.org/r/42756/
> > > > > >
> > > > > > Since the RBs are large, I wanted to provide some extra context
> on
> > > the
> > > > > > idea at a higher level.  I provide rationale, pros and cons
below
> > for
> > > > > those
> > > > > > interested in the idea but wary of diving in on code review
until
> > the
> > > > > idea
> > > > > > itself passes a sniff test.
> > > > > > Thanks in advance for your feedback - and if we get there -
for
> > your
> > > > > > review effort.
> > > > > >
> > > > >
> > > > > I just added wfarner and zmanji as reviewers for the 3 RBs above
> > since
> > > > > they've expressed direct interest.  Happy to add others, just speak
> > up
> > > or
> > > > > else just comment on the reviews as you see fit.
> > > > > I'll formally only submit if 1st this email thread reaches
> consensus,
> > > and
> > > > > second, reviews are approved.
> > > > >
> > > > > ==
> > > > > >
> > > > > > In the course of an initial run at creating a first-class
> REST-like
> > > > > > scheduler interface [2] I came to the conclusion generating
the
> > json
> > > > API
> > > > > > from the thrift one might be a good path.  That idea has been
> > > scrapped
> > > > > with
> > > > > > community feedback, but an initial experiment in custom thrift
> > > code-gen
> > > > > for
> > > > > > java that accompanied that idea seemed worth pursuing for its
own
> > > > > > independent benefits, chief among these being 1st class immutable
> > > > thrift
> > > > > > structs and the ability to leverage thrift annotations.
> > > > > >
> > > > > > Immutability:
> > > > > > The benefits of having an immutable by default data model are
the
> > > > > standard
> > > > > > ones; namely, its trivial to reason about safety of concurrent
> > > > operations
> > > > > > on the data model, stability of collections containing data
model
> > > > > entities
> > > > > > and it opens up straight-forward optimizations that are easy
to
> > > reason
> > > > > > about.
> > > > > > An example optimization is caching hashCodes for the immutable
> > thrift
> > > > > > structs.  This was done after comparing jmh benchmarks run
> against
> > > > master
> > > > > > and then again against the proposal branch.  Perf was comparable
> -
> > > > within
> > > > > > 10% plus and minus depending on the benchmark, but with the
> > > > optimization
> > > > > > added many benchmarks showed pronounced improvement in the
> proposal
> > > > > branch
> > > > > > [3].  The optimization is clearly safe and was quick and easy
to
> > > > > > implement.  Further optimizations can be experimented with in
a
> > > > > > straightforward way.
> > > > > >
> > > > > > Thrift Annotations:
> > > > > > The thrift IDL grammar has supported these for quite some time,
> but
> > > > they
> > > > > > are not plumbed to the generated java code.  Uses are many and
> > > > varied.  I
> > > > > > initially had my eye on annotation of thrift services with REST
> > > verbs,
> > > > > > routes, etc - but immediately we can leverage these annotations
> to
> > > kill
> > > > > > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding
> code
> > > that
> > > > > > needs to be maintained.
> > > > > >
> > > > > > There are a few downsides to switching to our own java thrift
> code
> > > gen:
> > > > > > 1. We own more code to maintain:  Even though we have the custom
> > > python
> > > > > > "immutable" wrapper generator [1] today, this new generator
-
> even
> > > with
> > > > > the
> > > > > > python generator removed - represents a 5-6x increase in line
> count
> > > of
> > > > > > custom code (~4.1k lines of code and tests in the new custom
gen,
> > > ~700
> > > > > > lines in the existing python custom gen)
> > > > > > 2. We conceptually fork from a sibling Apache project.
> > > > > >
> > > > > > The fork could be mitigated by turning our real experience
> > iterating
> > > > the
> > > > > > custom code generator into a well-founded patch back into the
> > Apache
> > > > > Thrift
> > > > > > project, but saying we'll do that is easier than following
> through
> > > and
> > > > > > actually doing it.
> > > > > >
> > > > > > ==
> > > > > > Review guide / details:
> > > > > >
> > > > > > The technology stack:
> > > > > > The thrift IDL parsing and thrift wire parsing are both handled
> by
> > > the
> > > > > > Facebook swift project [4].  We only implement the middle bit
> that
> > > > > > generates java code stubs.  This gives higher confidence that
the
> > > > tricky
> > > > > > bits out at either edge are done right.
> > > > > > The thrift struct code generation is done using Square's javapoet
> > [5]
> > > > in
> > > > > > favor of templating for the purpose of easier to read generator
> > code.
> > > > > This
> > > > > > characterization is debatable though and template are certainly
> > more
> > > > > > flexible the minute you need to gen a second language (say we
> like
> > > this
> > > > > and
> > > > > > want to do javascript codegen this way too for example).
> > > > > > The MyBatis codegen is forced by the thrift codegen for technical
> > > > > > reasons.  In short, there is no simple way to teach MyBatis
to
> read
> > > and
> > > > > > write immutable objects with builders.  So the MyBatis code
is
> > > > generated
> > > > > > via an annotation processor that runs after thrift code gen,
but
> > > > reading
> > > > > > thrift annotations that survive that codegen process.
> > > > > > The codegen unit testing is done with the help of Google's
> > > > compile-tester
> > > > > > [6].  NB that this has an expected output comparison that checks
> > the
> > > > > > generated AST and not the text, so its fairly lenient.
> Whitepsace
> > > and
> > > > > > comments certainly don't matter.
> > > > > >
> > > > > > Review strategy:
> > > > > > The code generator RBs (#1 & #2 in the 3 part series) are
> probably
> > > > easier
> > > > > > to review looking at samples of the generated code.  Both the
> > thrift
> > > > > > codegen and MyBatis codegen samples are conveniently contained
in
> > the
> > > > > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).
> The
> > > > unit
> > > > > > test uses resource files that contain both the thrift codegen
> > inputs
> > > > the
> > > > > > annotation processor runs over and the annotation processor
> > expected
> > > > > > outputs  - the MyBatis peer classes.  So have a look there if
you
> > > need
> > > > > > something concrete and don't want to patch the RBs in and
> actually
> > > run
> > > > > the
> > > > > > codegen (`./gradlew api:compileJava`).
> > > > > > The conversion RB (#3) is large but the changes are mainly
> > mechanical
> > > > > > conversions from the current mutable thrift + I* wrappers to
pure
> > > > > immutable
> > > > > > thrift mutated via `.toBuilder` and `.with`'er methods.  The
main
> > > > changes
> > > > > > of note are to the portions of the codebase tightly tied to
> thrift
> > > as a
> > > > > > technology:
> > > > > > + Gson/thrift converters
> > > > > > + Shiro annotated auth param interception
> > > > > > + Thrift/Servlet binding
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> > > > > > [2] https://issues.apache.org/jira/browse/AURORA-987
> > > > > > [3]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> > > > > > [4] https://github.com/facebook/swift
> > > > > > [5] https://github.com/square/javapoet
> > > > > > [6] https://github.com/google/compile-testing
> > > > > >
> > > > >
> > > > > --
> > > > > Zameer Manji
> > > > >
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message