aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Sirois <jsir...@apache.org>
Subject Re: [PROPOSAL] Change java thrift code gen
Date Thu, 28 Jan 2016 17:08:44 GMT
On Thu, Jan 28, 2016 at 8:26 AM, John Sirois <john@conductant.com> wrote:

> On Thu, Jan 28, 2016 at 6:45 AM, Jake Farrell <jfarrell@apache.org> wrote:
>
> > +1 to making this apart of Thrift, i'm happy to help shepard this on the
> > Thrift side and get it in as soon as its ready
> >
>
> I've filed https://issues.apache.org/jira/browse/THRIFT-3583 to use as the
> basis for discussion of this feature over in the Apache Thrift project.
> I looked at the problem a bit and noted some challenges.
>

And started a dev@thrift.apache.org thread here if anyone wants to follow
along or participate: http://markmail.org/message/mlxyyauyvlvuxjsf


>
>
> >
> > -Jake
> >
> > On Wed, Jan 27, 2016 at 8:08 PM, Maxim Khutornenko <maxim@apache.org>
> > wrote:
> >
> >> I am +1 to making immutable thrift objects solely based on perf numbers.
> >>
> >> My biggest concern though is maintenance of a pretty intricate codebase,
> >> especially when it comes to upgrading any of the frameworks involved.
> >> Bill's suggestion to explore paths to make this a part of Apache Thrift
> >> sounds great to me.
> >>
> >> On Wed, Jan 27, 2016 at 5:00 PM, Bill Farner <wfarner@apache.org>
> wrote:
> >>
> >> > Tony - this would not be a technical fork so much as a spiritual fork.
> >> > While we would have our own bugs to work out, the only upstream
> exposure
> >> > would be IDL or wire format changes.
> >> >
> >> > On Wed, Jan 27, 2016 at 4:58 PM, Tony Dong <tdong@twitter.com.invalid
> >
> >> > wrote:
> >> >
> >> > > Awesome performance numbers! I don't particularly know the logistics
> >> of
> >> > > upstreaming a change like this, but optimistically I would suggest
> >> > > upstreaming it to Apache Thrift if possible.
> >> > >
> >> > > As someone that maintains a fork of a thrift compiler(fork of
> >> scrooge), I
> >> > > have to say that it's not that fun. There's a lot of custom code
> that
> >> > needs
> >> > > to be maintained and a bunch of work to rebase the code
> periodically.
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wfarner@apache.org>
> >> wrote:
> >> > >
> >> > > > Firstly - thanks for the clean organization and delineation of
> >> steps in
> >> > > > this change.  Top notch work!
> >> > > >
> >> > > > Some of the performance improvements are very nice; and in a
> >> > particularly
> >> > > > hot code path.  I will wager a guess that the majority of the
> >> savings
> >> > is
> >> > > in
> >> > > > avoiding what amounts to copy constructors between mutable and
> >> > immutable
> >> > > > types.  I further wager there are alternative approaches we could
> >> weigh
> >> > > to
> >> > > > achieve those performance improvements.  As an example - you
note
> >> above
> >> > > > that we could provide a patch to Apache Thrift.  Depending how
> much
> >> > > > performance inspires our decision here, it will be prudent to
> >> evaluate
> >> > > > alternatives.
> >> > > >
> >> > > > I think there are (at least) two major issues worth discussing
-
> >> code
> >> > > > volume (which you note) and an increase in logical complexity.
> This
> >> > will
> >> > > > leave us with a bifurcation in code generation tooling
> (custom+swift
> >> > for
> >> > > > Java, Apache Thrift for python and js).  It's difficult to
> quantify
> >> the
> >> > > > downside of that, but it seems like an unfortunate state with
> >> potential
> >> > > for
> >> > > > compatibility risks.
> >> > > >
> >> > > >
> >> > > > On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zmanji@apache.org>
> >> > wrote:
> >> > > >
> >> > > > > Some high level comments without looking at the code.
> >> > > > >
> >> > > > > I'm in favor from abandoning the thrift generated java code
in
> >> favor
> >> > of
> >> > > > > immutable objects. I think it is easier to reason about
and will
> >> > ensure
> >> > > > we
> >> > > > > have less errors in our code. If I understand correctly,
the
> >> ProtoBuf
> >> > > > > format does this by default, so there some precedent for
this
> >> style
> >> > of
> >> > > > code
> >> > > > > generation already.
> >> > > > >
> >> > > > > I think using Facebook's swift is the best approach here.
I
> would
> >> be
> >> > > > > hesitant to accept any custom code generation that involved
us
> >> > parsing
> >> > > > > thrift IDL files or thrift formats over the wire because
I
> poses a
> >> > very
> >> > > > > high maintenance burden.
> >> > > > >
> >> > > > > I also think generating the MyBatis mutable classes is superior
> to
> >> > our
> >> > > > > current strategy of manually creating them.
> >> > > > >
> >> > > > > Finally, the performance improvements look fantastic. As
an
> >> operator
> >> > > of a
> >> > > > > large cluster I am excited to see wholesale performance
> >> improvements
> >> > > as I
> >> > > > > am always concerned that my cluster is approaching the limits
of
> >> what
> >> > > > > Aurora can handle safely.
> >> > > > >
> >> > > > > Overall, I think this change merits a serious discussion
from
> all
> >> > > > > contributors.
> >> > > > >
> >> > > > > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <
> jsirois@apache.org>
> >> > > wrote:
> >> > > > >
> >> > > > > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <
> >> jsirois@apache.org>
> >> > > > wrote:
> >> > > > > >
> >> > > > > > > Context: Aurora uses the official Apache Thrift
compiler
> today
> >> > > plus a
> >> > > > > > > home-grown python code generator [1] for immutable
"entity"
> >> (I*)
> >> > > > > > wrappers.
> >> > > > > > >
> >> > > > > > > The proposal is to switch from using the Apache
Thrift code
> >> > > generator
> >> > > > > to
> >> > > > > > a
> >> > > > > > > home grown generator.  The proposal comes with
a concrete
> >> example
> >> > > in
> >> > > > > the
> >> > > > > > > form of the actual RBs to effect this change:
> >> > > > > > > 1. A custom java thrift code generator:
> >> > > > > > > https://reviews.apache.org/r/42748/
> >> > > > > > > 2. A custom MyBatis binding code generator powered
by 1
> above:
> >> > > > > > > https://reviews.apache.org/r/42749/
> >> > > > > > > 3. Integration of 1 & 2 above into the Aurora
codebase:
> >> > > > > > > https://reviews.apache.org/r/42756/
> >> > > > > > >
> >> > > > > > > Since the RBs are large, I wanted to provide some
extra
> >> context
> >> > on
> >> > > > the
> >> > > > > > > idea at a higher level.  I provide rationale,
pros and cons
> >> below
> >> > > for
> >> > > > > > those
> >> > > > > > > interested in the idea but wary of diving in on
code review
> >> until
> >> > > the
> >> > > > > > idea
> >> > > > > > > itself passes a sniff test.
> >> > > > > > > Thanks in advance for your feedback - and if we
get there -
> >> for
> >> > > your
> >> > > > > > > review effort.
> >> > > > > > >
> >> > > > > >
> >> > > > > > I just added wfarner and zmanji as reviewers for the
3 RBs
> above
> >> > > since
> >> > > > > > they've expressed direct interest.  Happy to add others,
just
> >> speak
> >> > > up
> >> > > > or
> >> > > > > > else just comment on the reviews as you see fit.
> >> > > > > > I'll formally only submit if 1st this email thread
reaches
> >> > consensus,
> >> > > > and
> >> > > > > > second, reviews are approved.
> >> > > > > >
> >> > > > > > ==
> >> > > > > > >
> >> > > > > > > In the course of an initial run at creating a
first-class
> >> > REST-like
> >> > > > > > > scheduler interface [2] I came to the conclusion
generating
> >> the
> >> > > json
> >> > > > > API
> >> > > > > > > from the thrift one might be a good path.  That
idea has
> been
> >> > > > scrapped
> >> > > > > > with
> >> > > > > > > community feedback, but an initial experiment
in custom
> thrift
> >> > > > code-gen
> >> > > > > > for
> >> > > > > > > java that accompanied that idea seemed worth pursuing
for
> its
> >> own
> >> > > > > > > independent benefits, chief among these being
1st class
> >> immutable
> >> > > > > thrift
> >> > > > > > > structs and the ability to leverage thrift annotations.
> >> > > > > > >
> >> > > > > > > Immutability:
> >> > > > > > > The benefits of having an immutable by default
data model
> are
> >> the
> >> > > > > > standard
> >> > > > > > > ones; namely, its trivial to reason about safety
of
> concurrent
> >> > > > > operations
> >> > > > > > > on the data model, stability of collections containing
data
> >> model
> >> > > > > > entities
> >> > > > > > > and it opens up straight-forward optimizations
that are easy
> >> to
> >> > > > reason
> >> > > > > > > about.
> >> > > > > > > An example optimization is caching hashCodes for
the
> immutable
> >> > > thrift
> >> > > > > > > structs.  This was done after comparing jmh benchmarks
run
> >> > against
> >> > > > > master
> >> > > > > > > and then again against the proposal branch.  Perf
was
> >> comparable
> >> > -
> >> > > > > within
> >> > > > > > > 10% plus and minus depending on the benchmark,
but with the
> >> > > > > optimization
> >> > > > > > > added many benchmarks showed pronounced improvement
in the
> >> > proposal
> >> > > > > > branch
> >> > > > > > > [3].  The optimization is clearly safe and was
quick and
> easy
> >> to
> >> > > > > > > implement.  Further optimizations can be experimented
with
> in
> >> a
> >> > > > > > > straightforward way.
> >> > > > > > >
> >> > > > > > > Thrift Annotations:
> >> > > > > > > The thrift IDL grammar has supported these for
quite some
> >> time,
> >> > but
> >> > > > > they
> >> > > > > > > are not plumbed to the generated java code.  Uses
are many
> and
> >> > > > > varied.  I
> >> > > > > > > initially had my eye on annotation of thrift services
with
> >> REST
> >> > > > verbs,
> >> > > > > > > routes, etc - but immediately we can leverage
these
> >> annotations
> >> > to
> >> > > > kill
> >> > > > > > > AnnotatedAuroraAdmin and reduce the amount of
MyBatis
> binding
> >> > code
> >> > > > that
> >> > > > > > > needs to be maintained.
> >> > > > > > >
> >> > > > > > > There are a few downsides to switching to our
own java
> thrift
> >> > code
> >> > > > gen:
> >> > > > > > > 1. We own more code to maintain:  Even though
we have the
> >> custom
> >> > > > python
> >> > > > > > > "immutable" wrapper generator [1] today, this
new generator
> -
> >> > even
> >> > > > with
> >> > > > > > the
> >> > > > > > > python generator removed - represents a 5-6x increase
in
> line
> >> > count
> >> > > > of
> >> > > > > > > custom code (~4.1k lines of code and tests in
the new custom
> >> gen,
> >> > > > ~700
> >> > > > > > > lines in the existing python custom gen)
> >> > > > > > > 2. We conceptually fork from a sibling Apache
project.
> >> > > > > > >
> >> > > > > > > The fork could be mitigated by turning our real
experience
> >> > > iterating
> >> > > > > the
> >> > > > > > > custom code generator into a well-founded patch
back into
> the
> >> > > Apache
> >> > > > > > Thrift
> >> > > > > > > project, but saying we'll do that is easier than
following
> >> > through
> >> > > > and
> >> > > > > > > actually doing it.
> >> > > > > > >
> >> > > > > > > ==
> >> > > > > > > Review guide / details:
> >> > > > > > >
> >> > > > > > > The technology stack:
> >> > > > > > > The thrift IDL parsing and thrift wire parsing
are both
> >> handled
> >> > by
> >> > > > the
> >> > > > > > > Facebook swift project [4].  We only implement
the middle
> bit
> >> > that
> >> > > > > > > generates java code stubs.  This gives higher
confidence
> that
> >> the
> >> > > > > tricky
> >> > > > > > > bits out at either edge are done right.
> >> > > > > > > The thrift struct code generation is done using
Square's
> >> javapoet
> >> > > [5]
> >> > > > > in
> >> > > > > > > favor of templating for the purpose of easier
to read
> >> generator
> >> > > code.
> >> > > > > > This
> >> > > > > > > characterization is debatable though and template
are
> >> certainly
> >> > > more
> >> > > > > > > flexible the minute you need to gen a second language
(say
> we
> >> > like
> >> > > > this
> >> > > > > > and
> >> > > > > > > want to do javascript codegen this way too for
example).
> >> > > > > > > The MyBatis codegen is forced by the thrift codegen
for
> >> technical
> >> > > > > > > reasons.  In short, there is no simple way to
teach MyBatis
> to
> >> > read
> >> > > > and
> >> > > > > > > write immutable objects with builders.  So the
MyBatis code
> is
> >> > > > > generated
> >> > > > > > > via an annotation processor that runs after thrift
code gen,
> >> but
> >> > > > > reading
> >> > > > > > > thrift annotations that survive that codegen process.
> >> > > > > > > The codegen unit testing is done with the help
of Google's
> >> > > > > compile-tester
> >> > > > > > > [6].  NB that this has an expected output comparison
that
> >> checks
> >> > > the
> >> > > > > > > generated AST and not the text, so its fairly
lenient.
> >> > Whitepsace
> >> > > > and
> >> > > > > > > comments certainly don't matter.
> >> > > > > > >
> >> > > > > > > Review strategy:
> >> > > > > > > The code generator RBs (#1 & #2 in the 3 part
series) are
> >> > probably
> >> > > > > easier
> >> > > > > > > to review looking at samples of the generated
code.  Both
> the
> >> > > thrift
> >> > > > > > > codegen and MyBatis codegen samples are conveniently
> >> contained in
> >> > > the
> >> > > > > > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/
> ).
> >> > The
> >> > > > > unit
> >> > > > > > > test uses resource files that contain both the
thrift
> codegen
> >> > > inputs
> >> > > > > the
> >> > > > > > > annotation processor runs over and the annotation
processor
> >> > > expected
> >> > > > > > > outputs  - the MyBatis peer classes.  So have
a look there
> if
> >> you
> >> > > > need
> >> > > > > > > something concrete and don't want to patch the
RBs in and
> >> > actually
> >> > > > run
> >> > > > > > the
> >> > > > > > > codegen (`./gradlew api:compileJava`).
> >> > > > > > > The conversion RB (#3) is large but the changes
are mainly
> >> > > mechanical
> >> > > > > > > conversions from the current mutable thrift +
I* wrappers to
> >> pure
> >> > > > > > immutable
> >> > > > > > > thrift mutated via `.toBuilder` and `.with`'er
methods.  The
> >> main
> >> > > > > changes
> >> > > > > > > of note are to the portions of the codebase tightly
tied to
> >> > thrift
> >> > > > as a
> >> > > > > > > technology:
> >> > > > > > > + Gson/thrift converters
> >> > > > > > > + Shiro annotated auth param interception
> >> > > > > > > + Thrift/Servlet binding
> >> > > > > > >
> >> > > > > > > [1]
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> >> > > > > > > [2] https://issues.apache.org/jira/browse/AURORA-987
> >> > > > > > > [3]
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> >> > > > > > > [4] https://github.com/facebook/swift
> >> > > > > > > [5] https://github.com/square/javapoet
> >> > > > > > > [6] https://github.com/google/compile-testing
> >> > > > > > >
> >> > > > > >
> >> > > > > > --
> >> > > > > > Zameer Manji
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
>
> --
> John Sirois
> 303-512-3301
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message