aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Sirois <j...@conductant.com>
Subject Re: [PROPOSAL] Change java thrift code gen
Date Thu, 28 Jan 2016 01:33:44 GMT
--
John Sirois
303-512-3301
On Jan 27, 2016 6:08 PM, "Maxim Khutornenko" <maxim@apache.org> wrote:
>
> I am +1 to making immutable thrift objects solely based on perf numbers.
>
> My biggest concern though is maintenance of a pretty intricate codebase,
> especially when it comes to upgrading any of the frameworks involved.
> Bill's suggestion to explore paths to make this a part of Apache Thrift
> sounds great to me.
>
> On Wed, Jan 27, 2016 at 5:00 PM, Bill Farner <wfarner@apache.org> wrote:
>
> > Tony - this would not be a technical fork so much as a spiritual fork.
> > While we would have our own bugs to work out, the only upstream exposure
> > would be IDL or wire format changes.
> >
> > On Wed, Jan 27, 2016 at 4:58 PM, Tony Dong <tdong@twitter.com.invalid>
> > wrote:
> >
> > > Awesome performance numbers! I don't particularly know the logistics
of
> > > upstreaming a change like this, but optimistically I would suggest
> > > upstreaming it to Apache Thrift if possible.
> > >
> > > As someone that maintains a fork of a thrift compiler(fork of
scrooge), I
> > > have to say that it's not that fun. There's a lot of custom code that
> > needs
> > > to be maintained and a bunch of work to rebase the code periodically.
> > >
> > >
> > >
> > > On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wfarner@apache.org>
wrote:
> > >
> > > > Firstly - thanks for the clean organization and delineation of
steps in
> > > > this change.  Top notch work!
> > > >
> > > > Some of the performance improvements are very nice; and in a
> > particularly
> > > > hot code path.  I will wager a guess that the majority of the
savings
> > is
> > > in
> > > > avoiding what amounts to copy constructors between mutable and
> > immutable
> > > > types.  I further wager there are alternative approaches we could
weigh
> > > to
> > > > achieve those performance improvements.  As an example - you note
above
> > > > that we could provide a patch to Apache Thrift.  Depending how much
> > > > performance inspires our decision here, it will be prudent to
evaluate
> > > > alternatives.
> > > >
> > > > I think there are (at least) two major issues worth discussing -
code
> > > > volume (which you note) and an increase in logical complexity.  This
> > will
> > > > leave us with a bifurcation in code generation tooling (custom+swift
> > for
> > > > Java, Apache Thrift for python and js).  It's difficult to quantify
the
> > > > downside of that, but it seems like an unfortunate state with
potential

One note on the bifurcation is that it would be temporary since a 1st class
JSON api (https://issues.apache.org/jira/browse/AURORA-987), means only the
scheduler will use thrift internally - as its data model source of truth
and ser/deser for the distributed log.

> > > for
> > > > compatibility risks.
> > > >
> > > >
> > > > On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zmanji@apache.org>
> > wrote:
> > > >
> > > > > Some high level comments without looking at the code.
> > > > >
> > > > > I'm in favor from abandoning the thrift generated java code in
favor
> > of
> > > > > immutable objects. I think it is easier to reason about and will
> > ensure
> > > > we
> > > > > have less errors in our code. If I understand correctly, the
ProtoBuf
> > > > > format does this by default, so there some precedent for this
style
> > of
> > > > code
> > > > > generation already.
> > > > >
> > > > > I think using Facebook's swift is the best approach here. I would
be
> > > > > hesitant to accept any custom code generation that involved us
> > parsing
> > > > > thrift IDL files or thrift formats over the wire because I poses
a
> > very
> > > > > high maintenance burden.
> > > > >
> > > > > I also think generating the MyBatis mutable classes is superior to
> > our
> > > > > current strategy of manually creating them.
> > > > >
> > > > > Finally, the performance improvements look fantastic. As an
operator
> > > of a
> > > > > large cluster I am excited to see wholesale performance
improvements
> > > as I
> > > > > am always concerned that my cluster is approaching the limits of
what
> > > > > Aurora can handle safely.
> > > > >
> > > > > Overall, I think this change merits a serious discussion from all
> > > > > contributors.
> > > > >
> > > > > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <jsirois@apache.org>
> > > wrote:
> > > > >
> > > > > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <jsirois@apache.org
>
> > > > wrote:
> > > > > >
> > > > > > > Context: Aurora uses the official Apache Thrift compiler
today
> > > plus a
> > > > > > > home-grown python code generator [1] for immutable "entity"
(I*)
> > > > > > wrappers.
> > > > > > >
> > > > > > > The proposal is to switch from using the Apache Thrift
code
> > > generator
> > > > > to
> > > > > > a
> > > > > > > home grown generator.  The proposal comes with a concrete
example
> > > in
> > > > > the
> > > > > > > form of the actual RBs to effect this change:
> > > > > > > 1. A custom java thrift code generator:
> > > > > > > https://reviews.apache.org/r/42748/
> > > > > > > 2. A custom MyBatis binding code generator powered by 1
above:
> > > > > > > https://reviews.apache.org/r/42749/
> > > > > > > 3. Integration of 1 & 2 above into the Aurora codebase:
> > > > > > > https://reviews.apache.org/r/42756/
> > > > > > >
> > > > > > > Since the RBs are large, I wanted to provide some extra
context
> > on
> > > > the
> > > > > > > idea at a higher level.  I provide rationale, pros and
cons
below
> > > for
> > > > > > those
> > > > > > > interested in the idea but wary of diving in on code review
until
> > > the
> > > > > > idea
> > > > > > > itself passes a sniff test.
> > > > > > > Thanks in advance for your feedback - and if we get there
-
for
> > > your
> > > > > > > review effort.
> > > > > > >
> > > > > >
> > > > > > I just added wfarner and zmanji as reviewers for the 3 RBs above
> > > since
> > > > > > they've expressed direct interest.  Happy to add others, just
speak
> > > up
> > > > or
> > > > > > else just comment on the reviews as you see fit.
> > > > > > I'll formally only submit if 1st this email thread reaches
> > consensus,
> > > > and
> > > > > > second, reviews are approved.
> > > > > >
> > > > > > ==
> > > > > > >
> > > > > > > In the course of an initial run at creating a first-class
> > REST-like
> > > > > > > scheduler interface [2] I came to the conclusion generating
the
> > > json
> > > > > API
> > > > > > > from the thrift one might be a good path.  That idea has
been
> > > > scrapped
> > > > > > with
> > > > > > > community feedback, but an initial experiment in custom
thrift
> > > > code-gen
> > > > > > for
> > > > > > > java that accompanied that idea seemed worth pursuing for
its
own
> > > > > > > independent benefits, chief among these being 1st class
immutable
> > > > > thrift
> > > > > > > structs and the ability to leverage thrift annotations.
> > > > > > >
> > > > > > > Immutability:
> > > > > > > The benefits of having an immutable by default data model
are
the
> > > > > > standard
> > > > > > > ones; namely, its trivial to reason about safety of concurrent
> > > > > operations
> > > > > > > on the data model, stability of collections containing
data
model
> > > > > > entities
> > > > > > > and it opens up straight-forward optimizations that are
easy
to
> > > > reason
> > > > > > > about.
> > > > > > > An example optimization is caching hashCodes for the immutable
> > > thrift
> > > > > > > structs.  This was done after comparing jmh benchmarks
run
> > against
> > > > > master
> > > > > > > and then again against the proposal branch.  Perf was
comparable
> > -
> > > > > within
> > > > > > > 10% plus and minus depending on the benchmark, but with
the
> > > > > optimization
> > > > > > > added many benchmarks showed pronounced improvement in
the
> > proposal
> > > > > > branch
> > > > > > > [3].  The optimization is clearly safe and was quick and
easy
to
> > > > > > > implement.  Further optimizations can be experimented with
in
a
> > > > > > > straightforward way.
> > > > > > >
> > > > > > > Thrift Annotations:
> > > > > > > The thrift IDL grammar has supported these for quite some
time,
> > but
> > > > > they
> > > > > > > are not plumbed to the generated java code.  Uses are many
and
> > > > > varied.  I
> > > > > > > initially had my eye on annotation of thrift services with
REST
> > > > verbs,
> > > > > > > routes, etc - but immediately we can leverage these
annotations
> > to
> > > > kill
> > > > > > > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding
> > code
> > > > that
> > > > > > > needs to be maintained.
> > > > > > >
> > > > > > > There are a few downsides to switching to our own java
thrift
> > code
> > > > gen:
> > > > > > > 1. We own more code to maintain:  Even though we have the
custom
> > > > python
> > > > > > > "immutable" wrapper generator [1] today, this new generator
-
> > even
> > > > with
> > > > > > the
> > > > > > > python generator removed - represents a 5-6x increase in
line
> > count
> > > > of
> > > > > > > custom code (~4.1k lines of code and tests in the new custom
gen,
> > > > ~700
> > > > > > > lines in the existing python custom gen)
> > > > > > > 2. We conceptually fork from a sibling Apache project.
> > > > > > >
> > > > > > > The fork could be mitigated by turning our real experience
> > > iterating
> > > > > the
> > > > > > > custom code generator into a well-founded patch back into
the
> > > Apache
> > > > > > Thrift
> > > > > > > project, but saying we'll do that is easier than following
> > through
> > > > and
> > > > > > > actually doing it.
> > > > > > >
> > > > > > > ==
> > > > > > > Review guide / details:
> > > > > > >
> > > > > > > The technology stack:
> > > > > > > The thrift IDL parsing and thrift wire parsing are both
handled
> > by
> > > > the
> > > > > > > Facebook swift project [4].  We only implement the middle
bit
> > that
> > > > > > > generates java code stubs.  This gives higher confidence
that
the
> > > > > tricky
> > > > > > > bits out at either edge are done right.
> > > > > > > The thrift struct code generation is done using Square's
javapoet
> > > [5]
> > > > > in
> > > > > > > favor of templating for the purpose of easier to read
generator
> > > code.
> > > > > > This
> > > > > > > characterization is debatable though and template are
certainly
> > > more
> > > > > > > flexible the minute you need to gen a second language (say
we
> > like
> > > > this
> > > > > > and
> > > > > > > want to do javascript codegen this way too for example).
> > > > > > > The MyBatis codegen is forced by the thrift codegen for
technical
> > > > > > > reasons.  In short, there is no simple way to teach MyBatis
to
> > read
> > > > and
> > > > > > > write immutable objects with builders.  So the MyBatis
code is
> > > > > generated
> > > > > > > via an annotation processor that runs after thrift code
gen,
but
> > > > > reading
> > > > > > > thrift annotations that survive that codegen process.
> > > > > > > The codegen unit testing is done with the help of Google's
> > > > > compile-tester
> > > > > > > [6].  NB that this has an expected output comparison that
checks
> > > the
> > > > > > > generated AST and not the text, so its fairly lenient.
> > Whitepsace
> > > > and
> > > > > > > comments certainly don't matter.
> > > > > > >
> > > > > > > Review strategy:
> > > > > > > The code generator RBs (#1 & #2 in the 3 part series)
are
> > probably
> > > > > easier
> > > > > > > to review looking at samples of the generated code.  Both
the
> > > thrift
> > > > > > > codegen and MyBatis codegen samples are conveniently
contained in
> > > the
> > > > > > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).
> > The
> > > > > unit
> > > > > > > test uses resource files that contain both the thrift codegen
> > > inputs
> > > > > the
> > > > > > > annotation processor runs over and the annotation processor
> > > expected
> > > > > > > outputs  - the MyBatis peer classes.  So have a look there
if
you
> > > > need
> > > > > > > something concrete and don't want to patch the RBs in and
> > actually
> > > > run
> > > > > > the
> > > > > > > codegen (`./gradlew api:compileJava`).
> > > > > > > The conversion RB (#3) is large but the changes are mainly
> > > mechanical
> > > > > > > conversions from the current mutable thrift + I* wrappers
to
pure
> > > > > > immutable
> > > > > > > thrift mutated via `.toBuilder` and `.with`'er methods.
 The
main
> > > > > changes
> > > > > > > of note are to the portions of the codebase tightly tied
to
> > thrift
> > > > as a
> > > > > > > technology:
> > > > > > > + Gson/thrift converters
> > > > > > > + Shiro annotated auth param interception
> > > > > > > + Thrift/Servlet binding
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> > > > > > > [2] https://issues.apache.org/jira/browse/AURORA-987
> > > > > > > [3]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> > > > > > > [4] https://github.com/facebook/swift
> > > > > > > [5] https://github.com/square/javapoet
> > > > > > > [6] https://github.com/google/compile-testing
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Zameer Manji
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message