aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Sirois <jsir...@apache.org>
Subject Re: [PROPOSAL] Change java thrift code gen
Date Tue, 02 Feb 2016 14:22:28 GMT
To wrap things up - I'm considering this proposal as having failed; so I've
marked the 3 associated RBs as discarded.

I'll be working on https://issues.apache.org/jira/browse/THRIFT-3583 to
introduce an immutable builder-style of java codegen that carries over
thrift annotations to Apache Thrift.  Once that is complete and released,
I'll circle back and re-do an RB like 3/3 (
https://reviews.apache.org/r/42756/) that flips over the codebase to the
new apache thrift gen.

On Thu, Jan 28, 2016 at 10:08 AM, John Sirois <jsirois@apache.org> wrote:

>
>
> On Thu, Jan 28, 2016 at 8:26 AM, John Sirois <john@conductant.com> wrote:
>
>> On Thu, Jan 28, 2016 at 6:45 AM, Jake Farrell <jfarrell@apache.org>
>> wrote:
>>
>> > +1 to making this apart of Thrift, i'm happy to help shepard this on the
>> > Thrift side and get it in as soon as its ready
>> >
>>
>> I've filed https://issues.apache.org/jira/browse/THRIFT-3583 to use as
>> the
>> basis for discussion of this feature over in the Apache Thrift project.
>> I looked at the problem a bit and noted some challenges.
>>
>
> And started a dev@thrift.apache.org thread here if anyone wants to follow
> along or participate: http://markmail.org/message/mlxyyauyvlvuxjsf
>
>
>>
>>
>> >
>> > -Jake
>> >
>> > On Wed, Jan 27, 2016 at 8:08 PM, Maxim Khutornenko <maxim@apache.org>
>> > wrote:
>> >
>> >> I am +1 to making immutable thrift objects solely based on perf
>> numbers.
>> >>
>> >> My biggest concern though is maintenance of a pretty intricate
>> codebase,
>> >> especially when it comes to upgrading any of the frameworks involved.
>> >> Bill's suggestion to explore paths to make this a part of Apache Thrift
>> >> sounds great to me.
>> >>
>> >> On Wed, Jan 27, 2016 at 5:00 PM, Bill Farner <wfarner@apache.org>
>> wrote:
>> >>
>> >> > Tony - this would not be a technical fork so much as a spiritual
>> fork.
>> >> > While we would have our own bugs to work out, the only upstream
>> exposure
>> >> > would be IDL or wire format changes.
>> >> >
>> >> > On Wed, Jan 27, 2016 at 4:58 PM, Tony Dong <tdong@twitter.com.invalid
>> >
>> >> > wrote:
>> >> >
>> >> > > Awesome performance numbers! I don't particularly know the
>> logistics
>> >> of
>> >> > > upstreaming a change like this, but optimistically I would suggest
>> >> > > upstreaming it to Apache Thrift if possible.
>> >> > >
>> >> > > As someone that maintains a fork of a thrift compiler(fork of
>> >> scrooge), I
>> >> > > have to say that it's not that fun. There's a lot of custom code
>> that
>> >> > needs
>> >> > > to be maintained and a bunch of work to rebase the code
>> periodically.
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wfarner@apache.org>
>> >> wrote:
>> >> > >
>> >> > > > Firstly - thanks for the clean organization and delineation
of
>> >> steps in
>> >> > > > this change.  Top notch work!
>> >> > > >
>> >> > > > Some of the performance improvements are very nice; and in
a
>> >> > particularly
>> >> > > > hot code path.  I will wager a guess that the majority of
the
>> >> savings
>> >> > is
>> >> > > in
>> >> > > > avoiding what amounts to copy constructors between mutable
and
>> >> > immutable
>> >> > > > types.  I further wager there are alternative approaches
we could
>> >> weigh
>> >> > > to
>> >> > > > achieve those performance improvements.  As an example -
you note
>> >> above
>> >> > > > that we could provide a patch to Apache Thrift.  Depending
how
>> much
>> >> > > > performance inspires our decision here, it will be prudent
to
>> >> evaluate
>> >> > > > alternatives.
>> >> > > >
>> >> > > > I think there are (at least) two major issues worth discussing
-
>> >> code
>> >> > > > volume (which you note) and an increase in logical complexity.
>> This
>> >> > will
>> >> > > > leave us with a bifurcation in code generation tooling
>> (custom+swift
>> >> > for
>> >> > > > Java, Apache Thrift for python and js).  It's difficult to
>> quantify
>> >> the
>> >> > > > downside of that, but it seems like an unfortunate state
with
>> >> potential
>> >> > > for
>> >> > > > compatibility risks.
>> >> > > >
>> >> > > >
>> >> > > > On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zmanji@apache.org
>> >
>> >> > wrote:
>> >> > > >
>> >> > > > > Some high level comments without looking at the code.
>> >> > > > >
>> >> > > > > I'm in favor from abandoning the thrift generated java
code in
>> >> favor
>> >> > of
>> >> > > > > immutable objects. I think it is easier to reason about
and
>> will
>> >> > ensure
>> >> > > > we
>> >> > > > > have less errors in our code. If I understand correctly,
the
>> >> ProtoBuf
>> >> > > > > format does this by default, so there some precedent
for this
>> >> style
>> >> > of
>> >> > > > code
>> >> > > > > generation already.
>> >> > > > >
>> >> > > > > I think using Facebook's swift is the best approach
here. I
>> would
>> >> be
>> >> > > > > hesitant to accept any custom code generation that involved
us
>> >> > parsing
>> >> > > > > thrift IDL files or thrift formats over the wire because
I
>> poses a
>> >> > very
>> >> > > > > high maintenance burden.
>> >> > > > >
>> >> > > > > I also think generating the MyBatis mutable classes
is
>> superior to
>> >> > our
>> >> > > > > current strategy of manually creating them.
>> >> > > > >
>> >> > > > > Finally, the performance improvements look fantastic.
As an
>> >> operator
>> >> > > of a
>> >> > > > > large cluster I am excited to see wholesale performance
>> >> improvements
>> >> > > as I
>> >> > > > > am always concerned that my cluster is approaching the
limits
>> of
>> >> what
>> >> > > > > Aurora can handle safely.
>> >> > > > >
>> >> > > > > Overall, I think this change merits a serious discussion
from
>> all
>> >> > > > > contributors.
>> >> > > > >
>> >> > > > > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <
>> jsirois@apache.org>
>> >> > > wrote:
>> >> > > > >
>> >> > > > > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <
>> >> jsirois@apache.org>
>> >> > > > wrote:
>> >> > > > > >
>> >> > > > > > > Context: Aurora uses the official Apache Thrift
compiler
>> today
>> >> > > plus a
>> >> > > > > > > home-grown python code generator [1] for immutable
"entity"
>> >> (I*)
>> >> > > > > > wrappers.
>> >> > > > > > >
>> >> > > > > > > The proposal is to switch from using the Apache
Thrift code
>> >> > > generator
>> >> > > > > to
>> >> > > > > > a
>> >> > > > > > > home grown generator.  The proposal comes
with a concrete
>> >> example
>> >> > > in
>> >> > > > > the
>> >> > > > > > > form of the actual RBs to effect this change:
>> >> > > > > > > 1. A custom java thrift code generator:
>> >> > > > > > > https://reviews.apache.org/r/42748/
>> >> > > > > > > 2. A custom MyBatis binding code generator
powered by 1
>> above:
>> >> > > > > > > https://reviews.apache.org/r/42749/
>> >> > > > > > > 3. Integration of 1 & 2 above into the
Aurora codebase:
>> >> > > > > > > https://reviews.apache.org/r/42756/
>> >> > > > > > >
>> >> > > > > > > Since the RBs are large, I wanted to provide
some extra
>> >> context
>> >> > on
>> >> > > > the
>> >> > > > > > > idea at a higher level.  I provide rationale,
pros and cons
>> >> below
>> >> > > for
>> >> > > > > > those
>> >> > > > > > > interested in the idea but wary of diving
in on code review
>> >> until
>> >> > > the
>> >> > > > > > idea
>> >> > > > > > > itself passes a sniff test.
>> >> > > > > > > Thanks in advance for your feedback - and
if we get there -
>> >> for
>> >> > > your
>> >> > > > > > > review effort.
>> >> > > > > > >
>> >> > > > > >
>> >> > > > > > I just added wfarner and zmanji as reviewers for
the 3 RBs
>> above
>> >> > > since
>> >> > > > > > they've expressed direct interest.  Happy to add
others, just
>> >> speak
>> >> > > up
>> >> > > > or
>> >> > > > > > else just comment on the reviews as you see fit.
>> >> > > > > > I'll formally only submit if 1st this email thread
reaches
>> >> > consensus,
>> >> > > > and
>> >> > > > > > second, reviews are approved.
>> >> > > > > >
>> >> > > > > > ==
>> >> > > > > > >
>> >> > > > > > > In the course of an initial run at creating
a first-class
>> >> > REST-like
>> >> > > > > > > scheduler interface [2] I came to the conclusion
generating
>> >> the
>> >> > > json
>> >> > > > > API
>> >> > > > > > > from the thrift one might be a good path.
 That idea has
>> been
>> >> > > > scrapped
>> >> > > > > > with
>> >> > > > > > > community feedback, but an initial experiment
in custom
>> thrift
>> >> > > > code-gen
>> >> > > > > > for
>> >> > > > > > > java that accompanied that idea seemed worth
pursuing for
>> its
>> >> own
>> >> > > > > > > independent benefits, chief among these being
1st class
>> >> immutable
>> >> > > > > thrift
>> >> > > > > > > structs and the ability to leverage thrift
annotations.
>> >> > > > > > >
>> >> > > > > > > Immutability:
>> >> > > > > > > The benefits of having an immutable by default
data model
>> are
>> >> the
>> >> > > > > > standard
>> >> > > > > > > ones; namely, its trivial to reason about
safety of
>> concurrent
>> >> > > > > operations
>> >> > > > > > > on the data model, stability of collections
containing data
>> >> model
>> >> > > > > > entities
>> >> > > > > > > and it opens up straight-forward optimizations
that are
>> easy
>> >> to
>> >> > > > reason
>> >> > > > > > > about.
>> >> > > > > > > An example optimization is caching hashCodes
for the
>> immutable
>> >> > > thrift
>> >> > > > > > > structs.  This was done after comparing jmh
benchmarks run
>> >> > against
>> >> > > > > master
>> >> > > > > > > and then again against the proposal branch.
 Perf was
>> >> comparable
>> >> > -
>> >> > > > > within
>> >> > > > > > > 10% plus and minus depending on the benchmark,
but with the
>> >> > > > > optimization
>> >> > > > > > > added many benchmarks showed pronounced improvement
in the
>> >> > proposal
>> >> > > > > > branch
>> >> > > > > > > [3].  The optimization is clearly safe and
was quick and
>> easy
>> >> to
>> >> > > > > > > implement.  Further optimizations can be experimented
with
>> in
>> >> a
>> >> > > > > > > straightforward way.
>> >> > > > > > >
>> >> > > > > > > Thrift Annotations:
>> >> > > > > > > The thrift IDL grammar has supported these
for quite some
>> >> time,
>> >> > but
>> >> > > > > they
>> >> > > > > > > are not plumbed to the generated java code.
 Uses are many
>> and
>> >> > > > > varied.  I
>> >> > > > > > > initially had my eye on annotation of thrift
services with
>> >> REST
>> >> > > > verbs,
>> >> > > > > > > routes, etc - but immediately we can leverage
these
>> >> annotations
>> >> > to
>> >> > > > kill
>> >> > > > > > > AnnotatedAuroraAdmin and reduce the amount
of MyBatis
>> binding
>> >> > code
>> >> > > > that
>> >> > > > > > > needs to be maintained.
>> >> > > > > > >
>> >> > > > > > > There are a few downsides to switching to
our own java
>> thrift
>> >> > code
>> >> > > > gen:
>> >> > > > > > > 1. We own more code to maintain:  Even though
we have the
>> >> custom
>> >> > > > python
>> >> > > > > > > "immutable" wrapper generator [1] today, this
new
>> generator -
>> >> > even
>> >> > > > with
>> >> > > > > > the
>> >> > > > > > > python generator removed - represents a 5-6x
increase in
>> line
>> >> > count
>> >> > > > of
>> >> > > > > > > custom code (~4.1k lines of code and tests
in the new
>> custom
>> >> gen,
>> >> > > > ~700
>> >> > > > > > > lines in the existing python custom gen)
>> >> > > > > > > 2. We conceptually fork from a sibling Apache
project.
>> >> > > > > > >
>> >> > > > > > > The fork could be mitigated by turning our
real experience
>> >> > > iterating
>> >> > > > > the
>> >> > > > > > > custom code generator into a well-founded
patch back into
>> the
>> >> > > Apache
>> >> > > > > > Thrift
>> >> > > > > > > project, but saying we'll do that is easier
than following
>> >> > through
>> >> > > > and
>> >> > > > > > > actually doing it.
>> >> > > > > > >
>> >> > > > > > > ==
>> >> > > > > > > Review guide / details:
>> >> > > > > > >
>> >> > > > > > > The technology stack:
>> >> > > > > > > The thrift IDL parsing and thrift wire parsing
are both
>> >> handled
>> >> > by
>> >> > > > the
>> >> > > > > > > Facebook swift project [4].  We only implement
the middle
>> bit
>> >> > that
>> >> > > > > > > generates java code stubs.  This gives higher
confidence
>> that
>> >> the
>> >> > > > > tricky
>> >> > > > > > > bits out at either edge are done right.
>> >> > > > > > > The thrift struct code generation is done
using Square's
>> >> javapoet
>> >> > > [5]
>> >> > > > > in
>> >> > > > > > > favor of templating for the purpose of easier
to read
>> >> generator
>> >> > > code.
>> >> > > > > > This
>> >> > > > > > > characterization is debatable though and template
are
>> >> certainly
>> >> > > more
>> >> > > > > > > flexible the minute you need to gen a second
language (say
>> we
>> >> > like
>> >> > > > this
>> >> > > > > > and
>> >> > > > > > > want to do javascript codegen this way too
for example).
>> >> > > > > > > The MyBatis codegen is forced by the thrift
codegen for
>> >> technical
>> >> > > > > > > reasons.  In short, there is no simple way
to teach
>> MyBatis to
>> >> > read
>> >> > > > and
>> >> > > > > > > write immutable objects with builders.  So
the MyBatis
>> code is
>> >> > > > > generated
>> >> > > > > > > via an annotation processor that runs after
thrift code
>> gen,
>> >> but
>> >> > > > > reading
>> >> > > > > > > thrift annotations that survive that codegen
process.
>> >> > > > > > > The codegen unit testing is done with the
help of Google's
>> >> > > > > compile-tester
>> >> > > > > > > [6].  NB that this has an expected output
comparison that
>> >> checks
>> >> > > the
>> >> > > > > > > generated AST and not the text, so its fairly
lenient.
>> >> > Whitepsace
>> >> > > > and
>> >> > > > > > > comments certainly don't matter.
>> >> > > > > > >
>> >> > > > > > > Review strategy:
>> >> > > > > > > The code generator RBs (#1 & #2 in the
3 part series) are
>> >> > probably
>> >> > > > > easier
>> >> > > > > > > to review looking at samples of the generated
code.  Both
>> the
>> >> > > thrift
>> >> > > > > > > codegen and MyBatis codegen samples are conveniently
>> >> contained in
>> >> > > the
>> >> > > > > > > MyBatis codegen RB (#2:
>> https://reviews.apache.org/r/42749/).
>> >> > The
>> >> > > > > unit
>> >> > > > > > > test uses resource files that contain both
the thrift
>> codegen
>> >> > > inputs
>> >> > > > > the
>> >> > > > > > > annotation processor runs over and the annotation
processor
>> >> > > expected
>> >> > > > > > > outputs  - the MyBatis peer classes.  So have
a look there
>> if
>> >> you
>> >> > > > need
>> >> > > > > > > something concrete and don't want to patch
the RBs in and
>> >> > actually
>> >> > > > run
>> >> > > > > > the
>> >> > > > > > > codegen (`./gradlew api:compileJava`).
>> >> > > > > > > The conversion RB (#3) is large but the changes
are mainly
>> >> > > mechanical
>> >> > > > > > > conversions from the current mutable thrift
+ I* wrappers
>> to
>> >> pure
>> >> > > > > > immutable
>> >> > > > > > > thrift mutated via `.toBuilder` and `.with`'er
methods.
>> The
>> >> main
>> >> > > > > changes
>> >> > > > > > > of note are to the portions of the codebase
tightly tied to
>> >> > thrift
>> >> > > > as a
>> >> > > > > > > technology:
>> >> > > > > > > + Gson/thrift converters
>> >> > > > > > > + Shiro annotated auth param interception
>> >> > > > > > > + Thrift/Servlet binding
>> >> > > > > > >
>> >> > > > > > > [1]
>> >> > > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
>> >> > > > > > > [2] https://issues.apache.org/jira/browse/AURORA-987
>> >> > > > > > > [3]
>> >> > > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
>> >> > > > > > > [4] https://github.com/facebook/swift
>> >> > > > > > > [5] https://github.com/square/javapoet
>> >> > > > > > > [6] https://github.com/google/compile-testing
>> >> > > > > > >
>> >> > > > > >
>> >> > > > > > --
>> >> > > > > > Zameer Manji
>> >> > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>>
>> --
>> John Sirois
>> 303-512-3301
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message