aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Sirois <j...@conductant.com>
Subject Re: [PROPOSAL] Change java thrift code gen
Date Thu, 28 Jan 2016 15:26:32 GMT
On Thu, Jan 28, 2016 at 6:45 AM, Jake Farrell <jfarrell@apache.org> wrote:

> +1 to making this apart of Thrift, i'm happy to help shepard this on the
> Thrift side and get it in as soon as its ready
>

I've filed https://issues.apache.org/jira/browse/THRIFT-3583 to use as the
basis for discussion of this feature over in the Apache Thrift project.
I looked at the problem a bit and noted some challenges.


>
> -Jake
>
> On Wed, Jan 27, 2016 at 8:08 PM, Maxim Khutornenko <maxim@apache.org>
> wrote:
>
>> I am +1 to making immutable thrift objects solely based on perf numbers.
>>
>> My biggest concern though is maintenance of a pretty intricate codebase,
>> especially when it comes to upgrading any of the frameworks involved.
>> Bill's suggestion to explore paths to make this a part of Apache Thrift
>> sounds great to me.
>>
>> On Wed, Jan 27, 2016 at 5:00 PM, Bill Farner <wfarner@apache.org> wrote:
>>
>> > Tony - this would not be a technical fork so much as a spiritual fork.
>> > While we would have our own bugs to work out, the only upstream exposure
>> > would be IDL or wire format changes.
>> >
>> > On Wed, Jan 27, 2016 at 4:58 PM, Tony Dong <tdong@twitter.com.invalid>
>> > wrote:
>> >
>> > > Awesome performance numbers! I don't particularly know the logistics
>> of
>> > > upstreaming a change like this, but optimistically I would suggest
>> > > upstreaming it to Apache Thrift if possible.
>> > >
>> > > As someone that maintains a fork of a thrift compiler(fork of
>> scrooge), I
>> > > have to say that it's not that fun. There's a lot of custom code that
>> > needs
>> > > to be maintained and a bunch of work to rebase the code periodically.
>> > >
>> > >
>> > >
>> > > On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner <wfarner@apache.org>
>> wrote:
>> > >
>> > > > Firstly - thanks for the clean organization and delineation of
>> steps in
>> > > > this change.  Top notch work!
>> > > >
>> > > > Some of the performance improvements are very nice; and in a
>> > particularly
>> > > > hot code path.  I will wager a guess that the majority of the
>> savings
>> > is
>> > > in
>> > > > avoiding what amounts to copy constructors between mutable and
>> > immutable
>> > > > types.  I further wager there are alternative approaches we could
>> weigh
>> > > to
>> > > > achieve those performance improvements.  As an example - you note
>> above
>> > > > that we could provide a patch to Apache Thrift.  Depending how much
>> > > > performance inspires our decision here, it will be prudent to
>> evaluate
>> > > > alternatives.
>> > > >
>> > > > I think there are (at least) two major issues worth discussing -
>> code
>> > > > volume (which you note) and an increase in logical complexity.  This
>> > will
>> > > > leave us with a bifurcation in code generation tooling (custom+swift
>> > for
>> > > > Java, Apache Thrift for python and js).  It's difficult to quantify
>> the
>> > > > downside of that, but it seems like an unfortunate state with
>> potential
>> > > for
>> > > > compatibility risks.
>> > > >
>> > > >
>> > > > On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji <zmanji@apache.org>
>> > wrote:
>> > > >
>> > > > > Some high level comments without looking at the code.
>> > > > >
>> > > > > I'm in favor from abandoning the thrift generated java code in
>> favor
>> > of
>> > > > > immutable objects. I think it is easier to reason about and will
>> > ensure
>> > > > we
>> > > > > have less errors in our code. If I understand correctly, the
>> ProtoBuf
>> > > > > format does this by default, so there some precedent for this
>> style
>> > of
>> > > > code
>> > > > > generation already.
>> > > > >
>> > > > > I think using Facebook's swift is the best approach here. I would
>> be
>> > > > > hesitant to accept any custom code generation that involved us
>> > parsing
>> > > > > thrift IDL files or thrift formats over the wire because I poses
a
>> > very
>> > > > > high maintenance burden.
>> > > > >
>> > > > > I also think generating the MyBatis mutable classes is superior
to
>> > our
>> > > > > current strategy of manually creating them.
>> > > > >
>> > > > > Finally, the performance improvements look fantastic. As an
>> operator
>> > > of a
>> > > > > large cluster I am excited to see wholesale performance
>> improvements
>> > > as I
>> > > > > am always concerned that my cluster is approaching the limits
of
>> what
>> > > > > Aurora can handle safely.
>> > > > >
>> > > > > Overall, I think this change merits a serious discussion from
all
>> > > > > contributors.
>> > > > >
>> > > > > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <jsirois@apache.org>
>> > > wrote:
>> > > > >
>> > > > > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <
>> jsirois@apache.org>
>> > > > wrote:
>> > > > > >
>> > > > > > > Context: Aurora uses the official Apache Thrift compiler
today
>> > > plus a
>> > > > > > > home-grown python code generator [1] for immutable
"entity"
>> (I*)
>> > > > > > wrappers.
>> > > > > > >
>> > > > > > > The proposal is to switch from using the Apache Thrift
code
>> > > generator
>> > > > > to
>> > > > > > a
>> > > > > > > home grown generator.  The proposal comes with a concrete
>> example
>> > > in
>> > > > > the
>> > > > > > > form of the actual RBs to effect this change:
>> > > > > > > 1. A custom java thrift code generator:
>> > > > > > > https://reviews.apache.org/r/42748/
>> > > > > > > 2. A custom MyBatis binding code generator powered
by 1 above:
>> > > > > > > https://reviews.apache.org/r/42749/
>> > > > > > > 3. Integration of 1 & 2 above into the Aurora codebase:
>> > > > > > > https://reviews.apache.org/r/42756/
>> > > > > > >
>> > > > > > > Since the RBs are large, I wanted to provide some extra
>> context
>> > on
>> > > > the
>> > > > > > > idea at a higher level.  I provide rationale, pros
and cons
>> below
>> > > for
>> > > > > > those
>> > > > > > > interested in the idea but wary of diving in on code
review
>> until
>> > > the
>> > > > > > idea
>> > > > > > > itself passes a sniff test.
>> > > > > > > Thanks in advance for your feedback - and if we get
there -
>> for
>> > > your
>> > > > > > > review effort.
>> > > > > > >
>> > > > > >
>> > > > > > I just added wfarner and zmanji as reviewers for the 3 RBs
above
>> > > since
>> > > > > > they've expressed direct interest.  Happy to add others,
just
>> speak
>> > > up
>> > > > or
>> > > > > > else just comment on the reviews as you see fit.
>> > > > > > I'll formally only submit if 1st this email thread reaches
>> > consensus,
>> > > > and
>> > > > > > second, reviews are approved.
>> > > > > >
>> > > > > > ==
>> > > > > > >
>> > > > > > > In the course of an initial run at creating a first-class
>> > REST-like
>> > > > > > > scheduler interface [2] I came to the conclusion generating
>> the
>> > > json
>> > > > > API
>> > > > > > > from the thrift one might be a good path.  That idea
has been
>> > > > scrapped
>> > > > > > with
>> > > > > > > community feedback, but an initial experiment in custom
thrift
>> > > > code-gen
>> > > > > > for
>> > > > > > > java that accompanied that idea seemed worth pursuing
for its
>> own
>> > > > > > > independent benefits, chief among these being 1st class
>> immutable
>> > > > > thrift
>> > > > > > > structs and the ability to leverage thrift annotations.
>> > > > > > >
>> > > > > > > Immutability:
>> > > > > > > The benefits of having an immutable by default data
model are
>> the
>> > > > > > standard
>> > > > > > > ones; namely, its trivial to reason about safety of
concurrent
>> > > > > operations
>> > > > > > > on the data model, stability of collections containing
data
>> model
>> > > > > > entities
>> > > > > > > and it opens up straight-forward optimizations that
are easy
>> to
>> > > > reason
>> > > > > > > about.
>> > > > > > > An example optimization is caching hashCodes for the
immutable
>> > > thrift
>> > > > > > > structs.  This was done after comparing jmh benchmarks
run
>> > against
>> > > > > master
>> > > > > > > and then again against the proposal branch.  Perf was
>> comparable
>> > -
>> > > > > within
>> > > > > > > 10% plus and minus depending on the benchmark, but
with the
>> > > > > optimization
>> > > > > > > added many benchmarks showed pronounced improvement
in the
>> > proposal
>> > > > > > branch
>> > > > > > > [3].  The optimization is clearly safe and was quick
and easy
>> to
>> > > > > > > implement.  Further optimizations can be experimented
with in
>> a
>> > > > > > > straightforward way.
>> > > > > > >
>> > > > > > > Thrift Annotations:
>> > > > > > > The thrift IDL grammar has supported these for quite
some
>> time,
>> > but
>> > > > > they
>> > > > > > > are not plumbed to the generated java code.  Uses are
many and
>> > > > > varied.  I
>> > > > > > > initially had my eye on annotation of thrift services
with
>> REST
>> > > > verbs,
>> > > > > > > routes, etc - but immediately we can leverage these
>> annotations
>> > to
>> > > > kill
>> > > > > > > AnnotatedAuroraAdmin and reduce the amount of MyBatis
binding
>> > code
>> > > > that
>> > > > > > > needs to be maintained.
>> > > > > > >
>> > > > > > > There are a few downsides to switching to our own java
thrift
>> > code
>> > > > gen:
>> > > > > > > 1. We own more code to maintain:  Even though we have
the
>> custom
>> > > > python
>> > > > > > > "immutable" wrapper generator [1] today, this new generator
-
>> > even
>> > > > with
>> > > > > > the
>> > > > > > > python generator removed - represents a 5-6x increase
in line
>> > count
>> > > > of
>> > > > > > > custom code (~4.1k lines of code and tests in the new
custom
>> gen,
>> > > > ~700
>> > > > > > > lines in the existing python custom gen)
>> > > > > > > 2. We conceptually fork from a sibling Apache project.
>> > > > > > >
>> > > > > > > The fork could be mitigated by turning our real experience
>> > > iterating
>> > > > > the
>> > > > > > > custom code generator into a well-founded patch back
into the
>> > > Apache
>> > > > > > Thrift
>> > > > > > > project, but saying we'll do that is easier than following
>> > through
>> > > > and
>> > > > > > > actually doing it.
>> > > > > > >
>> > > > > > > ==
>> > > > > > > Review guide / details:
>> > > > > > >
>> > > > > > > The technology stack:
>> > > > > > > The thrift IDL parsing and thrift wire parsing are
both
>> handled
>> > by
>> > > > the
>> > > > > > > Facebook swift project [4].  We only implement the
middle bit
>> > that
>> > > > > > > generates java code stubs.  This gives higher confidence
that
>> the
>> > > > > tricky
>> > > > > > > bits out at either edge are done right.
>> > > > > > > The thrift struct code generation is done using Square's
>> javapoet
>> > > [5]
>> > > > > in
>> > > > > > > favor of templating for the purpose of easier to read
>> generator
>> > > code.
>> > > > > > This
>> > > > > > > characterization is debatable though and template are
>> certainly
>> > > more
>> > > > > > > flexible the minute you need to gen a second language
(say we
>> > like
>> > > > this
>> > > > > > and
>> > > > > > > want to do javascript codegen this way too for example).
>> > > > > > > The MyBatis codegen is forced by the thrift codegen
for
>> technical
>> > > > > > > reasons.  In short, there is no simple way to teach
MyBatis to
>> > read
>> > > > and
>> > > > > > > write immutable objects with builders.  So the MyBatis
code is
>> > > > > generated
>> > > > > > > via an annotation processor that runs after thrift
code gen,
>> but
>> > > > > reading
>> > > > > > > thrift annotations that survive that codegen process.
>> > > > > > > The codegen unit testing is done with the help of Google's
>> > > > > compile-tester
>> > > > > > > [6].  NB that this has an expected output comparison
that
>> checks
>> > > the
>> > > > > > > generated AST and not the text, so its fairly lenient.
>> > Whitepsace
>> > > > and
>> > > > > > > comments certainly don't matter.
>> > > > > > >
>> > > > > > > Review strategy:
>> > > > > > > The code generator RBs (#1 & #2 in the 3 part series)
are
>> > probably
>> > > > > easier
>> > > > > > > to review looking at samples of the generated code.
 Both the
>> > > thrift
>> > > > > > > codegen and MyBatis codegen samples are conveniently
>> contained in
>> > > the
>> > > > > > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).
>> > The
>> > > > > unit
>> > > > > > > test uses resource files that contain both the thrift
codegen
>> > > inputs
>> > > > > the
>> > > > > > > annotation processor runs over and the annotation processor
>> > > expected
>> > > > > > > outputs  - the MyBatis peer classes.  So have a look
there if
>> you
>> > > > need
>> > > > > > > something concrete and don't want to patch the RBs
in and
>> > actually
>> > > > run
>> > > > > > the
>> > > > > > > codegen (`./gradlew api:compileJava`).
>> > > > > > > The conversion RB (#3) is large but the changes are
mainly
>> > > mechanical
>> > > > > > > conversions from the current mutable thrift + I* wrappers
to
>> pure
>> > > > > > immutable
>> > > > > > > thrift mutated via `.toBuilder` and `.with`'er methods.
 The
>> main
>> > > > > changes
>> > > > > > > of note are to the portions of the codebase tightly
tied to
>> > thrift
>> > > > as a
>> > > > > > > technology:
>> > > > > > > + Gson/thrift converters
>> > > > > > > + Shiro annotated auth param interception
>> > > > > > > + Thrift/Servlet binding
>> > > > > > >
>> > > > > > > [1]
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
>> > > > > > > [2] https://issues.apache.org/jira/browse/AURORA-987
>> > > > > > > [3]
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
>> > > > > > > [4] https://github.com/facebook/swift
>> > > > > > > [5] https://github.com/square/javapoet
>> > > > > > > [6] https://github.com/google/compile-testing
>> > > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Zameer Manji
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>


-- 
John Sirois
303-512-3301

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message