Return-Path: X-Original-To: apmail-aurora-dev-archive@minotaur.apache.org Delivered-To: apmail-aurora-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 4DE8A18E73 for ; Thu, 28 Jan 2016 01:34:11 +0000 (UTC) Received: (qmail 97353 invoked by uid 500); 28 Jan 2016 01:34:03 -0000 Delivered-To: apmail-aurora-dev-archive@aurora.apache.org Received: (qmail 97296 invoked by uid 500); 28 Jan 2016 01:34:03 -0000 Mailing-List: contact dev-help@aurora.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@aurora.apache.org Delivered-To: mailing list dev@aurora.apache.org Received: (qmail 97284 invoked by uid 99); 28 Jan 2016 01:34:02 -0000 Received: from Unknown (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 28 Jan 2016 01:34:02 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 79B5FC2D19 for ; Thu, 28 Jan 2016 01:34:02 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 3.652 X-Spam-Level: *** X-Spam-Status: No, score=3.652 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_MESSAGE=3, RCVD_IN_MSPIKE_H2=-0.001, SPF_NEUTRAL=0.652, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=conductant-com.20150623.gappssmtp.com Received: from mx1-us-east.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id 2ZxOFm8mq1d0 for ; Thu, 28 Jan 2016 01:33:51 +0000 (UTC) Received: from mail-ob0-f171.google.com (mail-ob0-f171.google.com [209.85.214.171]) by mx1-us-east.apache.org (ASF Mail Server at mx1-us-east.apache.org) with ESMTPS id 0699B429EF for ; Thu, 28 Jan 2016 01:33:50 +0000 (UTC) Received: by mail-ob0-f171.google.com with SMTP id is5so23498038obc.0 for ; Wed, 27 Jan 2016 17:33:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=conductant-com.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=0ljgOwub5vtEtW7PiXmQR6x7T0aTKEJadAKEYQs9XtI=; b=RBNRM7Mr2hSCMGMCPZptkZQcErSWHmLlnynFzbBWe73dWteivFky8eP1vgt7EQIARV PmS8/hKIRzw+hSdGhioFzz4x2WnAfAgW8PpSWCBAlgxuOBluEuEVUuMhbNly295Rgg6S +wLwd7mtHxFjPVQrJdrumpwnj4XttzobxNtxfg12xVyiqYNrTgl2hZRCEWOobNZaGRP5 965j8W6yyCaaTF9Wkz46RtQA1mzWX4x+h+amHO8i9q8itWYeaUG0huQ/dpRjN6R2p0JL z28kybVr8fmHU5QFVf/RQlcqopq+WC1DRPoPaH6Vg7bpkzTtOtaSaunZG4KAqeNvHmkZ 9+Fw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=0ljgOwub5vtEtW7PiXmQR6x7T0aTKEJadAKEYQs9XtI=; b=Yn1LUfD0aW21qgQXvjpk4J7sGuqQNws0rfqRKLtQ0i3h4/bQajFlEX08uscA/9lBss nUZjkblwIeBJ2+67CyPzXVx54bhhHca446Fwqi9PFuy4zbN2o2O8lc+qyWc8wRDOknT9 uoaTwghwl3nFW/7ssbjcMd/VzXSzaVmmKuQeZHeOl55pdoDVlSiptMFxZpmJnkBSyb6/ 6joqOajsk+KaX1rg0m6hSgYFNW0mqWP5RFhnxB3CqvvUcGaQluhtXBLbz22fC828uifK OdQVyYvYg9MdjGkHAjF5gatLM1TtwaQejHZJmitMiBM6ej53gwWMMbKNlhZ/3jBFFWHb moDA== X-Gm-Message-State: AG10YOQwAWuWL984RTquqJrp15/YjZjKs4pvKKI4QJO1MrJ2Qd4IV9oj1xG+PJWhpzuPcPn4gG4kjMakFCVmRA== MIME-Version: 1.0 X-Received: by 10.60.141.193 with SMTP id rq1mr241020oeb.19.1453944824215; Wed, 27 Jan 2016 17:33:44 -0800 (PST) Received: by 10.60.92.73 with HTTP; Wed, 27 Jan 2016 17:33:44 -0800 (PST) X-Originating-IP: [166.173.60.5] Received: by 10.60.92.73 with HTTP; Wed, 27 Jan 2016 17:33:44 -0800 (PST) In-Reply-To: References: Date: Wed, 27 Jan 2016 18:33:44 -0700 Message-ID: Subject: Re: [PROPOSAL] Change java thrift code gen From: John Sirois To: dev@aurora.apache.org Cc: jsirois@apache.org Content-Type: multipart/alternative; boundary=047d7b450be8c562c9052a5ae87d --047d7b450be8c562c9052a5ae87d Content-Type: text/plain; charset=UTF-8 -- John Sirois 303-512-3301 On Jan 27, 2016 6:08 PM, "Maxim Khutornenko" wrote: > > I am +1 to making immutable thrift objects solely based on perf numbers. > > My biggest concern though is maintenance of a pretty intricate codebase, > especially when it comes to upgrading any of the frameworks involved. > Bill's suggestion to explore paths to make this a part of Apache Thrift > sounds great to me. > > On Wed, Jan 27, 2016 at 5:00 PM, Bill Farner wrote: > > > Tony - this would not be a technical fork so much as a spiritual fork. > > While we would have our own bugs to work out, the only upstream exposure > > would be IDL or wire format changes. > > > > On Wed, Jan 27, 2016 at 4:58 PM, Tony Dong > > wrote: > > > > > Awesome performance numbers! I don't particularly know the logistics of > > > upstreaming a change like this, but optimistically I would suggest > > > upstreaming it to Apache Thrift if possible. > > > > > > As someone that maintains a fork of a thrift compiler(fork of scrooge), I > > > have to say that it's not that fun. There's a lot of custom code that > > needs > > > to be maintained and a bunch of work to rebase the code periodically. > > > > > > > > > > > > On Wed, Jan 27, 2016 at 4:51 PM, Bill Farner wrote: > > > > > > > Firstly - thanks for the clean organization and delineation of steps in > > > > this change. Top notch work! > > > > > > > > Some of the performance improvements are very nice; and in a > > particularly > > > > hot code path. I will wager a guess that the majority of the savings > > is > > > in > > > > avoiding what amounts to copy constructors between mutable and > > immutable > > > > types. I further wager there are alternative approaches we could weigh > > > to > > > > achieve those performance improvements. As an example - you note above > > > > that we could provide a patch to Apache Thrift. Depending how much > > > > performance inspires our decision here, it will be prudent to evaluate > > > > alternatives. > > > > > > > > I think there are (at least) two major issues worth discussing - code > > > > volume (which you note) and an increase in logical complexity. This > > will > > > > leave us with a bifurcation in code generation tooling (custom+swift > > for > > > > Java, Apache Thrift for python and js). It's difficult to quantify the > > > > downside of that, but it seems like an unfortunate state with potential One note on the bifurcation is that it would be temporary since a 1st class JSON api (https://issues.apache.org/jira/browse/AURORA-987), means only the scheduler will use thrift internally - as its data model source of truth and ser/deser for the distributed log. > > > for > > > > compatibility risks. > > > > > > > > > > > > On Wed, Jan 27, 2016 at 2:45 PM, Zameer Manji > > wrote: > > > > > > > > > Some high level comments without looking at the code. > > > > > > > > > > I'm in favor from abandoning the thrift generated java code in favor > > of > > > > > immutable objects. I think it is easier to reason about and will > > ensure > > > > we > > > > > have less errors in our code. If I understand correctly, the ProtoBuf > > > > > format does this by default, so there some precedent for this style > > of > > > > code > > > > > generation already. > > > > > > > > > > I think using Facebook's swift is the best approach here. I would be > > > > > hesitant to accept any custom code generation that involved us > > parsing > > > > > thrift IDL files or thrift formats over the wire because I poses a > > very > > > > > high maintenance burden. > > > > > > > > > > I also think generating the MyBatis mutable classes is superior to > > our > > > > > current strategy of manually creating them. > > > > > > > > > > Finally, the performance improvements look fantastic. As an operator > > > of a > > > > > large cluster I am excited to see wholesale performance improvements > > > as I > > > > > am always concerned that my cluster is approaching the limits of what > > > > > Aurora can handle safely. > > > > > > > > > > Overall, I think this change merits a serious discussion from all > > > > > contributors. > > > > > > > > > > On Tue, Jan 26, 2016 at 9:19 PM, John Sirois > > > wrote: > > > > > > > > > > > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois > > > > wrote: > > > > > > > > > > > > > Context: Aurora uses the official Apache Thrift compiler today > > > plus a > > > > > > > home-grown python code generator [1] for immutable "entity" (I*) > > > > > > wrappers. > > > > > > > > > > > > > > The proposal is to switch from using the Apache Thrift code > > > generator > > > > > to > > > > > > a > > > > > > > home grown generator. The proposal comes with a concrete example > > > in > > > > > the > > > > > > > form of the actual RBs to effect this change: > > > > > > > 1. A custom java thrift code generator: > > > > > > > https://reviews.apache.org/r/42748/ > > > > > > > 2. A custom MyBatis binding code generator powered by 1 above: > > > > > > > https://reviews.apache.org/r/42749/ > > > > > > > 3. Integration of 1 & 2 above into the Aurora codebase: > > > > > > > https://reviews.apache.org/r/42756/ > > > > > > > > > > > > > > Since the RBs are large, I wanted to provide some extra context > > on > > > > the > > > > > > > idea at a higher level. I provide rationale, pros and cons below > > > for > > > > > > those > > > > > > > interested in the idea but wary of diving in on code review until > > > the > > > > > > idea > > > > > > > itself passes a sniff test. > > > > > > > Thanks in advance for your feedback - and if we get there - for > > > your > > > > > > > review effort. > > > > > > > > > > > > > > > > > > > I just added wfarner and zmanji as reviewers for the 3 RBs above > > > since > > > > > > they've expressed direct interest. Happy to add others, just speak > > > up > > > > or > > > > > > else just comment on the reviews as you see fit. > > > > > > I'll formally only submit if 1st this email thread reaches > > consensus, > > > > and > > > > > > second, reviews are approved. > > > > > > > > > > > > == > > > > > > > > > > > > > > In the course of an initial run at creating a first-class > > REST-like > > > > > > > scheduler interface [2] I came to the conclusion generating the > > > json > > > > > API > > > > > > > from the thrift one might be a good path. That idea has been > > > > scrapped > > > > > > with > > > > > > > community feedback, but an initial experiment in custom thrift > > > > code-gen > > > > > > for > > > > > > > java that accompanied that idea seemed worth pursuing for its own > > > > > > > independent benefits, chief among these being 1st class immutable > > > > > thrift > > > > > > > structs and the ability to leverage thrift annotations. > > > > > > > > > > > > > > Immutability: > > > > > > > The benefits of having an immutable by default data model are the > > > > > > standard > > > > > > > ones; namely, its trivial to reason about safety of concurrent > > > > > operations > > > > > > > on the data model, stability of collections containing data model > > > > > > entities > > > > > > > and it opens up straight-forward optimizations that are easy to > > > > reason > > > > > > > about. > > > > > > > An example optimization is caching hashCodes for the immutable > > > thrift > > > > > > > structs. This was done after comparing jmh benchmarks run > > against > > > > > master > > > > > > > and then again against the proposal branch. Perf was comparable > > - > > > > > within > > > > > > > 10% plus and minus depending on the benchmark, but with the > > > > > optimization > > > > > > > added many benchmarks showed pronounced improvement in the > > proposal > > > > > > branch > > > > > > > [3]. The optimization is clearly safe and was quick and easy to > > > > > > > implement. Further optimizations can be experimented with in a > > > > > > > straightforward way. > > > > > > > > > > > > > > Thrift Annotations: > > > > > > > The thrift IDL grammar has supported these for quite some time, > > but > > > > > they > > > > > > > are not plumbed to the generated java code. Uses are many and > > > > > varied. I > > > > > > > initially had my eye on annotation of thrift services with REST > > > > verbs, > > > > > > > routes, etc - but immediately we can leverage these annotations > > to > > > > kill > > > > > > > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding > > code > > > > that > > > > > > > needs to be maintained. > > > > > > > > > > > > > > There are a few downsides to switching to our own java thrift > > code > > > > gen: > > > > > > > 1. We own more code to maintain: Even though we have the custom > > > > python > > > > > > > "immutable" wrapper generator [1] today, this new generator - > > even > > > > with > > > > > > the > > > > > > > python generator removed - represents a 5-6x increase in line > > count > > > > of > > > > > > > custom code (~4.1k lines of code and tests in the new custom gen, > > > > ~700 > > > > > > > lines in the existing python custom gen) > > > > > > > 2. We conceptually fork from a sibling Apache project. > > > > > > > > > > > > > > The fork could be mitigated by turning our real experience > > > iterating > > > > > the > > > > > > > custom code generator into a well-founded patch back into the > > > Apache > > > > > > Thrift > > > > > > > project, but saying we'll do that is easier than following > > through > > > > and > > > > > > > actually doing it. > > > > > > > > > > > > > > == > > > > > > > Review guide / details: > > > > > > > > > > > > > > The technology stack: > > > > > > > The thrift IDL parsing and thrift wire parsing are both handled > > by > > > > the > > > > > > > Facebook swift project [4]. We only implement the middle bit > > that > > > > > > > generates java code stubs. This gives higher confidence that the > > > > > tricky > > > > > > > bits out at either edge are done right. > > > > > > > The thrift struct code generation is done using Square's javapoet > > > [5] > > > > > in > > > > > > > favor of templating for the purpose of easier to read generator > > > code. > > > > > > This > > > > > > > characterization is debatable though and template are certainly > > > more > > > > > > > flexible the minute you need to gen a second language (say we > > like > > > > this > > > > > > and > > > > > > > want to do javascript codegen this way too for example). > > > > > > > The MyBatis codegen is forced by the thrift codegen for technical > > > > > > > reasons. In short, there is no simple way to teach MyBatis to > > read > > > > and > > > > > > > write immutable objects with builders. So the MyBatis code is > > > > > generated > > > > > > > via an annotation processor that runs after thrift code gen, but > > > > > reading > > > > > > > thrift annotations that survive that codegen process. > > > > > > > The codegen unit testing is done with the help of Google's > > > > > compile-tester > > > > > > > [6]. NB that this has an expected output comparison that checks > > > the > > > > > > > generated AST and not the text, so its fairly lenient. > > Whitepsace > > > > and > > > > > > > comments certainly don't matter. > > > > > > > > > > > > > > Review strategy: > > > > > > > The code generator RBs (#1 & #2 in the 3 part series) are > > probably > > > > > easier > > > > > > > to review looking at samples of the generated code. Both the > > > thrift > > > > > > > codegen and MyBatis codegen samples are conveniently contained in > > > the > > > > > > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/). > > The > > > > > unit > > > > > > > test uses resource files that contain both the thrift codegen > > > inputs > > > > > the > > > > > > > annotation processor runs over and the annotation processor > > > expected > > > > > > > outputs - the MyBatis peer classes. So have a look there if you > > > > need > > > > > > > something concrete and don't want to patch the RBs in and > > actually > > > > run > > > > > > the > > > > > > > codegen (`./gradlew api:compileJava`). > > > > > > > The conversion RB (#3) is large but the changes are mainly > > > mechanical > > > > > > > conversions from the current mutable thrift + I* wrappers to pure > > > > > > immutable > > > > > > > thrift mutated via `.toBuilder` and `.with`'er methods. The main > > > > > changes > > > > > > > of note are to the portions of the codebase tightly tied to > > thrift > > > > as a > > > > > > > technology: > > > > > > > + Gson/thrift converters > > > > > > > + Shiro annotated auth param interception > > > > > > > + Thrift/Servlet binding > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py > > > > > > > [2] https://issues.apache.org/jira/browse/AURORA-987 > > > > > > > [3] > > > > > > > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346 > > > > > > > [4] https://github.com/facebook/swift > > > > > > > [5] https://github.com/square/javapoet > > > > > > > [6] https://github.com/google/compile-testing > > > > > > > > > > > > > > > > > > > -- > > > > > > Zameer Manji > > > > > > > > > > > > > > > > > > > > > > > > > > --047d7b450be8c562c9052a5ae87d--