drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From weijie tong <tongweijie...@gmail.com>
Subject Re: Which code compiler is better
Date Mon, 31 Jul 2017 15:31:41 GMT
here is JIRA link :  https://issues.apache.org/jira/browse/DRILL-5696

Our product environment is using JDK 8 , transformed code generation (not
plain java). Paul's experiments verified our product case.

The sql is like : "select
 (d.trade_cnt - d2.trade_cnt)/CAST(d2.trade_cnt AS DECIMAL(28,4)) as
trade_cnt_wr

,(d.trade_amt - d2.trade_amt)/CAST(d2.trade_amt AS DECIMAL(28,4)) as
trade_amt_wr
,(d.trade_shop_cnt - d2.trade_shop_cnt)/CAST(d2.trade_shop_cnt AS
DECIMAL(28,4)) as trade_shop_cnt_wr
,(d.online_shop_cnt - d2.online_shop_cnt)/CAST(d2.online_shop_cnt AS
DECIMAL(28,4)) as online_shop_cnt_wr
,CAST((d.trade_shop_rate - d2.trade_shop_rate) AS
DECIMAL(28,4))/CAST(d2.trade_shop_rate AS DECIMAL(28,4)) as
trade_shop_rate_wr
,(d.offline_item_cnt - d2.offline_item_cnt)/CAST(d2.offline_item_cnt
AS DECIMAL(28,4)) as offline_item_cnt_wr
,(d.business_amt_per_cnt -
d2.business_amt_per_cnt)/CAST(d2.business_amt_per_cnt AS
DECIMAL(28,4)) as business_amt_per_cnt_wr
,(d.order_amt_per_cnt -
d2.order_amt_per_cnt)/CAST(d2.order_amt_per_cnt AS DECIMAL(28,4)) as
order_amt_per_cnt_wr
,(d.new_shop_cnt - d2.new_shop_cnt)/CAST(d2.new_shop_cnt AS
DECIMAL(28,4)) as new_shop_cnt_wr
,(d.offline_shop_cnt - d2.offline_shop_cnt)/CAST(d2.offline_shop_cnt
AS DECIMAL(28,4)) as offline_shop_cnt_wr
,(d.item_use_cnt - d2.item_use_cnt)/CAST(d2.item_use_cnt AS
DECIMAL(28,4)) as item_use_cnt_wr
,(d.item_shop_rate - d2.item_shop_rate)/CAST(d2.item_shop_rate AS
DECIMAL(28,4)) as item_shop_rate_wr
,(d.discount_trd_cnt - d2.discount_trd_cnt)/CAST(d2.discount_trd_cnt
AS DECIMAL(28,4)) as discount_trd_cnt_wr
,(d.discount_shop_cnt -
d2.discount_shop_cnt)/CAST(d2.discount_shop_cnt AS DECIMAL(28,4)) as
discount_shop_cnt_wr
,(d.crm_shop_cnt - d2.crm_shop_cnt)/CAST(d2.crm_shop_cnt AS
DECIMAL(28,4)) as crm_shop_cnt_wr
,(d.crm_shop_rate - d2.crm_shop_rate)/CAST(d2.crm_shop_rate AS
DECIMAL(28,4)) as crm_shop_rate_wr
,(d.trade_cnt_voucher -
d2.trade_cnt_voucher)/CAST(d2.trade_cnt_voucher AS DECIMAL(28,4)) as
trade_cnt_voucher_wr
,(d.trade_amt_voucher -
d2.trade_amt_voucher)/CAST(d2.trade_amt_voucher AS DECIMAL(28,4)) as
trade_amt_voucher_wr
,(d.trade_cnt_per_shop -
d2.trade_cnt_per_shop)/CAST(d2.trade_cnt_per_shop AS DECIMAL(28,4)) as
trade_cnt_per_shop_wr

  from
    xxxxx   "

The project operator's setup time is high when using janino compiler.






On Mon, Jul 31, 2017 at 10:54 AM, Paul Rogers <progers@mapr.com> wrote:

> A while back I did some experiments with JDK 8. The Java 8 compiler
> appears to be faster in general than Janino, if I remember correctly. (Not
> surprising: many people focus on optimizing the Java compiler, a smaller
> team maintains Janino...)
>
>
> Another experiment was to do "plain Java" code generation and compile
> rather than the compile & byte-code merge we do now. The compilation was
> faster as was code execution. The main reason for the speed-up is that
> "plan Java" does fewer steps: it just compiles and loads. However,
> "traditional" Drill code generation compiles, does a byte code copy and
> merge and then loads. Some "templates" are rather large. By using "plain
> Java" subclassing, we need not copy the base class code as we do when doing
> the byte-code merge.
>
>
> Also, because each generated class (with plain Java) uses the same base
> class code, the JVM can reuse its JIT optimizations; it does not have to
> rediscover them for each new generated class.
>
>
> We've not had time to do full testing, so we conservatively stick with
> what we know works. Still,  preliminary testing did show that "plain Java"
> is both faster and more convenient. You can experiment with this option.
> Find the commented out line like the following in each operator (record
> batch):
>
>
>       // Uncomment out this line to debug the generated code.
>
> //    cg.saveCodeForDebugging(true);
>
> Uncomment the line. You'll get debuggable plain Java code and compilation.
> The generated source code goes into /tmp/drill/codegen by default.
>
> Or, if you want to try for performance, and avoid the step of writing code
> to disk, use the following instead:
>
>     cg.preferPlainJava(true);
>
> More details appears in [1].
>
> Thanks,
>
> - Paul
>
> [1] https://github.com/paul-rogers/drill/wiki/Code-
> Generation-and-%22Short%2C-Fat%22-Queries
> ________________________________
> From: Aman Sinha <amansinha@apache.org>
> Sent: Sunday, July 30, 2017 9:16:09 AM
> To: dev@drill.apache.org
> Subject: Re: Which code compiler is better
>
> Weijie,
> what is the size (in KB) of your generated code for the aggregate operator
> that is doing the 20 SUM/AVG ?  Also, what JDK version are you using ?
>
> From what I recall, Janino was faster than JDK 1.7  up to about 256 KB
> source code file.  That's the current threshold in Drill; if the size is
> greater than that, Drill automatically switches to JDK compiler.   Newer
> JDK could potentially be faster, so we would need to do the comparison
> again.   Perhaps you should file a JIRA with your observations.
>
> There is also the complexity of the expressions.  For simple expressions,
> my understanding is Janino is typically better.  Can you provide the query
> pattern you used ?
>
> On Sun, Jul 30, 2017 at 6:10 AM, weijie tong <tongweijie178@gmail.com>
> wrote:
>
> > The compile process is long when we have 20 sum or avg expression and the
> > compiler is janino. But if we change the compiler to jdk,we gain lower
> > compile process time. It seems jdk compiler is better .If that's tue,why
> > not let jdk be the default one?
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message