Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 3D98D2004A0 for ; Wed, 16 Aug 2017 20:52:06 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 3C2F1169631; Wed, 16 Aug 2017 18:52:06 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 36C3C16962C for ; Wed, 16 Aug 2017 20:52:05 +0200 (CEST) Received: (qmail 95352 invoked by uid 500); 16 Aug 2017 18:52:03 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 95327 invoked by uid 99); 16 Aug 2017 18:52:03 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 16 Aug 2017 18:52:03 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 116DBC0042 for ; Wed, 16 Aug 2017 18:52:03 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id Lk9wa3_0kwx3 for ; Wed, 16 Aug 2017 18:52:02 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id B82B45F6D2 for ; Wed, 16 Aug 2017 18:52:01 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id v7GIq04Q024680; Wed, 16 Aug 2017 18:52:00 GMT Message-Id: <201708161852.v7GIq04Q024680@ip-10-146-233-104.ec2.internal> Date: Wed, 16 Aug 2017 18:52:00 +0000 From: "Tim Armstrong (Code Review)" To: Tianyi Wang , impala-cr@cloudera.com, reviews@impala.incubator.apache.org Reply-To: tarmstrong@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-5573=3A_Add_decimal_codegen_in_text_scanner=0A?= X-Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d X-Gerrit-ChangeURL: X-Gerrit-Commit: 5bb3e468653589b97eeaaf6bd4b4bed771d14624 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.7 archived-at: Wed, 16 Aug 2017 18:52:06 -0000 Tim Armstrong has posted comments on this change. Change subject: IMPALA-5573: Add decimal codegen in text scanner ...................................................................... Patch Set 3: (8 comments) Code change looks good, just a few cleanup requests. http://gerrit.cloudera.org:8080/#/c/7683/2//COMMIT_MSG Commit Message: Line 14: StringToDecimal out of LLVM IR. Can you add a line break to separate the paragraphs to make it more readable. PS2, Line 15: libUtil.so nit: libUtil, since we use static linking too. Line 19: Did you do an experiment to see how much scanning sped up for a larger table? tpch.lineitem has a few decimal columns. I often copy the data a few times for scanner tests like this to get good measurements. create table biglineitem as select * from lineitem; insert into biglineitem select * from lineitem; insert into biglineitem select * from lineitem; I think you'll see the biggest speedup if you have a condition in the where clause, because codegen of that condition also gets disabled. E.g. select l_quantity, l_extendedprice, l_discount, l_tax from biglineitem where l_quantity > 100.0; http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: Line 134: > I'm sorry it works without this. I have no idea how I came to that conclusi Good to know :) I was worried something subtle was going on. There are some places in the code where we resort to hacks to force modules to be linked into the output binary: https://github.com/apache/incubator-impala/blob/294d42adc117046f975665834af03ddaa53ec27e/be/src/exprs/scalar-expr-evaluator.cc#L428 Once the functions are linked in LLVM by default falls back to resolving functions in the binary's dynamic symbol table. Line 134: > Without this it crashes with "Program used external function ... which coul Maybe when you rebuilt Impala you hadn't updated the CMakeLists.txt, or the change hadn't taken affect. That would explain this. http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/exec/hdfs-scanner-ir.cc File be/src/exec/hdfs-scanner-ir.cc: Line 131: if (*result != ParseResult::PARSE_SUCCESS) *result = ParseResult::PARSE_FAILURE; > The IR generated by Clang still reads decimalblah and we will need a bitcas Ah ok, makes sense. I think it would also be ok to return Decimal*Value and bitcast unconditionally, but this works for me. http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/exec/text-converter.cc File be/src/exec/text-converter.cc: Line 234: case 16: > Done. This is copied from TextConverter::WriteSlot and there is an unnecess Oh interesting :) Not sure what that was there. http://gerrit.cloudera.org:8080/#/c/7683/3/be/src/exec/text-converter.cc File be/src/exec/text-converter.cc: PS3, Line 292: proper_slot maybe "cast_slot". It's unclear what "proper" means. -- To view, visit http://gerrit.cloudera.org:8080/7683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes