From: "Zach Amsden (Code Review)"
To: Michael Ho, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
CC: Alex Behm, Dan Hecht, Tim Armstrong
Reply-To: zamsden@cloudera.com
Date: Wed, 8 Feb 2017 00:04:13 +0000
Subject: [Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
X-Gerrit-MessageType: newpatchset
X-Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
X-Gerrit-Commit: 6462e36fe3a6bb2ae27000be7dd06f4a25bd6163

Hello Michael Ho,

I'd like you to re-examine a change. Please visit

    http://gerrit.cloudera.org:8080/5776

to look at the new patch set (#18).

Change subject: IMPALA-4729: Implement REPLACE()
......................................................................

IMPALA-4729: Implement REPLACE()

This turned out to be slightly non-trivial because REPLACE is already a
keyword. Since function names are parsed as bare identifiers, the parser
had to be tweaked to accept REPLACE as a function name.

It was difficult to get this to match the performance of regexp_replace.
For expanding patterns (where the output grows), regexp_replace copies
the expansion inline, which means it may actually win on large strings
whose matches are sparse, i.e. spaced further apart than the data cache
size. Let's leave optimizing that case for later.

Testing:
Added a full test for maximum-size strings, covering most of the
boundary conditions I could identify.

Manually ran queries on the TPC-H dataset in Impala to verify both
performance and correctness.
Added test cases to large_strings.test and exprs.test and ran the tests
to verify they work as expected.

Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/gutil/bits.h
M be/src/udf/udf-internal.h
M be/src/udf/udf-test-harness.cc
M be/src/udf/udf-test-harness.h
M be/src/udf/udf.cc
M be/src/udf/udf.h
M common/function-registry/impala_functions.py
M fe/src/main/cup/sql-parser.cup
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
M testdata/workloads/functional-query/queries/QueryTest/large_strings.test
13 files changed, 396 insertions(+), 8 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/5776/18
--
To view, visit http://gerrit.cloudera.org:8080/5776

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: Zach Amsden
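For readers following the performance discussion in this change, here is a
minimal sketch of a non-regex, literal-substring REPLACE in standalone C++17.
It is NOT the code in string-functions-ir.cc: the helper name ReplaceAll is
hypothetical, it uses std::string rather than Impala's StringVal/UDF API, and
treating an empty pattern as "no match" is an assumption. It uses a two-pass
strategy (count matches, size the output once, then copy), which illustrates
one plausible reading of the trade-off described above: the second pass
re-reads the input, so on large strings with sparse matches an approach that
appends the expansion inline while scanning may come out ahead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not Impala's StringFunctions API): replaces every
// non-overlapping occurrence of `pattern` in `str` with `replacement`,
// scanning left to right.
std::string ReplaceAll(std::string_view str, std::string_view pattern,
                       std::string_view replacement) {
  // Assumption: an empty pattern is treated as "no match" and the input is
  // returned unchanged.
  if (pattern.empty()) return std::string(str);

  // Pass 1: count non-overlapping matches so the output is sized exactly once.
  std::size_t matches = 0;
  for (std::size_t pos = str.find(pattern); pos != std::string_view::npos;
       pos = str.find(pattern, pos + pattern.size())) {
    ++matches;
  }
  if (matches == 0) return std::string(str);

  // Exact output length: unmatched bytes plus one replacement per match.
  const std::size_t expected =
      str.size() - matches * pattern.size() + matches * replacement.size();
  std::string result;
  result.reserve(expected);

  // Pass 2: copy the text between matches and splice in the replacement.
  // This re-reads `str`, which is the extra memory traffic that can hurt on
  // inputs much larger than the data cache.
  std::size_t start = 0;
  for (std::size_t pos = str.find(pattern); pos != std::string_view::npos;
       pos = str.find(pattern, start)) {
    result.append(str.substr(start, pos - start));
    result.append(replacement);
    start = pos + pattern.size();
  }
  result.append(str.substr(start));

  // Sanity check: the pre-computed size matches what was produced.
  assert(result.size() == expected);
  return result;
}
```

For example, ReplaceAll("hello world", "o", "0") yields "hell0 w0rld", and
ReplaceAll("aaaa", "aa", "b") yields "bb" because matches do not overlap.
Sizing the buffer up front avoids reallocation when the pattern expands, at
the cost of a second scan; a single-pass variant would instead grow the
buffer as it goes.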