impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Date Tue, 24 Jan 2017 06:38:20 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()

Patch Set 2:

Commit Message:

Line 11: since function names act as bare identifiers.
Can you also run a basic perf test to compare the speed of regex_replace() vs. replace()?
The motivation for replace() is to be significantly faster in simple cases than the next best
alternative. You could try something along the lines of:

select count(replace(l_comment, " ", "")) from tpch_parquet.lineitem;

but probably with a larger lineitem dataset. Please reach out if you need help creating a
larger dataset.

Might also be interesting to see how a stupid version with std::replace() would compare.
File be/src/exprs/

Line 1954:   TestStringValue("replace('abc', 'abcd', 'a')", "abc");
suggest these additional tests:
1. nonsense replace
replace("abcdefg", "z", "abcdefg");
2. empty input string
replace("", "z", "");
3. input string is NULL, rest non-NULL
TestStringValue("replace(NULL, 'abc', 'a')", "abc");
4. output string is bigger than the input string
TestStringValue("replace('aaa', 'a', 'aa')", "aaaaaa");
5. add test with exactly 16 matches and some trailing chars in the input that are non-matches
File be/src/exprs/

Line 210:   if (pattern.len == 0) return str;
pattern can be null as well

Line 218:   const int kMemoizedMatches = 16;
style: k_memoized_matches or max_memoized_matches

add comment: why this memoization? why only 16?

Line 225:       if (match_pos_in_substring < 0)
single line

Line 237:   const int rlen = num_matches * (replace.len - needle.len) + str.len;
This could overflow and lead to writing past the result buffer.

Suggest using a larger result type and add a test. The SQL function "repeat" might come in

Line 249:       DCHECK(match_pos_in_substring >= 0);
Must there really always be a match here?

(also DCHECK_GE)

Line 253:     int bytes = match_pos - consumed;

Line 266:   memcpy(ptr, &haystack.ptr[match_pos], str.len - consumed);
// Copy trailing non-matches.
File fe/src/main/cup/sql-parser.cup:

Line 3050: ident_or_restricted ::=
Agree about the issues.

Why not handle the REPLACE case like IF and TRUNCATE? Have a look at the non_pred_expr production.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Michael Ho <>
Gerrit-HasComments: Yes

View raw message