impala-reviews mailing list archives

From "Michael Ho (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Date Mon, 06 Feb 2017 19:12:40 GMT
Michael Ho has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()

Patch Set 11:


Please also add test cases to exprs.test, as discussed offline, to cover cases in which pattern
or replace are non-constant.
File be/src/exprs/

PS9, Line 290: 
nit: indent 4.

PS9, Line 291: t, buffer_space)
Please see the comments on the previous patch set. Isn't this the same as bytes_produced computed above?

PS9, Line 305: _LE(ptr - resul
Same as bytes_remaining above?
File be/src/exprs/

PS11, Line 227: replace = new ReplaceContext(pattern);
Sorry, I may have misunderstood your question about this. We need to call context->Allocate()
here so that the memory consumption is tracked. The general guideline is to avoid untracked
memory usage as much as possible.

PS11, Line 236: delete rptr;
Please use context->Free().
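
A minimal sketch of what the two comments above ask for: construct the per-call state in memory obtained from the context, and release it through the same context, so the bytes stay accounted for. TrackingContext below is a hypothetical stand-in that only models byte accounting. It is not Impala's FunctionContext, and the ReplaceContext fields and the two helper functions are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <unordered_map>

// Hypothetical stand-in for the UDF context: it only records how many bytes
// are currently allocated through it.
class TrackingContext {
 public:
  uint8_t* Allocate(int byte_size) {
    assert(byte_size > 0);
    uint8_t* buf = static_cast<uint8_t*>(std::malloc(byte_size));
    if (buf == nullptr) return nullptr;
    sizes_[buf] = byte_size;
    tracked_bytes_ += byte_size;
    return buf;
  }

  void Free(uint8_t* buf) {
    auto it = sizes_.find(buf);
    assert(it != sizes_.end());  // must have come from Allocate()
    tracked_bytes_ -= it->second;
    sizes_.erase(it);
    std::free(buf);
  }

  int64_t tracked_bytes() const { return tracked_bytes_; }

 private:
  std::unordered_map<uint8_t*, int> sizes_;
  int64_t tracked_bytes_ = 0;
};

// Hypothetical per-call state; the real ReplaceContext may look different.
struct ReplaceContext {
  ReplaceContext(const char* p, int len) : pattern(p), pattern_len(len) {}
  const char* pattern;
  int pattern_len;
};

// Instead of `new ReplaceContext(...)`: placement-new into tracked memory.
ReplaceContext* CreateReplaceContext(
    TrackingContext* context, const char* pattern, int pattern_len) {
  uint8_t* mem = context->Allocate(static_cast<int>(sizeof(ReplaceContext)));
  if (mem == nullptr) return nullptr;
  return new (mem) ReplaceContext(pattern, pattern_len);
}

// Instead of `delete rptr`: run the destructor, then return the memory
// through the context so the accounting is reversed.
void DestroyReplaceContext(TrackingContext* context, ReplaceContext* rptr) {
  if (rptr == nullptr) return;
  rptr->~ReplaceContext();
  context->Free(reinterpret_cast<uint8_t*>(rptr));
}
```

After CreateReplaceContext() the context reports sizeof(ReplaceContext) tracked bytes, and after DestroyReplaceContext() it reports zero again, which is the property a plain new/delete pair would not give.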

PS11, Line 274: (delta > 0 && delta < 128)
The parentheses seem unnecessary here.

PS11, Line 274: 128
Mind documenting how the number 128 is derived? In other words, why not 256, 512, or 64?

PS11, Line 282: (replace.len - pattern.len)
Just use 'delta' for clarity.

PS11, Line 288: haystack.len - pattern.len + replace.len
For clarity, can you please use haystack.len + delta?
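
To illustrate why naming the difference once reads better, here is a small sketch of the size arithmetic written in terms of delta. The function names are hypothetical, and the worst-case bound over all occurrences is an assumption added for illustration; it is not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Growth (or shrinkage, if negative) caused by one replaced occurrence.
inline int64_t Delta(int pattern_len, int replace_len) {
  return static_cast<int64_t>(replace_len) - pattern_len;
}

// Result size after exactly one replacement: haystack.len + delta.
inline int64_t SizeAfterOneReplacement(int haystack_len, int64_t delta) {
  return haystack_len + delta;
}

// Upper bound on the result size if every non-overlapping occurrence is
// replaced. When the result cannot grow, the haystack length is the bound.
inline int64_t MaxSizeAfterAllReplacements(
    int haystack_len, int pattern_len, int64_t delta) {
  assert(haystack_len >= 0 && pattern_len >= 0);
  if (pattern_len == 0 || delta <= 0) return haystack_len;
  return haystack_len + (haystack_len / pattern_len) * delta;
}
```

For a 10-byte haystack, a 2-byte pattern and a 5-byte replacement, delta is 3, one replacement yields 13 bytes, and the bound over all five possible occurrences is 25 bytes.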

PS11, Line 322:        const int bytes_remaining = haystack.len - consumed;
This seems to overlap with remaining_bytes at line 352. How about hoisting it out of the
loop, like the following?

// number of bytes to match in the original string
int bytes_remaining = haystack.len - consumed;
while (bytes_remaining >= pattern.len) {

   consumed = match_pos;
   bytes_remaining = haystack.len - consumed;

   if (delta > 0) {
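
For reference, the loop shape suggested above can be sketched as a complete function. This is a simplified illustration on std::string, not the patch's code: ReplaceAll is a hypothetical name, std::string::find stands in for whatever matcher the patch uses, and consumed is advanced past the match here rather than set to the match position.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::string ReplaceAll(const std::string& haystack, const std::string& pattern,
    const std::string& replace) {
  if (pattern.empty()) return haystack;
  const int haystack_len = static_cast<int>(haystack.size());
  const int pattern_len = static_cast<int>(pattern.size());

  std::string result;
  int consumed = 0;
  // Number of bytes left to match in the original string. Declared once,
  // outside the loop, and refreshed after every match.
  int bytes_remaining = haystack_len - consumed;
  while (bytes_remaining >= pattern_len) {
    const std::size_t match_pos = haystack.find(pattern, consumed);
    if (match_pos == std::string::npos) break;

    // Copy the unmatched bytes before the match, then the replacement.
    result.append(haystack, consumed, match_pos - consumed);
    result.append(replace);

    consumed = static_cast<int>(match_pos) + pattern_len;
    bytes_remaining = haystack_len - consumed;
  }

  // bytes_remaining is still up to date here, so the tail copy does not
  // need a second "haystack.len - consumed" computation.
  assert(bytes_remaining >= 0);
  result.append(haystack, consumed, bytes_remaining);
  return result;
}
```

With this shape a single variable serves both the loop condition and the final copy of the unmatched tail.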


PS11, Line 335: it's
nit: its

PS11, Line 337:           static_assert(BitUtil::IsPowerOf2(StringVal::MAX_LENGTH),
              :               "buffer_space to not exceed MAX_LENGTH requires it to be a power of 2");
I am not sure I completely understand the purpose of this assert. Do we rely on it for line
331 above to be effective?

Why don't we move line 331 to the point after line 343? That seems easier to follow,
and I am not sure that simply assuming the resize succeeds is a good thing. The resize can fail
for other reasons (e.g. exceeding the memory limit set on a query).

PS11, Line 342: const auto ofs = ptr - result.ptr;
Isn't this the same as bytes_produced above?

PS11, Line 344: DCHECK_EQ(resized, true);
As mentioned above, this may not always hold, as we can exceed the memory limit even if
buffer_space <= StringVal::MAX_LENGTH.
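
One way to read the two resize comments above is: check the result of the resize, and commit the new buffer_space only after it has succeeded. The sketch below illustrates that ordering. TrackedBuffer, its mem_limit field, kMaxLength and EnsureSpace are all hypothetical stand-ins, not Impala's StringVal or its Resize() API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Assumed stand-in for StringVal::MAX_LENGTH (a power of two).
constexpr int64_t kMaxLength = int64_t{1} << 30;

// Hypothetical growable buffer whose resize can fail for two independent
// reasons: the hard length cap, or a per-query memory limit.
struct TrackedBuffer {
  explicit TrackedBuffer(int64_t limit) : mem_limit(limit) {}
  ~TrackedBuffer() { std::free(ptr); }
  TrackedBuffer(const TrackedBuffer&) = delete;
  TrackedBuffer& operator=(const TrackedBuffer&) = delete;

  bool Resize(int64_t new_len) {
    assert(new_len > 0);
    if (new_len > kMaxLength || new_len > mem_limit) return false;
    void* p = std::realloc(ptr, static_cast<std::size_t>(new_len));
    if (p == nullptr) return false;
    ptr = static_cast<uint8_t*>(p);
    len = new_len;
    return true;
  }

  uint8_t* ptr = nullptr;
  int64_t len = 0;
  int64_t mem_limit;
};

// Grows the buffer by doubling until bytes_produced + bytes_needed fits.
// Returns false, leaving *buffer_space untouched, if the resize fails.
bool EnsureSpace(TrackedBuffer* result, int64_t bytes_produced,
    int64_t bytes_needed, int64_t* buffer_space) {
  const int64_t required = bytes_produced + bytes_needed;
  if (required <= *buffer_space) return true;

  int64_t new_space = std::max<int64_t>(*buffer_space, 1);
  while (new_space < required) new_space *= 2;
  new_space = std::min(new_space, kMaxLength);
  if (new_space < required) return false;  // would exceed the length cap

  // Check the result instead of asserting that the resize succeeded.
  if (!result->Resize(new_space)) return false;
  *buffer_space = new_space;  // committed only after a successful resize
  return true;
}
```

A caller would turn a false return into a query error rather than a DCHECK failure, since hitting the memory limit is a legitimate runtime condition and not a programming bug.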

PS11, Line 352: const int remaining_bytes = haystack.len - consumed;
If you take the suggestion above to hoist bytes_remaining out of the while loop, you don't
need this line.


Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Michael Ho <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Reviewer: Zach Amsden <>
Gerrit-HasComments: Yes
