impala-reviews mailing list archives

From "Michael Ho (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Date Mon, 06 Feb 2017 19:12:40 GMT
Michael Ho has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
......................................................................


Patch Set 11:

(15 comments)

Please also add exprs.test cases, as discussed offline, to cover cases in which pattern or
replace are non-constant.

http://gerrit.cloudera.org:8080/#/c/5776/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS9, Line 290: 
nit: indent 4.


PS9, Line 291: t, buffer_space)
Please see the comments on the previous patch set. Isn't this the same as the bytes_produced
computed above?


PS9, Line 305: _LE(ptr - resul
Isn't this the same as bytes_remaining above?


http://gerrit.cloudera.org:8080/#/c/5776/11/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS11, Line 227: replace = new ReplaceContext(pattern);
Sorry, I may have misunderstood your question about this. We need to call context->Allocate()
here in order to track the memory consumption. The general guideline is to avoid untracked
memory usage as much as possible.


PS11, Line 236: delete rptr;
Please use context->Free().
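
For illustration only, here is a minimal sketch of the tracked-allocation pattern requested
in the two comments above. MockContext, ReplaceState, CreateState and DestroyState are
hypothetical names, not Impala code; MockContext only mimics an Allocate()/Free() pair and
counts live allocations so the effect is visible.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <string>
#include <utility>

// Hypothetical stand-in for a function context that accounts for memory.
// It is NOT Impala's FunctionContext; it only mimics Allocate()/Free().
class MockContext {
 public:
  uint8_t* Allocate(int byte_size) {
    uint8_t* buf = static_cast<uint8_t*>(std::malloc(byte_size));
    if (buf != nullptr) ++live_allocations_;
    return buf;
  }
  void Free(uint8_t* buffer) {
    if (buffer == nullptr) return;
    assert(live_allocations_ > 0);  // every Free() must match an Allocate()
    --live_allocations_;
    std::free(buffer);
  }
  int live_allocations() const { return live_allocations_; }

 private:
  int live_allocations_ = 0;
};

// Hypothetical per-call state, loosely modelled on the ReplaceContext in the patch.
// Note: the std::string member still uses untracked heap memory in this sketch.
struct ReplaceState {
  explicit ReplaceState(std::string p) : pattern(std::move(p)) {}
  std::string pattern;
};

// Build the state in context-owned memory instead of calling plain 'new'.
ReplaceState* CreateState(MockContext* ctx, const std::string& pattern) {
  uint8_t* mem = ctx->Allocate(sizeof(ReplaceState));
  if (mem == nullptr) return nullptr;      // allocation can fail
  return new (mem) ReplaceState(pattern);  // placement new into tracked memory
}

// Destroy the state and hand its memory back to the context instead of 'delete'.
void DestroyState(MockContext* ctx, ReplaceState* state) {
  if (state == nullptr) return;
  state->~ReplaceState();  // run the destructor explicitly
  ctx->Free(reinterpret_cast<uint8_t*>(state));
}
```

Pairing placement new with an explicit destructor call keeps the object's lifetime correct
while the context stays the single owner of the raw bytes.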


PS11, Line 274: (delta > 0 && delta < 128)
The parentheses seem unnecessary here.


PS11, Line 274: 128
Mind documenting how the number 128 is derived? In other words, why not 256, 512, or 64?


PS11, Line 282: (replace.len - pattern.len)
Just use 'delta' for clarity.


PS11, Line 288: haystack.len - pattern.len + replace.len
For clarity, can you please use haystack.len + delta?


PS11, Line 322:        const int bytes_remaining = haystack.len - consumed;
This seems to overlap with remaining_bytes on line 352. How about hoisting it out of the
loop, as follows?

// number of bytes to match in the original string
int bytes_remaining = haystack.len - consumed;
while (bytes_remaining >= pattern.len) {
  ...

  consumed = match_pos;
  bytes_remaining = haystack.len - consumed;
  ...

  if (delta > 0) {
    ...
  }
}
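
As a self-contained illustration of the hoisting idea sketched above (not the code in this
patch), the following uses std::string instead of StringVal. ReplaceAll is a hypothetical
name, and treating an empty pattern as a no-op is an assumption of the sketch. The point is
that one bytes_remaining variable serves both the loop condition and the final tail copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replace every occurrence of 'pattern' in 'haystack' with 'replacement'.
std::string ReplaceAll(const std::string& haystack, const std::string& pattern,
                       const std::string& replacement) {
  if (pattern.empty()) return haystack;  // assumption: empty pattern is a no-op

  std::string result;
  std::size_t consumed = 0;
  // Number of bytes of the original string that are still to be matched.
  std::size_t bytes_remaining = haystack.size() - consumed;
  while (bytes_remaining >= pattern.size()) {
    const std::size_t match_pos = haystack.find(pattern, consumed);
    if (match_pos == std::string::npos) break;
    // Copy the unmatched prefix, then the replacement.
    result.append(haystack, consumed, match_pos - consumed);
    result.append(replacement);
    // Skip past the match and refresh the single remaining-bytes counter.
    consumed = match_pos + pattern.size();
    bytes_remaining = haystack.size() - consumed;
  }
  // The hoisted counter is still valid here, so the tail copy needs no
  // second "remaining bytes" computation.
  assert(consumed + bytes_remaining == haystack.size());
  result.append(haystack, consumed, bytes_remaining);
  return result;
}
```

For example, ReplaceAll("hello world", "o", "0") yields "hell0 w0rld".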


PS11, Line 335: it's
nit: its


PS11, Line 337:           static_assert(BitUtil::IsPowerOf2(StringVal::MAX_LENGTH),
              :               "buffer_space to not exceed MAX_LENGTH requires it to be a power of 2");
I am not sure I completely understand the purpose of this assert. Do we rely on it for line
331 above to be effective?

Why don't we move line 331 to the point after line 343? That seems easier to follow, and I
am not sure that simply assuming the resize succeeds is a good idea. The resize can fail for
other reasons (e.g. exceeding the memory limit set on a query).
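
To make the concern concrete, here is a small sketch in which a resize can fail for two
independent reasons, so the caller checks the return value rather than asserting success.
LimitedBuffer, kMaxLength, mem_limit and EnsureCapacity are hypothetical stand-ins (for
StringVal, StringVal::MAX_LENGTH and a per-query memory limit), not Impala's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical growable buffer with two independent failure causes.
struct LimitedBuffer {
  static constexpr std::size_t kMaxLength = 1024;  // stand-in for a maximum string length
  std::size_t mem_limit;                           // stand-in for a per-query memory limit
  std::vector<unsigned char> data;

  explicit LimitedBuffer(std::size_t limit) : mem_limit(limit) {}

  // Returns false, leaving the buffer untouched, if either limit is exceeded.
  bool Resize(std::size_t new_len) {
    if (new_len > kMaxLength) return false;
    if (new_len > mem_limit) return false;
    data.resize(new_len);
    return true;
  }
};

// Grow the buffer by doubling until 'needed' bytes fit. Returns false on failure.
bool EnsureCapacity(LimitedBuffer* buf, std::size_t needed) {
  std::size_t capacity = buf->data.size();
  if (capacity >= needed) return true;
  if (capacity == 0) capacity = 1;
  while (capacity < needed) capacity *= 2;
  // Clamp to the maximum length; if that is still too small, give up.
  if (capacity > LimitedBuffer::kMaxLength) capacity = LimitedBuffer::kMaxLength;
  if (capacity < needed) return false;
  // Staying under the maximum length does not guarantee success:
  // the memory limit can still reject the request.
  if (!buf->Resize(capacity)) return false;
  assert(buf->data.size() >= needed);
  return true;
}
```

With a 64-byte memory limit, a request for 100 bytes fails even though the doubled capacity
of 128 is well below kMaxLength.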


PS11, Line 342: const auto ofs = ptr - result.ptr;
Isn't this the same as bytes_produced above?


PS11, Line 344: DCHECK_EQ(resized, true);
As mentioned above, this may not always hold, as we can exceed the memory limit even if
buffer_space <= StringVal::MAX_LENGTH.


PS11, Line 352: const int remaining_bytes = haystack.len - consumed;
If you take the suggestion above to hoist bytes_remaining out of the while loop, you don't
need this line.


-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <zamsden@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zamsden@cloudera.com>
Gerrit-HasComments: Yes
