impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew Jacobs (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Date Fri, 02 Sep 2016 22:44:56 GMT
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
......................................................................


Patch Set 6:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS6, Line 311: search_start_pos < 0
remove- this is not possible since we know it's not 0 from l302 and we know it's not <0
from l308. it might be worthwhile to put a DCHECK above l316 to make sure it's always >=
0 though.


PS6, Line 311: {
             :       return IntVal(0);
this will be short enough to return on the same line


PS6, Line 318:       if (match_pos_in_substring == -1) {
             :         return IntVal(0);
             :       }
1 line


PS6, Line 322: search_start_pos
I think you also need to make sure search_start_pos is < haystack.len before the next call
to Substring (l316), otherwise that will return invalid memory (Substring doesn't check the
length).

In fact, since you'll have to do the check before calling Substring(search_start_pos) on l316,
I think you can remove the entire block on l311 that checks search_start_pos.


PS6, Line 330:  search_start_pos >= str.len
This isn't possible, str.len = haystack.len, and we know we added something less than 0 to
it to get search_start_pos


PS6, Line 330:     if (search_start_pos < 0 || search_start_pos >= str.len) {
             :       return IntVal(0);
             :     }
remove


PS6, Line 336: search_start_pos + needle.len
why do you add the needle len?
I don't understand why it's not just search_start_pos.


PS6, Line 338:       if (match_pos == -1) {
             :         return IntVal(0);
             :       }
1 line


http://gerrit.cloudera.org:8080/#/c/4094/6/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

PS6, Line 102:     for (int i = 0; i < mlast; ++i) {
             :       BloomAdd(pattern_->ptr[i]);
             :       if (pattern_->ptr[i] == pattern_->ptr[mlast])
             :         skip_ = mlast - i - 1;
             :     }
             :     BloomAdd(pattern_->ptr[mlast]);
I don't think this will always work for RSearch, see https://hg.python.org/cpython/file/6b6c79eba944/Objects/stringlib/fastsearch.h


l195

I think this would need to be computed in reverse.


-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi <zi@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message