impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zoltan Ivanfi (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Date Wed, 31 Aug 2016 13:37:26 GMT
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3973: add position and occurrence to instr()
......................................................................


Patch Set 4:

> As we discussed in person our internal strings aren't \0
 > terminated. Let's try using StringFunctions::Reverse() and the
 > InStr implementation we already have and see how fast that is.

To recap our conversation in a little more detail, using the std::string constructor that
takes a length parameter in addition to the char* param would solve the problem that StringValues
are not NULL-terminated, but you were also worried about the copying and the standard search
implementation.

The copying could be easily avoided by using std::search with reverse iterators, as suggested
here: http://stackoverflow.com/a/1634416 and creating the reverse iterators directly on the
char*, like suggested here: http://stackoverflow.com/a/22488428.

In my opinion this would result in the cleanest solution, but you were also worried about
using the standard C++ search, which is around four times slower than our internal search
algorithm that unfortunately only supports searching forwards.

According to the comments in the source code of our internal search algorithm, it is based
on Python's search. I checked this latter and found that reverse search have been added to
it recently. Since our implementations have diverged significantly, it is not possible to
use the new Python search code, but it can be seen that the reverse search functionality is
a trivial modification of the regular search, so I added a reverse search variant as well.
I also added tests for it as well as for our regular search, which did not have tests. The
name of the new test collided with the name of an existing test that was misnamed, so I renamed
that. Finally I implemented instr() on top of these bloom-filter-based search algorithms.

-- 
To view, visit http://gerrit.cloudera.org:8080/4094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi <zi@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi@cloudera.com>
Gerrit-HasComments: No

Mime
View raw message