impala-dev mailing list archives

From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-2700: ASCII NUL characters are doubled on insert into text tables
Date Fri, 22 Jul 2016 20:22:25 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2700: ASCII NUL characters are doubled on insert into text tables
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3703/1/be/src/exec/hdfs-text-table-writer.cc
File be/src/exec/hdfs-text-table-writer.cc:

PS1, Line 208: str_val->ptr[i] == field_delim_
I don't think we want to escape field delimiters if the escape char is '\0'.


Line 208:     if (UNLIKELY(str_val->ptr[i] == field_delim_ || (str_val->ptr[i] == escape_char_ &&
As discussed, I think we should separate out the escaped and unescaped code paths - it'll
be easier to follow and perform better. As-is, we check every character against the escape
character and the delimiter, then ignore the matches once we find them.

I.e.

if (escape_char_ == '\0') {
  ... Just copy str_val into the string verbatim ...
} else {
  ... Run the existing code ...
}
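
To make the suggested split concrete, here is a rough standalone sketch. It is not the
actual hdfs-text-table-writer.cc code: AppendTextField and its parameters are hypothetical
names, and the escaped branch assumes the existing behavior is to prefix each field
delimiter or escape character with the escape character.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the writer's per-field logic; not Impala's actual code.
// Appends `val` to `*out`. An escape_char of '\0' is treated as "no escaping".
void AppendTextField(const std::string& val, char field_delim, char escape_char,
                     std::string* out) {
  assert(out != nullptr);
  if (escape_char == '\0') {
    // Unescaped path: copy the value verbatim, with no per-character checks.
    out->append(val);
    return;
  }
  // Escaped path (assumed behavior): prefix each delimiter or escape character
  // with the escape character.
  out->reserve(out->size() + val.size());
  for (char c : val) {
    if (c == field_delim || c == escape_char) out->push_back(escape_char);
    out->push_back(c);
  }
}
```

With this shape, a value containing an embedded NUL is written unchanged when no escape
character is set, and the per-character loop only runs when escaping is actually enabled.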


-- 
To view, visit http://gerrit.cloudera.org:8080/3703
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia30fa314d1ee1e99f9e7598466eb1570ca7940fc
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: anujphadke <aphadke@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes
