impala-dev mailing list archives

From "Internal Jenkins (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks
Date Fri, 13 May 2016 04:23:40 GMT
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks
......................................................................


IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks

This patch modifies HdfsTextScanner to specifically check for split
"\r\n" delimiters when the scan range ends with '\r'. If there does
turn out to be a split delimiter, the next tuple is considered the
responsibility of the next scan range's scanner, as if the delimiter
appeared fully in the second scan range. This should not affect the
overall performance characteristics of the text scanner, since the
scanner must already do a remote read past the end of the scan range
to read the last tuple.
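
The ownership rule described above can be illustrated with a small
simulation. This is a minimal sketch, not the HdfsTextScanner code: the
function name scan_range, the helper names, and the in-memory bytes
buffer are all assumptions made for illustration. It assumes rows are
terminated by "\n", "\r", or "\r\n", and that a row belongs to the scan
range containing the last byte of the delimiter that precedes it.

```python
# Hypothetical simulation of scan-range row ownership; not Impala code.
# A row is owned by the range that contains the LAST byte of the delimiter
# preceding it, so a "\r\n" split across ranges counts as if it sat fully
# in the second range.

def _next_delim(data, pos, limit):
    # Index of the first CR or LF byte in data[pos:limit], or -1 if none.
    for i in range(pos, limit):
        if data[i] in b"\r\n":
            return i
    return -1


def _delim_end(data, i):
    # Offset just past the delimiter starting at i ("\r\n" is one delimiter).
    if data[i:i + 2] == b"\r\n":
        return i + 2
    return i + 1


def scan_range(data, start, end):
    # Return the rows owned by the scan range [start, end).
    if start == 0:
        row_start = 0
    else:
        # Skip the partial row at the start; the previous range owns it.
        i = _next_delim(data, start, end)
        if i < 0:
            return []
        row_start = _delim_end(data, i)
        if row_start > end:
            # The first delimiter is itself split across the range end, so
            # the row after it belongs to the next range.
            return []

    rows = []
    while row_start < len(data):
        # Reading may continue past `end` to finish the last owned row.
        j = _next_delim(data, row_start, len(data))
        if j < 0:
            rows.append(data[row_start:])
            break
        rows.append(data[row_start:j])
        row_start = _delim_end(data, j)
        if row_start > end:
            # Delimiter ended past this range (including a "\r\n" whose
            # "\r" is the last byte of the range): next row is not ours.
            break
    return rows


if __name__ == "__main__":
    data = b"a\r\nbb\r\nccc\r\n"
    # Split so that the first range ends with "\r" and the second starts
    # with "\n".
    print(scan_range(data, 0, 2), scan_range(data, 2, len(data)))
```

In this sketch, whichever offset the buffer is split at, every row is
returned by exactly one of the two ranges, including the case where the
first range ends with '\r' and the second begins with '\n'.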

Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
Reviewed-on: http://gerrit.cloudera.org:8080/2803
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
---
M be/src/exec/delimited-text-parser.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners.py
4 files changed, 171 insertions(+), 16 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/2803
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <skye@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Skye Wanderman-Milne <skye@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
