impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Skye Wanderman-Milne (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks
Date Fri, 06 May 2016 00:07:29 GMT
Hello Dan Hecht, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/2803

to look at the new patch set (#6).

Change subject: IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks
......................................................................

IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks

This patch modifies HdfsTextScanner to specifically check for split
"\r\n" delimiters when the scan range ends with '\r'. If there does
turn out to be a split delimiter, the next tuple is considered the
responsibility of the next scan range's scanner, as if the delimiter
appeared fully in the second scan range. This should not affect the
overall performance characteristics of the text scanner since it
already must do a remote read past the end of the scan range to read
the last tuple.

Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
---
M be/src/exec/delimited-text-parser.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners.py
4 files changed, 171 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/03/2803/6
-- 
To view, visit http://gerrit.cloudera.org:8080/2803
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <skye@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Skye Wanderman-Milne <skye@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>

Mime
View raw message