impala-dev mailing list archives

From "Skye Wanderman-Milne (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks
Date Sat, 16 Apr 2016 00:15:17 GMT
Skye Wanderman-Milne has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/2803

Change subject: IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks
......................................................................

IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks

This patch modifies HdfsTextScanner to explicitly check for a split
"\r\n" delimiter when the scan range ends with '\r'. If the delimiter
is in fact split, the next tuple becomes the responsibility of the
next scan range's scanner, as if the delimiter appeared entirely in
the second scan range. This should not change the overall performance
characteristics of the text scanner, since it must already do a
remote read past the end of the scan range to read the last tuple.
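
The ownership rule described above can be illustrated with a small
standalone model. This is only a sketch under stated assumptions, not
the HdfsTextScanner code: the helper names find_delimiters and
rows_for_range are hypothetical, the whole input is held in memory
rather than streamed, and a row is assumed to belong to the scan range
that contains the last byte of the delimiter preceding it (so a
"\r\n" split across ranges counts as if it sat entirely in the second
range).

```python
def find_delimiters(data: bytes):
    # Yield (start, end) offsets of each row delimiter.
    # Assumed delimiter forms: CR LF, a lone LF, or a lone CR.
    i = 0
    while i < len(data):
        if data[i:i + 2] == b"\r\n":
            yield i, i + 2
            i += 2
        elif data[i:i + 1] in (b"\r", b"\n"):
            yield i, i + 1
            i += 1
        else:
            i += 1


def rows_for_range(data: bytes, start: int, end: int):
    # Return the rows owned by the scan range [start, end).
    # Hypothetical rule: a row is owned by the range containing the
    # last byte of the delimiter that precedes it. The first row has
    # no preceding delimiter, so it is owned by the range holding
    # offset 0.
    rows = []
    row_start = 0
    owner_pos = 0
    for d_start, d_end in find_delimiters(data):
        if start <= owner_pos < end:
            rows.append(data[row_start:d_start])
        row_start = d_end
        owner_pos = d_end - 1
    # Trailing row that is not terminated by a delimiter.
    if row_start < len(data) and start <= owner_pos < end:
        rows.append(data[row_start:])
    return rows


data = b"a,1\r\nb,2\r\nc,3\r\n"
split = 4  # first range ends with '\r', second range starts with '\n'
first = rows_for_range(data, 0, split)
second = rows_for_range(data, split, len(data))
print(first, second)
```

In this model, the row following the split delimiter is returned only
for the second range, so each row is produced exactly once wherever
the range boundary falls.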

Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners.py
3 files changed, 123 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/03/2803/1
-- 
To view, visit http://gerrit.cloudera.org:8080/2803
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <skye@cloudera.com>
