impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Volker (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-1740: Add support for skip.header.line.count.
Date Wed, 27 Apr 2016 20:27:04 GMT
Hello Marcel Kornacker, Internal Jenkins, Skye Wanderman-Milne,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/2110

to look at the new patch set (#23).

Change subject: IMPALA-1740: Add support for skip.header.line.count.
......................................................................

IMPALA-1740: Add support for skip.header.line.count.

HIVE-5795 introduced a parameter skip.header.line.count to skip header
lines from input files. This change introduces the capability to skip
an arbitrary number of header lines from csv input files on hdfs. The
size of the total file header must be smaller than
max_scan_range_length, otherwise an error will be reported. This is
necessary because scan ranges are not read in disk order, so there is
no way of identifying header lines except by counting from the start
of the first scan range.

[localhost:21000] > alter table t1 set
tblproperties('skip.header.line.count'='1');
Query: alter table t1 set tblproperties('skip.header.line.count'='1')
[localhost:21000] > select * from t1;
Query: select * from t1
+----+----+
| c1 | c2 |
+----+----+
| 1  | 1  |
| 2  | 2  |
| 3  | 3  |
+----+----+
Fetched 3 row(s) in 0.32s
[localhost:21000] > alter table t1 set
tblproperties('skip.header.line.count'='0');
Query: alter table t1 set tblproperties('skip.header.line.count'='0')
[localhost:21000] > select * from t1;
Query: select * from t1
+------+------+
| c1   | c2   |
+------+------+
| NULL | NULL |
| 1    | 1    |
| 2    | 2    |
| 3    | 3    |
+------+------+
WARNINGS: Error converting column: 0 TO INT (Data is: num1)
Error converting column: 1 TO DOUBLE (Data is: num2)
file: hdfs://localhost:20500/test-warehouse/t1/test.txt
record: num1,num2

Fetched 4 row(s) in 0.41s

Change-Id: I595f01a165d41499ca1956fe748ba3840a6eb543
---
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/hdfs-text-table-writer.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/com/cloudera/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/com/cloudera/impala/analysis/BaseTableRef.java
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java
A testdata/data/table_with_header.csv
A testdata/data/table_with_header_2.csv
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M testdata/workloads/functional-query/queries/QueryTest/create.test
A testdata/workloads/functional-query/queries/QueryTest/hdfs-text-scan-with-header.test
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M tests/query_test/test_scanners.py
25 files changed, 456 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/10/2110/23
-- 
To view, visit http://gerrit.cloudera.org:8080/2110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I595f01a165d41499ca1956fe748ba3840a6eb543
Gerrit-PatchSet: 23
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Skye Wanderman-Milne <skye@cloudera.com>

Mime
View raw message