impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bharath Vissapragada (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables
Date Thu, 19 May 2016 10:26:47 GMT
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables
......................................................................


Patch Set 1:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/3045/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables
> I think a more appropriate title would be something lile:
Done


Line 9: Bug: Impalads crash if we query a partitioned table with multiple file
> Description seems too complicated. I think Impalads will crash when queryin
Done


Line 12: Cause: This happens because we don't set avroSchema_ in HdfsTable during
> during metadata load -> during the alteration operation
Done


http://gerrit.cloudera.org:8080/#/c/3045/1/be/src/exec/hdfs-avro-scanner.cc
File be/src/exec/hdfs-avro-scanner.cc:

Line 63:     return Status("Missing avro schema in scan node");
> Let's include the possible workaround. something like:
Done


Line 739:   if (record.schema == NULL) return Status("Missing avro schema in scan node");
> same comment regarding mentioning workarounds
Done


http://gerrit.cloudera.org:8080/#/c/3045/1/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

Line 1007:       if (loadTableSchema) loadSchema(client, msTbl);
> call setAvroSchema() in the loadTableSchema case
Done


Line 1012:         setAvroSchema(client, msTbl);
> why do we need this?
Removed.


Line 1261:    * Sets avroSchema_ if the table or any of the partitions in the table are backed
> are stored as Avro.
Done


Line 1262:    * by AVRO file format. Additionally this method also reconciles the schema if
> missing comma after Additionally
Done


Line 1263:    * the column definitions from the metastore differ from the actual schema.
> 'actual schema' is vague, say 'differ from the Avro schema'
Done


Line 1267:     String inputFormat = msTbl.getSd().getInputFormat();
> Add Preconditions check that asserts loadSchema() has already been called.
Done


Line 1268:     if (HdfsFileFormat.fromJavaClassName(inputFormat) == HdfsFileFormat.AVRO ||
> Is this first check still needed? An unpartitioned table should have a sing
I didn't remove it so that I can short-circuit hasParitionOfFileFormat() call, but looks like
an unnecessary optimization. Removed it.


Line 1270:       // Look for avro schema in TBLPROPERTIES and in SERDEPROPERTIES, with the
latter
> Use "Avro" everywhere consistently
Done


Line 1277:       // Set the avroSchema_ if it is found in any of the search locations. This
is
> Comment could be misleading because it's not just "useful" in the described
Done


Line 1291:       // At this point, nonPartFieldSchemas_ already has the field schemas from
HMS.
> Confusing comment because this is a precondition of this function, and not 
Given we are adding the Precond check above, removing this.


Line 1330:    * Loads table schema from Hive Metastore. It also loads column stats.
> Loads the table schema and column stats from the Hive Metastore.
Done


http://gerrit.cloudera.org:8080/#/c/3045/1/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

Line 1902: functional
> Let's try not to add one-off-tables for bugfixes in the schema template. Im
Done


http://gerrit.cloudera.org:8080/#/c/3045/1/testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
File testdata/workloads/functional-query/queries/QueryTest/mixed-format.test:

Line 28: # IMPALA-3314: Run a query on multi-file-format table with an avro partition
> This test feels more like it should go into TestAvroSchemaResolution.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message