impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bharath Vissapragada (Code Review)" <>
Subject [Impala-CR](cdh5-trunk) IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables
Date Thu, 19 May 2016 10:26:47 GMT
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables

Patch Set 1:

Commit Message:

Line 7: IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables
> I think a more appropriate title would be something lile:

Line 9: Bug: Impalads crash if we query a partitioned table with multiple file
> Description seems too complicated. I think Impalads will crash when queryin

Line 12: Cause: This happens because we don't set avroSchema_ in HdfsTable during
> during metadata load -> during the alteration operation
File be/src/exec/

Line 63:     return Status("Missing avro schema in scan node");
> Let's include the possible workaround. something like:

Line 739:   if (record.schema == NULL) return Status("Missing avro schema in scan node");
> same comment regarding mentioning workarounds
File fe/src/main/java/com/cloudera/impala/catalog/

Line 1007:       if (loadTableSchema) loadSchema(client, msTbl);
> call setAvroSchema() in the loadTableSchema case

Line 1012:         setAvroSchema(client, msTbl);
> why do we need this?

Line 1261:    * Sets avroSchema_ if the table or any of the partitions in the table are backed
> are stored as Avro.

Line 1262:    * by AVRO file format. Additionally this method also reconciles the schema if
> missing comma after Additionally

Line 1263:    * the column definitions from the metastore differ from the actual schema.
> 'actual schema' is vague, say 'differ from the Avro schema'

Line 1267:     String inputFormat = msTbl.getSd().getInputFormat();
> Add Preconditions check that asserts loadSchema() has already been called.

Line 1268:     if (HdfsFileFormat.fromJavaClassName(inputFormat) == HdfsFileFormat.AVRO ||
> Is this first check still needed? An unpartitioned table should have a sing
I didn't remove it so that I can short-circuit hasParitionOfFileFormat() call, but looks like
an unnecessary optimization. Removed it.

Line 1270:       // Look for avro schema in TBLPROPERTIES and in SERDEPROPERTIES, with the
> Use "Avro" everywhere consistently

Line 1277:       // Set the avroSchema_ if it is found in any of the search locations. This
> Comment could be misleading because it's not just "useful" in the described

Line 1291:       // At this point, nonPartFieldSchemas_ already has the field schemas from
> Confusing comment because this is a precondition of this function, and not 
Given we are adding the Precond check above, removing this.

Line 1330:    * Loads table schema from Hive Metastore. It also loads column stats.
> Loads the table schema and column stats from the Hive Metastore.
File testdata/datasets/functional/functional_schema_template.sql:

Line 1902: functional
> Let's try not to add one-off-tables for bugfixes in the schema template. Im
File testdata/workloads/functional-query/queries/QueryTest/mixed-format.test:

Line 28: # IMPALA-3314: Run a query on multi-file-format table with an avro partition
> This test feels more like it should go into TestAvroSchemaResolution.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Bharath Vissapragada <>
Gerrit-HasComments: Yes

View raw message