impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables
Date Tue, 17 May 2016 02:27:20 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables
......................................................................


Patch Set 1:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/3045/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables
I think a more appropriate title would be something lile:

Fix querying tables/partitions altered to Avro format.


Line 9: Bug: Impalads crash if we query a partitioned table with multiple file
Description seems too complicated. I think Impalads will crash when querying a table or partition
that was altered to the Avro format.


Line 12: Cause: This happens because we don't set avroSchema_ in HdfsTable during
during metadata load -> during the alteration operation

please make other appropriate changes to this paragraph to reflect the more general problem


http://gerrit.cloudera.org:8080/#/c/3045/1/be/src/exec/hdfs-avro-scanner.cc
File be/src/exec/hdfs-avro-scanner.cc:

Line 63:     return Status("Missing avro schema in scan node");
Let's include the possible workaround. something like:

"This could indicate stale metadata if the table was recently altered. Running invalidate
metadata <table> on this table may resolve the problem."


Line 739:   if (record.schema == NULL) return Status("Missing avro schema in scan node");
same comment regarding mentioning workarounds


http://gerrit.cloudera.org:8080/#/c/3045/1/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

Line 1007:       if (loadTableSchema) loadSchema(client, msTbl);
call setAvroSchema() in the loadTableSchema case


Line 1012:         setAvroSchema(client, msTbl);
why do we need this?


Line 1261:    * Sets avroSchema_ if the table or any of the partitions in the table are backed
are stored as Avro.


Line 1262:    * by AVRO file format. Additionally this method also reconciles the schema if
missing comma after Additionally

... reconciles the schemas ...


Line 1263:    * the column definitions from the metastore differ from the actual schema.
'actual schema' is vague, say 'differ from the Avro schema'


Line 1267:     String inputFormat = msTbl.getSd().getInputFormat();
Add Preconditions check that asserts loadSchema() has already been called.


Line 1268:     if (HdfsFileFormat.fromJavaClassName(inputFormat) == HdfsFileFormat.AVRO ||
Is this first check still needed? An unpartitioned table should have a single partition.


Line 1270:       // Look for avro schema in TBLPROPERTIES and in SERDEPROPERTIES, with the
latter
Use "Avro" everywhere consistently


Line 1277:       // Set the avroSchema_ if it is found in any of the search locations. This
is
Comment could be misleading because it's not just "useful" in the described case. This step
is required, and regardless of partitioned/unpartitioned.


Line 1291:       // At this point, nonPartFieldSchemas_ already has the field schemas from
HMS.
Confusing comment because this is a precondition of this function, and not really applicable
here specifically.


Line 1330:    * Loads table schema from Hive Metastore. It also loads column stats.
Loads the table schema and column stats from the Hive Metastore.


http://gerrit.cloudera.org:8080/#/c/3045/1/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

Line 1902: functional
Let's try not to add one-off-tables for bugfixes in the schema template. Imo, this be a self-contained
pytest in TestAvroSchemaResolution.


http://gerrit.cloudera.org:8080/#/c/3045/1/testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
File testdata/workloads/functional-query/queries/QueryTest/mixed-format.test:

Line 28: # IMPALA-3314: Run a query on multi-file-format table with an avro partition
This test feels more like it should go into TestAvroSchemaResolution.

Imo, the testing has two parts:
* test that Impala gets back the expected error message when altering to Avro without a metadata
reload
* when you do a metadata reload the table works just finr


-- 
To view, visit http://gerrit.cloudera.org:8080/3045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message