impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command
Date Wed, 10 Jan 2018 20:18:06 GMT
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8928 )

Change subject: IMPALA-4323: "SET ROW FORMAT" option added to "ALTER TABLE" command
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/8928/3/fe/src/main/cup/sql-parser.cup@1007
PS3, Line 1007: // row_format_val cannot be used as it conflicts with cache_op_val.
> Hack won't work since the code block is after the parser branch and the par
I was not suggesting the hack as a solution. I was pointing out that the two rules conflict
because they both accept empty and that there is an existing hack in L1015 which works around
the problem that cache_op_val accepts empty.

I'm suggesting that we remove the hack in L1015 and instead have 4 new rules:
cache_op_val, opt_cache_op_val, row_format_val, opt_row_format_val

Those rules should be used appropriately.

I see you added a row_format_val_req. We typically do it the other way around and add an "opt"
prefix for productions that also accept empty. Unfortunately, we are inconsistent in some
places, but adding a "req" suffix is even more inconsistent :)

Let's try and clean this up


http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java:

http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/analysis/AlterTableSetRowFormatStmt.java@72
PS4, Line 72:       affectedPartitions = ((HdfsTable) tbl).getPartitions();
> The current behavior of file format is to pick up the file format when the 
You are correct. The file format is set when the partition is created. This means that altering
the table-level row format should have no effect on existing partitions, so why forbid an
operation based on existing partitions if those partitions are not impacted at all?

Another point is that users can get into this state in various ways and disallowing one such
path does not make sense to me.

This is the behavior with your current code:

create table t (i int) partitioned by (p int) stored as textfile;
alter table t add partition (p=0);
alter table t partition (p=0) set fileformat parquet;
alter table t set row format delimited ...; <-- ERROR

But this works fine:
create table t (i int) partitioned by (p int) stored as textfile;
alter table t add partition (p=0);
alter table t set row format delimited ...;
alter table t partition (p=0) set fileformat parquet;

The table ends up in the same state, just the operations are executed in a different order.


http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@460
PS4, Line 460:           reloadFileMetadata = alterTableSetRowFormat(tbl,
> This is mapped to "loadTableSchema" in the load method.  If I remove settin
This flag should be called reloadTableSchema. The current name is confusing at best and misleading
at worst.


http://gerrit.cloudera.org:8080/#/c/8928/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2267
PS4, Line 2267:    * reloaded on the next access.  The return value is true if the fileMetadata
needs
> It appears to be cached.  If not reloaded, alter statements do not take eff
Make sense. Sounds like the naming is simply wrong and we mean "table metadata" and not "file
metadata". I suggest renaming the flag and changing the wording to reflect that


http://gerrit.cloudera.org:8080/#/c/8928/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8928/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2227
PS5, Line 2227:    * operation, existing table data will not be converted to the new format.
After
I suggest rephrasing the last sentence to:

Returns true if the file metadata to be reloaded.

(The "marked as invalid" part in the existing wording is not correct and misleading)


http://gerrit.cloudera.org:8080/#/c/8928/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2267
PS5, Line 2267:    * reloaded on the next access.  The return value is true if the fileMetadata
needs
Returns true if the file metadata to be reloaded.


http://gerrit.cloudera.org:8080/#/c/8928/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2279
PS5, Line 2279:         org.apache.hadoop.hive.metastore.api.Table msTbl =
fix indentation, we indent 2


http://gerrit.cloudera.org:8080/#/c/8928/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/8928/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@478
PS5, Line 478:         "on TEXT or SEQUENCE file formats");
unsupportedFileFormatDbs


http://gerrit.cloudera.org:8080/#/c/8928/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@488
PS5, Line 488:   public void TestAlterTableSet() throws AnalysisException {
remove empty line


http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1348
PS4, Line 1348: alter table $DATABASE.delimited set row format delimited fields terminated
by ' ';
> Add a comment what we expect the row format to be after the alteration. We 
Done?


http://gerrit.cloudera.org:8080/#/c/8928/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test@1375
PS4, Line 1375: alter table $DATABASE.delimited partition (c0=1) set row format delimited
fields terminated by ' ';
> try altering the table-level row format and see how that affects the partit
Missed this one?



-- 
To view, visit http://gerrit.cloudera.org:8080/8928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I96e347463504915a6f33932552e4d1f61e9b1154
Gerrit-Change-Number: 8928
Gerrit-PatchSet: 4
Gerrit-Owner: Adam Holley <git@holleyism.com>
Gerrit-Reviewer: Adam Holley <git@holleyism.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Kim Jin Chul <jinchul@gmail.com>
Gerrit-Comment-Date: Wed, 10 Jan 2018 20:18:06 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message