hive-dev mailing list archives

From "Szehon Ho" <>
Subject Re: Review Request 18254: HIVE-6375 Implement CTAS and column rename for parquet
Date Fri, 21 Feb 2014 01:31:45 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated Feb. 21, 2014, 1:31 a.m.)

Review request for hive.


Fix test. My previous attempt did not make the output fully deterministic: srcbucket
needed to be sorted by both key and value to get a deterministic result.

Decided to use table 'src' instead, which has unique key,value pairs.
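
To illustrate why sorting by key alone is not enough when keys repeat, here is a minimal
Java sketch. It is not Hive code: the class name, the method names, and the sample rows
are all made up for illustration. It only shows that a key-only sort leaves the order of
tied rows dependent on the order in which they arrive, while a key-then-value sort does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; rows are {key, value} string pairs.
final class DeterministicSortSketch {

    // Sorts by key only. List.sort is stable, so rows with equal keys
    // keep whatever relative order they had in the input.
    static List<String> sortByKeyOnly(List<String[]> rows) {
        List<String[]> copy = new ArrayList<>(rows);
        copy.sort(Comparator.comparing((String[] r) -> r[0]));
        return render(copy);
    }

    // Sorts by key, then by value, which gives a total order as long as
    // no two rows share both key and value.
    static List<String> sortByKeyAndValue(List<String[]> rows) {
        List<String[]> copy = new ArrayList<>(rows);
        copy.sort(Comparator.comparing((String[] r) -> r[0])
                .thenComparing((String[] r) -> r[1]));
        return render(copy);
    }

    // Renders each row as "key=value" so results are easy to compare.
    private static List<String> render(List<String[]> rows) {
        List<String> out = new ArrayList<>();
        for (String[] r : rows) {
            out.add(r[0] + "=" + r[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // The same rows in two different arrival orders (made-up data).
        List<String[]> first = Arrays.asList(
                new String[] {"1", "b"}, new String[] {"1", "a"}, new String[] {"0", "z"});
        List<String[]> second = Arrays.asList(
                new String[] {"1", "a"}, new String[] {"1", "b"}, new String[] {"0", "z"});

        System.out.println(sortByKeyOnly(first));      // tied rows follow input order
        System.out.println(sortByKeyOnly(second));
        System.out.println(sortByKeyAndValue(first));  // same result for both inputs
        System.out.println(sortByKeyAndValue(second));
    }
}
```

A table whose key,value pairs are unique, as described above for 'src', makes the
key-then-value ordering a total order, so the expected test output is stable.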

Bugs: HIVE-6375

Repository: hive-git


There is a Hive bug in SemanticAnalyzer that causes the CreateTable task and the FileSink
task to choose different column names.  columnInfo.getInternalName() was used in one place,
while fieldSchema still used columnInfo.getAlias() when available.  This change makes both
consistent, favoring columnInfo.getAlias() when it is available.

This was not revealed before because other file formats like RCFile seem to use column-ordinal
position, and Avro stores the schema separately altogether.
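
The naming rule described above can be sketched roughly as follows. This is not the
actual SemanticAnalyzer change: ColumnInfoSketch is a hypothetical stand-in that models
only the two getters mentioned in the description, and resolveName/resolveNames are
made-up helper names. The point is simply that both consumers of the column list go
through one rule, so the names in the table schema match the names written to the file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Hive's ColumnInfo; only the two getters
// named in the description are modeled here.
final class ColumnInfoSketch {
    private final String internalName;
    private final String alias; // null when the column has no alias

    ColumnInfoSketch(String internalName, String alias) {
        this.internalName = internalName;
        this.alias = alias;
    }

    String getInternalName() {
        return internalName;
    }

    String getAlias() {
        return alias;
    }
}

final class ColumnNamingSketch {

    // One shared rule: prefer the alias when it is available,
    // otherwise fall back to the internal name.
    static String resolveName(ColumnInfoSketch col) {
        String alias = col.getAlias();
        return (alias != null && !alias.isEmpty()) ? alias : col.getInternalName();
    }

    // Both the table-creation side and the file-writing side would call
    // this, so they cannot disagree on column names.
    static List<String> resolveNames(List<ColumnInfoSketch> cols) {
        List<String> names = new ArrayList<>();
        for (ColumnInfoSketch col : cols) {
            names.add(resolveName(col));
        }
        return names;
    }

    public static void main(String[] args) {
        List<ColumnInfoSketch> cols = Arrays.asList(
                new ColumnInfoSketch("_col0", "key"), // aliased column
                new ColumnInfoSketch("_col1", null)); // no alias, auto-generated name
        System.out.println(resolveNames(cols)); // prints [key, _col1]
    }
}
```

A format that resolves columns by ordinal position would tolerate a name mismatch
between the two tasks, which is consistent with the observation that the problem only
showed up with Parquet.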

Diffs (updated)

  ql/src/java/org/apache/hadoop/hive/ql/parse/ a01aa0e 
  ql/src/test/queries/clientpositive/parquet_ctas.q PRE-CREATION 
  ql/src/test/results/clientpositive/ctas.q.out 9668855 
  ql/src/test/results/clientpositive/ctas_hadoop20.q.out 2c0059d 
  ql/src/test/results/clientpositive/merge3.q.out ae7dc71 
  ql/src/test/results/clientpositive/parquet_ctas.q.out PRE-CREATION 



Added parquet_ctas.q.  It covers cases where the column name is taken directly from the input
table (implied alias), where the name is auto-generated, where the name is specified as an
alias, and a mix of the three.


Szehon Ho
