hive-dev mailing list archives

From "Barna Zsombor Klara (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
Date Fri, 28 Apr 2017 10:07:04 GMT
Barna Zsombor Klara created HIVE-16559:
------------------------------------------

             Summary: Parquet schema evolution for partitioned tables may break if table and partition serdes differ
                 Key: HIVE-16559
                 URL: https://issues.apache.org/jira/browse/HIVE-16559
             Project: Hive
          Issue Type: Bug
            Reporter: Barna Zsombor Klara
            Assignee: Barna Zsombor Klara


Parquet schema evolution should make it possible to have partitions/tables
backed by files with different schemas. Hive should match the table columns with file columns
based on the column name if possible.
However, if the table's serde is missing columns that are present in the serde of a partition,
Hive fails to match the columns.
Steps to reproduce:
{code}
CREATE TABLE myparquettable_parted
(
  name string,
  favnumber int,
  favcolor string,
  age int,
  favpet string
)
PARTITIONED BY (day string)
STORED AS PARQUET;

INSERT OVERWRITE TABLE myparquettable_parted
PARTITION(day='2017-04-04')
SELECT
   'mary' as name,
   5 AS favnumber,
   'blue' AS favcolor,
   35 AS age,
   'dog' AS favpet;

-- No CASCADE option, so the partition will not be altered.
ALTER TABLE myparquettable_parted
REPLACE COLUMNS
(
  favnumber int,
  age int
);
{code}
{{SELECT * FROM myparquettable_parted where day='2017-04-04';}}
will fail with:
{{java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable}}
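
The mismatch behind this failure can be checked by comparing the table-level and partition-level metadata. A minimal sketch, assuming the table and partition from the steps above and using only the standard {{DESCRIBE FORMATTED}} statement; the column lists mentioned in the comments are what one would expect to see, not captured output:
{code}
-- Table-level schema: after REPLACE COLUMNS this is expected to list
-- only favnumber and age (plus the partition column day).
DESCRIBE FORMATTED myparquettable_parted;

-- Partition-level schema: without CASCADE this is expected to still list
-- the five original columns (name, favnumber, favcolor, age, favpet).
DESCRIBE FORMATTED myparquettable_parted PARTITION (day='2017-04-04');
{code}
If the two column lists differ as described, the table and partition serdes are initialized with different schemas, which is the condition this issue is about.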

Hive should either match the columns together or prevent the user from dropping columns from
the table.
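
Until one of these is implemented, a possible interim workaround is to apply the column change with the {{CASCADE}} option, so that the partition metadata is updated together with the table metadata. This is an assumption and has not been verified against this issue; {{CASCADE}} is supported by {{ALTER TABLE ... REPLACE COLUMNS}} since Hive 1.1.0.
{code}
-- Assumed workaround, not verified: CASCADE propagates the new column list
-- to the metadata of all existing partitions, so the table and partition
-- schemas no longer differ.
ALTER TABLE myparquettable_parted
REPLACE COLUMNS
(
  favnumber int,
  age int
) CASCADE;
{code}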



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)