hive-user mailing list archives

From Kumar V <kumarbuyonl...@yahoo.com>
Subject Re: UPDATE : Adding new columns to parquet based Hive table
Date Thu, 29 Jan 2015 15:39:33 GMT
I wanted to clarify something. It works if the Hive-Parquet table is a plain vanilla table.
But if the table is a partitioned table, the error occurs after adding new fields to
the table. Any ideas on how to handle this?
hive> create table parquet_part (col1 string, col2 string, col3 int)
    > partitioned by (partcol string)
    > ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    > STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    > OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
hive> insert into table parquet_part partition (partcol)
    > select code, description, salary, '1' from sample_08 limit 4;
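
(For anyone replaying this: a dynamic-partition insert like the one above usually needs dynamic partitioning enabled in the session first; a minimal sketch, assuming stock defaults:)

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;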

hive> select col1 from parquet_part where partcol='1';
00-0000
11-0000
11-1011
11-1021

hive> alter table parquet_part add columns (NewField1 string, Newfield2 string, newfield3 string);
OK
Time taken: 0.104 seconds
hive> desc parquet_part;
OK
col1            string  from deserializer
col2            string  from deserializer
col3            int     from deserializer
newfield1       string  from deserializer
newfield2       string  from deserializer
newfield3       string  from deserializer
partcol         string
Time taken: 0.123 seconds
hive> select col1 from parquet_part where partcol='1';

Task with the most failures(4):
-----
Task ID:  task_201411191237_9181_m_000000
URL:  http://hadoop3-mgt.hdp.us.grid.nuance.com:50030/taskdetails.jsp?jobid=job_201411191237_9181&tipid=task_201411191237_9181_m_000000
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.UnsupportedOperationException: Cannot inspect java.util.ArrayList
        at parquet.hive.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:133)
        at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:128)
        at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:354)
        at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:220)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:669)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child$4.run(Child.ja
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
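
One avenue worth trying: for a partitioned table the metastore keeps a column list per partition as well, and a table-level ALTER can leave already-existing partitions on the old 3-column schema, which would match the "Cannot inspect java.util.ArrayList" mismatch above. A sketch of two hedged options: the partition-level form assumes your Hive build accepts a PARTITION spec on ADD COLUMNS, and the CASCADE keyword only exists on Hive 1.1.0 and later (HIVE-8839), run against a table that has not already had the columns added at table level:

hive> -- bring one partition's own column list up to date
hive> alter table parquet_part partition (partcol='1')
    > add columns (NewField1 string, Newfield2 string, newfield3 string);

hive> -- on Hive 1.1.0+, CASCADE pushes a table-level change into all partitions
hive> alter table parquet_part add columns (NewField1 string, Newfield2 string, newfield3 string) cascade;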

 

     On Wednesday, January 14, 2015 4:20 PM, Kumar V <kumarbuyonline@yahoo.com> wrote:
   

Hi,
    Thanks for your response. I can't do another insert as the data is already in the table. Also, since there is a lot of data in the table already, I am trying to find a way to avoid reprocessing/reloading.
Thanks.

     On Wednesday, January 14, 2015 2:47 PM, Daniel Haviv <daniel.haviv@veracity-group.com> wrote:
   

Hi Kumar,
Altering the table just updates Hive's metadata without updating Parquet's schema. I believe that if you insert into your table (after adding the column), you'll be able to select all 3 columns later on.
Daniel
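
A minimal sketch of that point, using the table t from the steps quoted below (staging_t is a hypothetical source of new rows): the ALTER touches only the metastore, files written after it contain f3, and files written before it do not, which is exactly where the affected versions error out instead of returning NULL:

hive> alter table t add columns (f3 string);                  -- metastore now lists f3
hive> insert into table t select f1, f2, 'x' from staging_t;  -- new files carry f3
hive> select f1, f2, f3 from t;                               -- reads old and new files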
On 14 Jan 2015, at 21:34, Kumar V <kumarbuyonline@yahoo.com> wrote:


Hi,
    Any ideas on how to go about this? Any insights you have would be helpful. I am kinda stuck here.
Here are the steps I followed on Hive 0.13:
1) create table t (f1 string, f2 string) stored as parquet;
2) upload parquet files with 2 fields
3) select * from t;    <---- Works fine.
4) alter table t add columns (f3 string);
5) select * from t;    <----- ERROR
"Caused by: java.lang.IllegalStateException: Column f3 at index 2 does not exist
        at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:116)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
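
For comparison, later Hive releases resolve Parquet columns by name and hand back NULL for a column that is missing from a given file, so the same sequence is expected to go through there without reloading; a sketch of the behaviour on such a version:

hive> create table t (f1 string, f2 string) stored as parquet;
hive> -- point the table at (or load) the existing 2-field parquet files
hive> alter table t add columns (f3 string);
hive> select f1, f2, f3 from t;   -- f3 is NULL for rows from the old files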


 

     On Wednesday, January 7, 2015 2:55 PM, Kumar V <kumarbuyonline@yahoo.com> wrote:
   

 Hi,
    I have a Parquet format Hive table with a few columns. I have loaded a lot of data to this table already and it seems to work. I have to add a few new columns to this table. If I add new columns, queries don't work anymore since I have not reloaded the old data. Is there a way to add new fields to the table, leave the old Parquet files as they are, and still have queries work?
I tried this in Hive 0.10 and also on Hive 0.13. Getting an error in both versions.
Please let me know how to handle this.
Regards,
Kumar.

    


    

   