hive-user mailing list archives

From Kumar V <kumarbuyonl...@yahoo.com>
Subject Re: Adding new columns to parquet based Hive table
Date Wed, 14 Jan 2015 21:19:19 GMT
Hi,
Thanks for your response. I can't do another insert, as the data is already in the table.
Also, since there is a lot of data in the table already, I am trying to find a way to avoid
reprocessing or reloading it.
Thanks.

     On Wednesday, January 14, 2015 2:47 PM, Daniel Haviv <daniel.haviv@veracity-group.com>
wrote:
   

Hi Kumar,
Altering the table just updates Hive's metadata without updating Parquet's schema. I
believe that if you insert into your table (after adding the column), you'll be able to
select all 3 columns later on.
Daniel
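
To illustrate the mismatch described above, here is a minimal Python sketch. It is not Hive's actual reader code; the names table_columns, file_columns, read_strict and read_lenient are hypothetical and exist only for this example. It assumes the table schema (from the metastore) already lists f3 while an old Parquet file still carries only f1 and f2.

```python
# Illustrative only: NOT Hive's implementation. All names are hypothetical.

# Table schema as recorded in the metastore after ALTER TABLE ... ADD COLUMNS.
table_columns = ["f1", "f2", "f3"]

# Schema and one row from a Parquet file written before the ALTER.
file_columns = ["f1", "f2"]
file_row = {"f1": "a", "f2": "b"}


def read_strict(row, file_cols, table_cols):
    """Fail when the table asks for a column the file does not contain."""
    result = {}
    for index, name in enumerate(table_cols):
        if name not in file_cols:
            raise LookupError(f"Column {name} at index {index} does not exist")
        result[name] = row[name]
    return result


def read_lenient(row, file_cols, table_cols):
    """Return None for any column missing from the file."""
    return {name: (row[name] if name in file_cols else None) for name in table_cols}


if __name__ == "__main__":
    try:
        read_strict(file_row, file_columns, table_columns)
    except LookupError as exc:
        print("strict reader failed:", exc)
    print("lenient reader returned:", read_lenient(file_row, file_columns, table_columns))
```

The strict variant mirrors the shape of the error reported in this thread; the lenient variant only shows what a reader that tolerated missing columns could return, and is not a claim about what any Hive version does.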
On 14 Jan 2015, at 21:34, Kumar V <kumarbuyonline@yahoo.com> wrote:


Hi,
    Any ideas on how to go about this? Any insights you have would be helpful. I am kind of
stuck here.
Here are the steps I followed on Hive 0.13:
1) create table t (f1 String, f2 string) stored as Parquet;
2) upload Parquet files with 2 fields
3) select * from t; <---- Works fine.
4) alter table t add columns (f3 string);
5) select * from t; <----- ERROR
"Caused by: java.lang.IllegalStateException: Column f3 at index 2 does not exist
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:116)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204) 
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79) 
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66) 
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51) 
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)


 

     On Wednesday, January 7, 2015 2:55 PM, Kumar V <kumarbuyonline@yahoo.com> wrote:
   

Hi,
I have a Parquet-format Hive table with a few columns. I have already loaded a lot of
data into this table and it seems to work. I have to add a few new columns to this table.
If I add new columns, queries don't work anymore, since I have not reloaded the old data.
Is there a way to add new fields to the table without reloading the old Parquet files and
still have the queries work?
I tried this in Hive 0.10 and also in Hive 0.13. I get an error in both versions.
Please let me know how to handle this.
Regards,
Kumar.

    


   