Date: Wed, 14 Jan 2015 21:19:19 +0000 (UTC)
From: Kumar V <kumarbuyonline@yahoo.com>
To: user@hive.apache.org
Subject: Re: Adding new columns to parquet based Hive table

Hi,

    Thanks for your response. I can't do another insert as the data is already in the table. Also, since there is a lot of data in the table already, I am trying to find a way to avoid reprocessing/reloading.
Thanks.

On Wednesday, January 14, 2015 2:47 PM, Daniel Haviv <daniel.haviv@veracity-group.com> wrote:

Hi Kumar,
Altering the table just updates Hive's metadata without updating Parquet's schema. I believe that if you insert into your table (after adding the column), you'll later be able to select all 3 columns.

Daniel

On Jan 14, 2015, at 21:34, Kumar V <kumarbuyonline@yahoo.com> wrote:

Hi,
    Any ideas on how to go about this? Any insights you have would be helpful. I am kind of stuck here.

Here are the steps I followed on Hive 0.13:

1) create table t (f1 string, f2 string) stored as parquet;
2) upload parquet files with 2 fields
3) select * from t;                        <---- works fine
4) alter table t add columns (f3 string);
5) select * from t;                        <---- ERROR:

"Caused by: java.lang.IllegalStateException: Column f3 at index 2 does not exist
    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:116)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)

On Wednesday, January 7, 2015 2:55 PM, Kumar V <kumarbuyonline@yahoo.com> wrote:

Hi,
    I have a Parquet format Hive table with a few columns. I have loaded a lot of data to this table already and it seems to work. I have to add a few new columns to this table.
If I add new columns, queries don't work anymore since I have not reloaded the old data. Is there a way to add new fields to the table without reloading the old Parquet files, and still make the queries work?

I tried this in Hive 0.10 and also on Hive 0.13. Getting an error in both versions.

Please let me know how to handle this.

Regards,
Kumar.
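Daniel's suggestion above amounts to rewriting the data so that the Parquet files themselves carry the new column, since the file footers (not Hive's metastore) are what the reader checks. A minimal HiveQL sketch, using the table and column names from the steps in the thread (`t_new` is a hypothetical staging table; this does reprocess the data, which the original poster is trying to avoid, but it is the reliable path on Hive 0.13):

```sql
-- Create a table with the full 3-column schema up front, so every
-- Parquet file written to it contains f3 in its footer schema.
CREATE TABLE t_new (f1 STRING, f2 STRING, f3 STRING) STORED AS PARQUET;

-- Backfill the new column with NULLs for the existing rows.
-- Reading from the old 2-column table t still works here because
-- its files match its (unaltered) table metadata.
INSERT OVERWRITE TABLE t_new
SELECT f1, f2, CAST(NULL AS STRING) FROM t;
```

After the copy, `t_new` can be queried with all three columns, and new loads can write all three fields directly; the old table can then be dropped or renamed away.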