Date: Wed, 14 Jan 2015 21:19:19 +0000 (UTC)
From: Kumar V <kumarbuyonline@yahoo.com>
To: user@hive.apache.org
Subject: Re: Adding new columns to parquet based Hive table

Hi,

    Thanks for your response. I can't do another insert as the data is already in the table. Also, since there is a lot of data in the table already, I am trying to find a way to avoid reprocessing/reloading.
Thanks.

On Wednesday, January 14, 2015 2:47 PM, Daniel Haviv <daniel.haviv@veracity-group.com> wrote:

Hi Kumar,
Altering the table just updates Hive's metadata without updating Parquet's schema. I believe that if you insert into your table (after adding the column), you'll later be able to select all 3 columns.

Daniel

On Jan 14, 2015, at 21:34, Kumar V <kumarbuyonline@yahoo.com> wrote:

Hi,
    Any ideas on how to go about this? Any insights you have would be helpful. I am kind of stuck here.

Here are the steps I followed on Hive 0.13:

1) create table t (f1 string, f2 string) stored as parquet;
2) upload parquet files with 2 fields
3) select * from t;                        <---- works fine
4) alter table t add columns (f3 string);
5) select * from t;                        <---- ERROR:

"Caused by: java.lang.IllegalStateException: Column f3 at index 2 does not exist
    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:116)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)

On Wednesday, January 7, 2015 2:55 PM, Kumar V <kumarbuyonline@yahoo.com> wrote:

Hi,
    I have a Parquet format Hive table with a few columns. I have loaded a lot of data to this table already and it seems to work. I have to add a few new columns to this table.
If I add new columns, queries don't work anymore since I have not reloaded the old data. Is there a way to add new fields to the table without reloading the old Parquet files, and still make the queries work?

I tried this in Hive 0.10 and also on Hive 0.13. Getting an error in both versions.

Please let me know how to handle this.

Regards,
Kumar.
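Daniel's suggestion above amounts to rewriting the data so that the Parquet files themselves carry the new column, since the file footers (not Hive's metastore) are what the reader checks. A minimal HiveQL sketch, using the table and column names from the steps in the thread (`t_new` is a hypothetical staging table; this does reprocess the data, which the original poster is trying to avoid, but it is the reliable path on Hive 0.13):

```sql
-- Create a table with the full 3-column schema up front, so every
-- Parquet file written to it contains f3 in its footer schema.
CREATE TABLE t_new (f1 STRING, f2 STRING, f3 STRING) STORED AS PARQUET;

-- Backfill the new column with NULLs for the existing rows.
-- Reading from the old 2-column table t still works here because
-- its files match its (unaltered) table metadata.
INSERT OVERWRITE TABLE t_new
SELECT f1, f2, CAST(NULL AS STRING) FROM t;
```

After the copy, `t_new` can be queried with all three columns, and new loads can write all three fields directly; the old table can then be dropped or renamed away.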