From: Bejoy Ks <bejoy_ks@yahoo.com>
To: user@hive.apache.org
Date: Fri, 16 Dec 2011 02:41:49 -0800 (PST)
Subject: Re: bucketing in hive

Ranjith
    You can definitely change the number of buckets in a Hive table even after its creation. You need to issue an ALTER TABLE command that contains the CLUSTERED BY and/or SORTED BY clauses used by your table. For example, if I have a table whose DDL looks like this:

CREATE EXTERNAL TABLE employee
(
  emp_id STRING, emp_name STRING,
  dept STRING, location STRING
)
CLUSTERED BY (dept, location) SORTED BY (dept, location) INTO 15 BUCKETS;

You can alter the number of buckets using the ALTER TABLE command as follows:


ALTER TABLE employee CLUSTERED BY (dept, location) SORTED BY (dept, location) INTO 20 BUCKETS;
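
Note that, as far as I understand, the ALTER above only changes the table metadata; files already in the table keep their old 15-bucket layout. A rough sketch of rewriting the data so it actually lands in 20 buckets (employee_staging is a hypothetical copy of the old data, not something from this thread):

-- Ask Hive to honour the declared bucket count when writing
-- (this property applies to Hive releases of this era).
SET hive.enforce.bucketing=true;

-- Rewriting the table hashes every row into the new 20-bucket layout.
INSERT OVERWRITE TABLE employee
SELECT emp_id, emp_name, dept, location
FROM employee_staging;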

The one major factor you need to consider here is that if you are using sampling queries on a partitioned and bucketed table, partitions created before the ALTER statement may have a different number of buckets than partitions created after it.
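
For example, a bucket-sampling query like the sketch below (reusing the employee table from the example above; everything else is illustrative) is expressed in terms of the declared bucket count, so old 15-bucket partitions and new 20-bucket partitions would not be sampled consistently by the same clause:

-- Minimal sketch: sample one bucket out of the 20 declared after the ALTER.
-- Partitions still written with 15 buckets will not line up with this clause.
SELECT emp_id, emp_name
FROM employee TABLESAMPLE (BUCKET 1 OUT OF 20 ON dept, location);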

Hope it helps!...

Regards
Bejoy.K.S

________________________________
From: "Raghunath, Ranjith" <Ranjith.Raghunath1@usaa.com>
To: "'user@hive.apache.org'" <user@hive.apache.org>; "'bejoy_ks@yahoo.com'" <bejoy_ks@yahoo.com>
Sent: Friday, December 16, 2011 10:48 AM
Subject: Re: bucketing in hive

Thanks Bejoy. Appreciate the insight.

Do you know if the number of buckets can be altered once a table has been set up?

Thanks,
Ranjith

From: Bejoy Ks [mailto:bejoy_ks@yahoo.com]
Sent: Thursday, December 15, 2011 06:13 AM
To: user@hive.apache.org <user@hive.apache.org>; hive dev list <dev@hive.apache.org>
Subject: Re: bucketing in hive

Hi Ranjith
    I'm not aware of any dynamic bucketing in Hive, whereas dynamic partitions are definitely available. Your partitions/sub-partitions would be generated on the fly/dynamically based on the value of a particular column; records with the same value for that column go into the same partition. But a dynamic-partition load can't happen with a LOAD DATA statement, as it requires running a MapReduce job. You can utilize dynamic partitions in two steps for delimited files:

- Load the delimited file into a non-partitioned table in Hive using LOAD DATA.
- Load data into the destination table from that source table using INSERT OVERWRITE - here an MR job would be triggered that does the job for you (a rough sketch of both steps follows).
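
A minimal sketch of those two steps (the table names, file path, and Col1/Col2 layout are just illustrations borrowed from the original question below, not a tested recipe):

-- Dynamic partitioning has to be switched on first.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Step 1: plain file load into a non-partitioned staging table (no MR job).
CREATE TABLE fruits_staging (col1 STRING, col2 INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/tmp/fruits.csv' INTO TABLE fruits_staging;

-- Step 2: INSERT OVERWRITE triggers the MR job and writes one partition per
-- distinct col1 value; the dynamic partition column goes last in the SELECT.
CREATE TABLE fruits (col2 INT)
PARTITIONED BY (col1 STRING);
INSERT OVERWRITE TABLE fruits PARTITION (col1)
SELECT col2, col1 FROM fruits_staging;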

I have scribbled something down on the same, check whether it'd be useful for you.
http://kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html


Regards
Bejoy.K.S

________________________________
From: "Raghunath, Ranjith" <Ranjith.Raghunath1@usaa.com>
To: "user@hive.apache.org" <user@hive.apache.org>; hive dev list <dev@hive.apache.org>
Sent: Thursday, December 15, 2011 7:53 AM
Subject: bucketing in hive

Can one use bucketing in Hive to emulate hash partitions on a database? Is there also a way to segment data into buckets dynamically based on values in the column? For example,

Col1      Col2
Apple     1
Orange    2
Apple     2
Banana    1

If the file above were inserted into a table with Col1 as the bucket column, can we dynamically allow all of the rows with "Apple" in one file and "Orange" in one file and so on? Is there a way to do this without specifying the bucket size to be 3?
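
(For the first part of the question, bucketing as a stand-in for hash partitioning would look roughly like the sketch below, with made-up names; the bucket count is fixed when the table is declared, which is exactly the constraint the second part of the question is trying to avoid.)

-- Rough sketch only: rows are routed to files by hash(col1) % 4,
-- and the bucket count of 4 is fixed at table-definition time.
CREATE TABLE fruits_bucketed (col1 STRING, col2 INT)
CLUSTERED BY (col1) INTO 4 BUCKETS;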

Thank you,
Ranjith