From: Bejoy Ks <bejoy_ks@yahoo.com>
To: user@hive.apache.org
Date: Fri, 16 Dec 2011 02:41:49 -0800 (PST)
Subject: Re: bucketing in hive

Ranjith
    You can definitely change the number of buckets in a Hive table even after its creation. You need to issue an ALTER TABLE command that contains the CLUSTERED BY and/or SORTED BY clauses used by your table. For example, if I have a table whose DDL looks like this:

CREATE EXTERNAL TABLE employee
(
  emp_id STRING, emp_name STRING,
  dept STRING, location STRING
)
CLUSTERED BY (dept, location) SORTED BY (dept, location) INTO 15 BUCKETS;

You can alter the number of buckets using the ALTER TABLE command as follows:


ALTER TABLE employee CLUSTERED BY (dept, location) SORTED BY (dept, location) INTO 20 BUCKETS;
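
Note that, as far as I understand, the ALTER above only changes the table metadata; files already in the table keep their old 15-bucket layout. A rough sketch of rewriting the data so it actually lands in 20 buckets (employee_staging is a hypothetical copy of the old data, not something from this thread):

-- Ask Hive to honour the declared bucket count when writing
-- (this property applies to Hive releases of this era).
SET hive.enforce.bucketing=true;

-- Rewriting the table hashes every row into the new 20-bucket layout.
INSERT OVERWRITE TABLE employee
SELECT emp_id, emp_name, dept, location
FROM employee_staging;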

The one major factor you need to consider here is that if you are using sampling queries on a partitioned and bucketed table, partitions created before the ALTER statement may have a different number of buckets than partitions created after it.
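
For example, a bucket-sampling query like the sketch below (reusing the employee table from the example above; everything else is illustrative) is expressed in terms of the declared bucket count, so old 15-bucket partitions and new 20-bucket partitions would not be sampled consistently by the same clause:

-- Minimal sketch: sample one bucket out of the 20 declared after the ALTER.
-- Partitions still written with 15 buckets will not line up with this clause.
SELECT emp_id, emp_name
FROM employee TABLESAMPLE (BUCKET 1 OUT OF 20 ON dept, location);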

Hope it helps!...

Regards
Bejoy.K.S

________________________________
From: "Raghunath, Ranjith" <Ranjith.Raghunath1@usaa.com>
To: "'user@hive.apache.org'" <user@hive.apache.org>; "'bejoy_ks@yahoo.com'" <bejoy_ks@yahoo.com>
Sent: Friday, December 16, 2011 10:48 AM
Subject: Re: bucketing in hive

Thanks Bejoy. Appreciate the insight.

Do you know if the number of buckets can be altered once a table has been set up?

Thanks,
Ranjith

From: Bejoy Ks [mailto:bejoy_ks@yahoo.com]
Sent: Thursday, December 15, 2011 06:13 AM
To: user@hive.apache.org <user@hive.apache.org>; hive dev list <dev@hive.apache.org>
Subject: Re: bucketing in hive

Hi Ranjith
    I'm not aware of any dynamic bucketing in Hive, whereas dynamic partitions are definitely available. Your partitions/sub-partitions would be generated on the fly/dynamically based on the value of a particular column; records with the same value for that column go into the same partition. But a dynamic-partition load can't happen with a LOAD DATA statement, as it requires running a MapReduce job. You can utilize dynamic partitions in two steps for delimited files:

- Load the delimited file into a non-partitioned table in Hive using LOAD DATA.
- Load data into the destination table from that source table using INSERT OVERWRITE - here an MR job would be triggered that does the job for you (a rough sketch of both steps follows).
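
A minimal sketch of those two steps (the table names, file path, and Col1/Col2 layout are just illustrations borrowed from the original question below, not a tested recipe):

-- Dynamic partitioning has to be switched on first.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Step 1: plain file load into a non-partitioned staging table (no MR job).
CREATE TABLE fruits_staging (col1 STRING, col2 INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/tmp/fruits.csv' INTO TABLE fruits_staging;

-- Step 2: INSERT OVERWRITE triggers the MR job and writes one partition per
-- distinct col1 value; the dynamic partition column goes last in the SELECT.
CREATE TABLE fruits (col2 INT)
PARTITIONED BY (col1 STRING);
INSERT OVERWRITE TABLE fruits PARTITION (col1)
SELECT col2, col1 FROM fruits_staging;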

I have scribbled something down on the same, check whether it'd be useful for you.
http://kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html


Regards
Bejoy.K.S

________________________________
From: "Raghunath, Ranjith" <Ranjith.Raghunath1@usaa.com>
To: "user@hive.apache.org" <user@hive.apache.org>; hive dev list <dev@hive.apache.org>
Sent: Thursday, December 15, 2011 7:53 AM
Subject: bucketing in hive

Can one use bucketing in Hive to emulate hash partitions on a database? Is there also a way to segment data into buckets dynamically based on values in the column? For example,

Col1      Col2
Apple     1
Orange    2
Apple     2
Banana    1

If the file above were inserted into a table with Col1 as the bucket column, can we dynamically allow all of the rows with "Apple" in one file and "Orange" in one file and so on? Is there a way to do this without specifying the bucket size to be 3?
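
(For the first part of the question, bucketing as a stand-in for hash partitioning would look roughly like the sketch below, with made-up names; the bucket count is fixed when the table is declared, which is exactly the constraint the second part of the question is trying to avoid.)

-- Rough sketch only: rows are routed to files by hash(col1) % 4,
-- and the bucket count of 4 is fixed at table-definition time.
CREATE TABLE fruits_bucketed (col1 STRING, col2 INT)
CLUSTERED BY (col1) INTO 4 BUCKETS;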

Thank you,
Ranjith