Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C7CDA10EEA for ; Fri, 14 Jun 2013 07:35:07 +0000 (UTC) Received: (qmail 96523 invoked by uid 500); 14 Jun 2013 07:35:05 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 96475 invoked by uid 500); 14 Jun 2013 07:35:04 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 96462 invoked by uid 99); 14 Jun 2013 07:35:04 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 14 Jun 2013 07:35:04 +0000 X-ASF-Spam-Status: No, hits=2.8 required=5.0 tests=FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_REPLY,HTML_MESSAGE,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_PASS,WEIRD_PORT X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of hamza.asad13@gmail.com designates 209.85.223.176 as permitted sender) Received: from [209.85.223.176] (HELO mail-ie0-f176.google.com) (209.85.223.176) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 14 Jun 2013 07:34:59 +0000 Received: by mail-ie0-f176.google.com with SMTP id ar20so599984iec.7 for ; Fri, 14 Jun 2013 00:34:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=RRSPVlkctyvsEz23Gvlto0J/OJTk8OTbzLSfdRaXLPE=; b=nBmXK4FgdMfsvDsgdfDxQ6eG9bwfSKnB/9Pr7smKzFxLqxCeOU35NZBHBMQfVC0Yw8 x9yp4CrdfTNWm/RnN3z17IZWK6YrFBf5GQ61iOs8R04a/AsRpvdmv3IYON3CU9wSy+lO pGlGUQM/nxkH1voeqa9809Z32oqCRvtYRf6DfSfEJXCuF3NkyFexC9bI4atYnYHh+xpC aClDtyi8C7e7NWXt3tNjVrA+WjRIj49Ngeh3LbYVy/sM4TDUt/Xx775Cju3KK92ICQ7d Wxltcz5Q/x4nqURsKHuNUdBwAabazg3rb4KymCRPdxDv9kORLbwcV1sVVVxg7y9103Er nHLQ== MIME-Version: 1.0 X-Received: by 10.50.119.98 with SMTP id kt2mr384534igb.15.1371195279229; Fri, 14 Jun 2013 00:34:39 -0700 (PDT) Received: by 10.43.3.74 with HTTP; Fri, 14 Jun 2013 00:34:39 -0700 (PDT) In-Reply-To: References: Date: Fri, 14 Jun 2013 12:34:39 +0500 Message-ID: Subject: Re: dynamic Partition not splitting properly From: Hamza Asad To: user@hive.apache.org Content-Type: multipart/alternative; boundary=089e0118411689579f04df184798 X-Virus-Checked: Checked by ClamAV on apache.org --089e0118411689579f04df184798 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable sample row of my data is *591269735,1,1022,2012-06-24 11:08:10.9,null,2,null,null,null,null,null,null,null,null,12,null,null,2,nu= ll,null,null,null,4,1,null,null,null,null,null,null * and i want to partition it according to date i.e *2012-06-24 *skiping the hh:mm:ss.ff part On Fri, Jun 14, 2013 at 12:27 PM, Nitin Pawar wrot= e: > can you provide whats your data and what you want it to look like ? > > > On Fri, Jun 14, 2013 at 12:31 PM, Hamza Asad wrot= e: > >> which UDF? it does not take to_date(event_date) column >> >> >> On Fri, Jun 14, 2013 at 11:54 AM, Nitin Pawar w= rote: >> >>> use already existing UDFs to split or transform your values the way you >>> want >>> >>> >>> On Fri, Jun 14, 2013 at 12:09 PM, Hamza Asad wr= ote: >>> >>>> OIC. I got it. Thanx alot nitin :). One more thing i want to ask >>>> related this issue, if old table contains event_date in format "2012-0= 6-24 >>>> 06:04:11.9" then how can i partition it according to date part only? A= s >>>> partition column does not accepts to_date(event_date) form. >>>> >>>> >>>> On Thu, Jun 13, 2013 at 5:07 PM, Nitin Pawar = wrote: >>>> >>>>> If the input column value is NULL or empty string, the row will be pu= t into a special partition, whose name is controlled by the hive parameter = hive.exec.default.dynamic.partition.name. The default value is `__HIVE_DEFA= ULT_PARTITION__`. Basically this partition will contain all >>>>> "bad" rows whose value are not valid partition names. >>>>> >>>>> so basically you do following things >>>>> >>>>> when you create a partitioned table, your partitioned column is norma= lly at the end of the table, so when you are inserting data into this parti= tioned table, I would recommend using the column names in place select * fr= om >>>>> >>>>> so your insert query should look like >>>>> >>>>> set hive.exec.dynamic.partition=3Dtrue; >>>>> >>>>> >>>>> >>>>> >>>>> set hive.exec.dynamic.partition.mode=3Dnonstrict; >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> insert overwrite table new_table partition(event_date) select col1, c= ol2 .... coln, event_date from old_table; >>>>> >>>>> >>>>> >>>>> On Thu, Jun 13, 2013 at 5:24 PM, Hamza Asad w= rote: >>>>> >>>>>> when i browse it in browser, all the data is in * >>>>>> event_date=3D__HIVE_DEFAULT_PARTITION__ >>>>>> *, rest of the files does not contains data >>>>>> >>>>>> >>>>>> On Thu, Jun 13, 2013 at 4:52 PM, Nitin Pawar >>>>> > wrote: >>>>>> >>>>>>> what do you mean when you say "it wont split correctly" ? >>>>>>> >>>>>>> >>>>>>> On Thu, Jun 13, 2013 at 5:19 PM, Hamza Asad wrote: >>>>>>> >>>>>>>> what if i have data of more then 500 days then how can i create >>>>>>>> partition on date column by specifying each and every date? (i knw= that >>>>>>>> does not happens in dynamic partition but on dynamic partition, it= wont >>>>>>>> splits correctly). >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jun 13, 2013 at 4:12 PM, Nitin Pawar < >>>>>>>> nitinpawar432@gmail.com> wrote: >>>>>>>> >>>>>>>>> you can partition existing table unless the hdfs data is laid out >>>>>>>>> in partitioned fashion. >>>>>>>>> your best bet is create a new partitioned table >>>>>>>>> enable dynamic paritionining >>>>>>>>> read from old table and write into new table >>>>>>>>> >>>>>>>>> you can verify the new partitions by using command "show >>>>>>>>> partitions" >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jun 13, 2013 at 4:40 PM, Hamza Asad < >>>>>>>>> hamza.asad13@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> now i created partition table like >>>>>>>>>> *CREATE TABLE new_rc_partition_cluster_table( >>>>>>>>>> >>>>>>>>>> id int, >>>>>>>>>> event_id int, >>>>>>>>>> user_id BIGINT, >>>>>>>>>> >>>>>>>>>> intval_1 int , >>>>>>>>>> intval_2 int, >>>>>>>>>> intval_3 int, >>>>>>>>>> intval_4 int, >>>>>>>>>> intval_5 int, >>>>>>>>>> intval_6 int, >>>>>>>>>> intval_7 int, >>>>>>>>>> intval_8 int, >>>>>>>>>> intval_9 int, >>>>>>>>>> intval_10 int, >>>>>>>>>> intval_11 int, >>>>>>>>>> intval_12 int, >>>>>>>>>> intval_13 int, >>>>>>>>>> intval_14 int, >>>>>>>>>> intval_15 int, >>>>>>>>>> intval_16 int, >>>>>>>>>> intval_17 int, >>>>>>>>>> intval_18 int, >>>>>>>>>> intval_19 int, >>>>>>>>>> intval_20 int, >>>>>>>>>> intval_21 int, >>>>>>>>>> intval_22 int, >>>>>>>>>> intval_23 int, >>>>>>>>>> intval_24 int, >>>>>>>>>> intval_25 int, >>>>>>>>>> intval_26 int) >>>>>>>>>> PARTITIONED BY (event_date string) >>>>>>>>>> >>>>>>>>>> CLUSTERED BY(id) INTO 256 BUCKETS >>>>>>>>>> ROW FORMAT DELIMITED >>>>>>>>>> FIELDS TERMINATED BY ',' >>>>>>>>>> STORED AS RCFile; >>>>>>>>>> * >>>>>>>>>> >>>>>>>>>> rest of the commands are same. But this time INSERT OVERWRITE >>>>>>>>>> query executed and took time but when i queries from that table,= it results >>>>>>>>>> none as it does not contains data. why is this so? and also plea= se tell me >>>>>>>>>> how can i partition my existing table dynamicaly on date so that= data >>>>>>>>>> splits equally without mentioning date explicitly? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jun 12, 2013 at 6:50 PM, Nitin Pawar < >>>>>>>>>> nitinpawar432@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> you did not create partitioned table. You just created a >>>>>>>>>>> bucketed table. >>>>>>>>>>> >>>>>>>>>>> refer to partitioned table created >>>>>>>>>>> something like >>>>>>>>>>> partitioned by (event_date string) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Jun 12, 2013 at 7:17 PM, Hamza Asad < >>>>>>>>>>> hamza.asad13@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> i have created table after enabling dynamic partition. i >>>>>>>>>>>> partitioned it on date but it is not splitting data datewise. = Below is the >>>>>>>>>>>> query of table created and data insert >>>>>>>>>>>> CREATE TABLE rc_partition_cluster_table( >>>>>>>>>>>> id int, >>>>>>>>>>>> event_id int, >>>>>>>>>>>> user_id BIGINT, >>>>>>>>>>>> event_date string, >>>>>>>>>>>> intval_1 int ) >>>>>>>>>>>> CLUSTERED BY(id) INTO 256 BUCKETS >>>>>>>>>>>> ROW FORMAT DELIMITED >>>>>>>>>>>> FIELDS TERMINATED BY ',' >>>>>>>>>>>> STORED AS RCFile; >>>>>>>>>>>> >>>>>>>>>>>> set hive.exec.dynamic.partition=3Dtrue; >>>>>>>>>>>> set hive.exec.dynamic.partition.mode=3Dnonstrict; >>>>>>>>>>>> set hive.exec.max.dynamic.partitions=3D1000; >>>>>>>>>>>> set hive.exec.max.dynamic.partitions.pernode=3D1000; >>>>>>>>>>>> >>>>>>>>>>>> INSERT OVERWRITE TABLE rc_partition_cluster_table Partition >>>>>>>>>>>> (event_date) >>>>>>>>>>>> SELECT * FROM events_details; >>>>>>>>>>>> >>>>>>>>>>>> why it is not working fine? >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> *Muhammad Hamza Asad* >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Nitin Pawar >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> *Muhammad Hamza Asad* >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Nitin Pawar >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> *Muhammad Hamza Asad* >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nitin Pawar >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> *Muhammad Hamza Asad* >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Nitin Pawar >>>>> >>>> >>>> >>>> >>>> -- >>>> *Muhammad Hamza Asad* >>>> >>> >>> >>> >>> -- >>> Nitin Pawar >>> >> >> >> >> -- >> *Muhammad Hamza Asad* >> > > > > -- > Nitin Pawar > --=20 *Muhammad Hamza Asad* --089e0118411689579f04df184798 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
sample row of my data is
591269735,1,1022,2012= -06-24 11:08:10.9,null,2,null,null,null,null,null,null,null,null,12,null,nu= ll,2,null,null,null,null,4,1,null,null,null,null,null,null=A0

<= /div> and i want to partition it according to date i.e 2012-06-24 skiping = the hh:mm:ss.ff part


On Fri, Jun 14, 2013 at 12:27 PM, Nitin Pawar <niti= npawar432@gmail.com> wrote:
can you provide whats your = data and what you want it to look like ?=A0


On Fri, Jun 14, 2013 a= t 12:31 PM, Hamza Asad <hamza.asad13@gmail.com> wrote:<= br>
which UDF? it does not take= to_date(event_date) column


On Fri, Jun 14, 2013 at 11:54 AM, Nitin = Pawar <nitinpawar432@gmail.com> wrote:
use already existing UDFs t= o split or transform your values the way you want=A0


On Fri, Jun 14, 2013 at 12:09 = PM, Hamza Asad <hamza.asad13@gmail.com> wrote:
OIC. I got it. Thanx alot n= itin :). One more thing i want to ask related this issue, if old table cont= ains event_date in format "2012-06-24 06:04:11.9" then how can i = partition it according to date part only? As partition column does not acce= pts to_date(event_date) form.


On Thu, Jun 13, 2013 at 5:07 PM, Nitin Pawar <nitinpawar432@gmail= .com> wrote:
If the input column va=
lue is NULL or empty string, the row will be put into a special partition, =
whose name is controlled by the hive parameter hive.exec.default.dynamic=
.partition.name. The default value is `__HIVE_DEFAULT_PARTITION__`. Bas=
ically this partition will contain all
"bad" rows whose value are not valid partition names.
so basically you do following things 
when you create a partitioned t=
able, your partitioned column is normally at the end of the table, so when =
you are inserting data into this partitioned table, I would recommend using=
 the column names in place select * from 
so your insert query should look like 
set hive.exec.dynamic.partition=3D=
true;
set hive.exec.dynamic.partition.mode=3Dnonstrict;

insert overwrite table new_ta=
ble partition(event_date)=A0select=A0=
col1, col2 .... coln, event_date from old_table;


On Thu, Jun 13, 2013 at 5:24 PM, Hamza Asad <hamza.asad13@gmail.co= m> wrote:


On Thu, Jun 13, 2013 at 4:52 PM, Nitin Pawar <nitinpawar432@gmail= .com> wrote:
what do you mean when you s= ay "it wont split correctly" ?=A0


On Thu, Jun 13, 2013 at 5:19 PM, Ha= mza Asad <hamza.asad13@gmail.com> wrote:
what if i have data of more= then 500 days then how can i create partition on date column by specifying= each and every date? (i knw that does not happens in dynamic partition but= on dynamic partition, it wont splits correctly).


On Thu, Jun 13, 2013 at 4:12 PM, Nitin Pawar <nitinpawar432@gmail= .com> wrote:
you can partition existing = table unless the hdfs data is laid out in partitioned fashion.=A0
your = best bet is create a new partitioned table
enable dynamic paritionining
read from old table and write into new table=A0

yo= u can verify the new partitions by using command "show partitions"= ;


On Thu, Jun 13, 2013 at 4:40 PM, Hamza Asad <hamza.asad13@gmail.com= > wrote:
now i created partition table like
CREATE TABLE= new_rc_partition_cluster_table(

=A0 id int,
=A0 event_id int,=A0 user_id BIGINT,
=A0
=A0 intval_1 int ,
=A0 intval_2 i= nt,
=A0 intval_3 int,
=A0 intval_4 int,
=A0 intval_5 int,
=A0 intval_6 int,
=A0 intval_7= int,
=A0 intval_8 int,
=A0 intval_9 int,
=A0 intval_10 int,
= =A0 intval_11 int,
=A0 intval_12 int,
=A0 intval_13 int,
=A0 intva= l_14 int,
=A0 intval_15 int,
=A0 intval_16 int,
=A0 intval_17 int,
=A0 intval_18 int,
=A0 intva= l_19 int,
=A0 intval_20 int,
=A0 intval_21 int,
=A0 intval_22 int,=
=A0 intval_23 int,
=A0 intval_24 int,
=A0 intval_25 int,
=A0 i= ntval_26 int)
=A0 PARTITIONED BY (event_date string)

CLUSTERED BY(id) INTO 256 BUCKETS
ROW FORMAT DELIMITED
=A0FIELDS TER= MINATED BY ','
STORED AS RCFile;


rest of= the commands are same. But this time INSERT OVERWRITE query executed and t= ook time but when i queries from that table, it results none as it does not= contains data. why is this so? and also please tell me how can i partition= my existing table dynamicaly on date so that data splits equally without m= entioning date explicitly?


On Wed, Jun 12, 2013 at 6:50 PM, Nitin Pawar <nitinpawar432@gmail= .com> wrote:
you did not create partitio= ned table. You just created a bucketed table.

refer to p= artitioned table created=A0
something like=A0
partitioned by (event_date string)=A0


On Wed, Jun 12, 2013 at 7:17 PM, Hamza Asad <hamza.asad13@gmail.co= m> wrote:
i have created table after = enabling dynamic partition. i partitioned it on date but it is not splittin= g data datewise. Below is the query of table created and data insert
CREATE TABLE rc_partition_cluster_table(
=A0 id int,
=A0 event_id int,
=A0 user_id BIGINT,
=A0 event_date s= tring,
=A0 intval_1 int )
CLUSTERED BY(id) INTO 256 BUCKETS
ROW FO= RMAT DELIMITED
=A0FIELDS TERMINATED BY ','
STORED AS RCFile= ;

set hive.exec.dynamic.partition=3Dtrue;
set hive.exec.dynamic.p= artition.mode=3Dnonstrict;
set hive.exec.max.dynamic.partitions=3D1000;<= br>set hive.exec.max.dynamic.partitions.pernode=3D1000;

INSERT OVERW= RITE TABLE rc_partition_cluster_table Partition (event_date)
SELECT * FROM events_details;

why it is not working fine?=

--
Muhammad H= amza Asad



<= font color=3D"#888888">--
Nitin Pawar



--
Muhammad Hamza Asad



<= font color=3D"#888888">--
Nitin Pawar



--
Muhammad Hamza Asad



<= font color=3D"#888888">--
Nitin Pawar



--
Muhammad Hamza Asad



<= font color=3D"#888888">--
Nitin Pawar



--
Muhammad Hamza Asad



<= font color=3D"#888888">--
Nitin Pawar



--
Muhammad Hamza Asad



--
Nitin Pawar



--
<= b style=3D"color:rgb(102,102,102);font-family:georgia,serif">Muhammad Ha= mza Asad
--089e0118411689579f04df184798--