hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Markovitz, Dudu" <dmarkov...@paypal.com>
Subject RE: alter partitions on hive external table
Date Mon, 06 Jun 2016 20:28:25 GMT
And here is a full example

----------------------------------------------------------------------------------------------------
--             bash
----------------------------------------------------------------------------------------------------

mkdir -p t
mkdir -p t/20150122/dudu/cust1
mkdir -p t/cust2/aaa/2015/01/23/bbb/ccc/dudu/ddd
mkdir -p t/raj/20150204/cust1
mkdir -p t/raj/cust2/yyy/20150204/zzz
mkdir -p t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz

echo -e "1\n2\n3" > t/20150122/dudu/cust1/data.txt
echo -e "4"       > t/cust2/aaa/2015/01/23/bbb/ccc/dudu/ddd/data.txt
echo -e "5\n6"    > t/raj/20150204/cust1/data.txt
echo -e "7\n8\n9" > t/raj/cust2/yyy/20150204/zzz/data.txt
echo -e "10"      > t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz/data.txt

hdfs dfs -put t /tmp

----------------------------------------------------------------------------------------------------
--             hive
----------------------------------------------------------------------------------------------------


n  We’re creating the external table with the requested partition columns


create external table t (i int) partitioned by (user string,cust string,dt date) location
'/tmp/t';


n  We’re choosing each partition values according the full path of the relevant directory

alter table t add partition (user='dudu',cust='cust1',dt=date '2015-01-22') location '/tmp/t/20150122/dudu/cust1';
alter table t add partition (user='dudu',cust='cust2',dt=date '2015-01-23') location '/tmp/t/cust2/aaa/2015/01/23/bbb/ccc/dudu/ddd';
alter table t add partition (user='raj' ,cust='cust1',dt=date '2015-02-04') location '/tmp/t/raj/20150204/cust1';
alter table t add partition (user='raj' ,cust='cust2',dt=date '2015-02-04') location '/tmp/t/raj/cust2/yyy/20150204/zzz';
alter table t add partition (user='raj' ,cust='cust3',dt=date '2015-02-04') location '/tmp/t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz';


n  The partitions’ values and their corresponding locations are all saved in the metastore

n  The metastore is being queried based on our query predicates. Returning the list of relevant
partitions/locations


explain dependency select * from t where (cust like '%1' and dt < date '2015-02-01') or
(user='raj' and substr(cust,-1) = 3) ;

{"input_partitions":[{"partitionName":"default@t@user=dudu/cust=cust1/dt=2015-01-22"},{"partitionName":"default@t@user=raj/cust=cust3/dt=2015-02-04"}],"input_tables":[{"tablename":"default@t","tabletype":"EXTERNAL_TABLE"}]}

select *,input__file__name from t where (cust like '%1' and dt < date '2015-02-01') or
(user='raj' and substr(cust,-1) = 3) ;

1         dudu      cust1     2015-01-22         hdfs://quickstart.cloudera:8020/tmp/t/20150122/dudu/cust1/data.txt
2         dudu      cust1     2015-01-22         hdfs://quickstart.cloudera:8020/tmp/t/20150122/dudu/cust1/data.txt
3         dudu      cust1     2015-01-22         hdfs://quickstart.cloudera:8020/tmp/t/20150122/dudu/cust1/data.txt
10        raj       cust3     2015-02-04          hdfs://quickstart.cloudera:8020/tmp/t/bla/bla/bla/raj/yada/yada/yada/cust3/yyy/20150204/zzz/data.txt



From: Markovitz, Dudu [mailto:dmarkovitz@paypal.com]
Sent: Monday, June 06, 2016 6:10 PM
To: user@hive.apache.org
Subject: RE: alter partitions on hive external table

… are just logical connections between certain values and specific directories …

From: Markovitz, Dudu [mailto:dmarkovitz@paypal.com]
Sent: Monday, June 06, 2016 6:07 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: RE: alter partitions on hive external table

Hi Raj


1.       I don’t understand the reason for this change, can you please elaborate?



2.       External table is just an interface. Instructions for how to read existing data.

Partitions of external table are just a logical connections between certain values and a specific
directories.

You can connect any set of values to any directory no matter what the directories structure
is and then query the external table filtering on this values and by that eliminating the
query only to the directories you are interested in.



3.       By all means, don’t duplicate data without a good reason (unless you don’t care
about wasting storage, time, CPU etc.)

It seems to me that all you need to do is to retrieve a list of the directories and generate
“alter table … add partition…” statements based on that.


Dudu

From: raj hive [mailto:raj.hiveql@gmail.com]
Sent: Monday, June 06, 2016 6:02 AM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: alter partitions on hive external table

Hi friends,
I have created partitions on hive external tables. partitions on datetime/userid/customerId.
now i have to change the order of the partitions for the existing data for all the dates.
order of the partition is custerid/userid/datetime.
Anyone can help me, how to alter the partitions for the existing table. Need a help to write
a script to change the partions on existing data. almost 3 months data is there to modify
as per new partition so changing each date is difficult. Any expert can help me.
Thanks
Raj
Mime
View raw message