hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Arafat, Moiz" <moiz.ara...@teamaol.com>
Subject RE: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort
Date Thu, 13 Mar 2014 10:48:28 GMT
Hi Stephen,

I followed your approach and still got the same result

1) hive> CREATE TABLE moiz_partition_test
    > (EVENT_DT STRING) partitioned by
    > (
    > PARTITION_HR INT
    > )
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '09'
    > location '/user/moiztcs/moiz_partition_test'
    > ;

2)
hive> alter table  moiz_partition_test add IF NOT EXISTS partition (partition_hr=0)  ;
OK
Time taken: 2.421 seconds
hive> alter table  moiz_partition_test add IF NOT EXISTS partition (partition_hr=1)  ;
OK
Time taken: 0.132 seconds
hive> alter table  moiz_partition_test add IF NOT EXISTS partition (partition_hr=2)  ;
OK
Time taken: 0.226 seconds
hive> alter table  moiz_partition_test add IF NOT EXISTS partition (partition_hr=10)  ;
OK
Time taken: 0.177 seconds

3)
$ hadoop fs -ls /user/moiztcs/moiz_partition_test
Found 4 items
drwxr-xr-x   - cdidw aolmis          0 2014-03-13 06:40 /user/moiztcs/moiz_partition_test/partition_hr=0
drwxr-xr-x   - cdidw aolmis          0 2014-03-13 06:41 /user/moiztcs/moiz_partition_test/partition_hr=1
drwxr-xr-x   - cdidw aolmis          0 2014-03-13 06:42 /user/moiztcs/moiz_partition_test/partition_hr=10
drwxr-xr-x   - cdidw aolmis          0 2014-03-13 06:41 /user/moiztcs/moiz_partition_test/partition_hr=2

4)
$ hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/partition_hr=0
$ hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/partition_hr=1
$ hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/partition_hr=10
$ hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/partition_hr=2

5) hive> select distinct partition_hr from moiz_partition_test order by partition_hr;

OK
0
1
10
2

Thanks,
Moiz

From: Stephen Sprague [mailto:spragues@gmail.com]
Sent: Wednesday, March 12, 2014 9:58 PM
To: user@hive.apache.org
Subject: Re: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort

there you go.   I think you're inflicting too much of your own will onto hive with specifying
the partition directories as 00, 01, 02.

In my experience hive expects the partition name followed by an equal sign followed by the
value.
I'd stick with this kind of hdfs topology:
/user/moiztcs/moiz_partition_test/partition_hr=00/
/user/moiztcs/moiz_partition_test/partition_hr=01/
/user/moiztcs/moiz_partition_test/partition_hr=10/

By omitting the location clause on your alter table statements you should get above layout
which can be
confirmed by issuing the following command:
$ hdfs dfs -ls /user/moiztc/moiz_partition_test

Can you try this?




On Wed, Mar 12, 2014 at 12:10 AM, Arafat, Moiz <moiz.arafat@teamaol.com<mailto:moiz.arafat@teamaol.com>>
wrote:
Hi,

Here are the steps I followed . Please let me know If I did something wrong.


1)      Create table

hive> CREATE TABLE moiz_partition_test

    > (EVENT_DT STRING) partitioned by

    > (

    > PARTITION_HR INT

    > )

    >  ROW FORMAT DELIMITED

    >  FIELDS TERMINATED BY '09'

    >  location '/user/moiztcs/moiz_partition_test'

    > ;



2)      Add partitions
hive> alter table  moiz_partition_test add IF NOT EXISTS partition (partition_hr=0)  location
'/user/moiztcs/moiz_partition_test/00';
OK
Time taken: 0.411 seconds
hive> alter table  moiz_partition_test add IF NOT EXISTS partition (partition_hr=1)  location
'/user/moiztcs/moiz_partition_test/01';
OK
Time taken: 0.193 seconds
hive> alter table  moiz_partition_test add IF NOT EXISTS partition (partition_hr=2)  location
'/user/moiztcs/moiz_partition_test/02';
OK
Time taken: 0.182 seconds
hive> alter table  moiz_partition_test add IF NOT EXISTS partition (partition_hr=10)  location
'/user/moiztcs/moiz_partition_test/10';
OK
Time taken: 0.235 seconds


3)      Copy data into the directories

hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/00

hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/01

hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/02

hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/10



4)      Ran the sql

hive> select distinct partition_hr from moiz_partition_test order by partition_hr;

Ended Job

OK

0

1

10

2

Thanks,
Moiz
From: Stephen Sprague [mailto:spragues@gmail.com<mailto:spragues@gmail.com>]
Sent: Wednesday, March 12, 2014 12:55 AM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort

that makes no sense. if the column is an int it isn't going to sort like a string.  I smell
a user error somewhere.

On Tue, Mar 11, 2014 at 6:21 AM, Arafat, Moiz <moiz.arafat@teamaol.com<mailto:moiz.arafat@teamaol.com>>
wrote:
Hi ,

I have a table that has a partition column partition_hr . Data Type is int (partition_hr 
      int) . When i run a sort on this column the output is like this.

0
1
10
11
12
13
14
15
16
17
18
19
2
20
21
22
23
3
4
5
6
7
8
9

I expected the output like this  .

0
1
2
3
4
5
6
7
8
9
10
.
.
and so on.

It works fine for non-partition columns. Please advise.

Thanks,
Moiz



Mime
View raw message