hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephen Sprague <sprag...@gmail.com>
Subject Re: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort
Date Thu, 13 Mar 2014 18:52:43 GMT
wow. its still sorting based on string context.  ok, some followups.

1.  did you start clean?  ie. did you do a "drop table moiz_partition_test"
before you started?

2.  lets see the output of "show create table moiz_partition_test"  (if
that doesn't work [its hive v0.11 i think] lets see "desc
moiz_partition_test" )


3.  what version of hive are you running?


pretty bizarre.




On Thu, Mar 13, 2014 at 3:48 AM, Arafat, Moiz <moiz.arafat@teamaol.com>wrote:

>  Hi Stephen,
>
>
>
> I followed your approach and still got the same result
>
>
>
> 1) hive> CREATE TABLE moiz_partition_test
>
>     > (EVENT_DT STRING) partitioned by
>
>     > (
>
>     > PARTITION_HR INT
>
>     > )
>
>     > ROW FORMAT DELIMITED
>
>     > FIELDS TERMINATED BY '09'
>
>     > location '/user/moiztcs/moiz_partition_test'
>
>     > ;
>
>
>
> 2)
>
> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
> (partition_hr=0)  ;
>
> OK
>
> Time taken: 2.421 seconds
>
> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
> (partition_hr=1)  ;
>
> OK
>
> Time taken: 0.132 seconds
>
> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
> (partition_hr=2)  ;
>
> OK
>
> Time taken: 0.226 seconds
>
> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
> (partition_hr=10)  ;
>
> OK
>
> Time taken: 0.177 seconds
>
>
>
> 3)
>
> $ hadoop fs -ls /user/moiztcs/moiz_partition_test
>
> Found 4 items
>
> drwxr-xr-x   - cdidw aolmis          0 2014-03-13 06:40
> /user/moiztcs/moiz_partition_test/partition_hr=0
>
> drwxr-xr-x   - cdidw aolmis          0 2014-03-13 06:41
> /user/moiztcs/moiz_partition_test/partition_hr=1
>
> drwxr-xr-x   - cdidw aolmis          0 2014-03-13 06:42
> /user/moiztcs/moiz_partition_test/partition_hr=10
>
> drwxr-xr-x   - cdidw aolmis          0 2014-03-13 06:41
> /user/moiztcs/moiz_partition_test/partition_hr=2
>
>
>
> 4)
>
> $ hadoop fs -copyFromLocal test.dat
> /user/moiztcs/moiz_partition_test/partition_hr=0
>
> $ hadoop fs -copyFromLocal test.dat
> /user/moiztcs/moiz_partition_test/partition_hr=1
>
> $ hadoop fs -copyFromLocal test.dat
> /user/moiztcs/moiz_partition_test/partition_hr=10
>
> $ hadoop fs -copyFromLocal test.dat
> /user/moiztcs/moiz_partition_test/partition_hr=2
>
>
>
> 5) hive> select distinct partition_hr from moiz_partition_test order by
> partition_hr;
>
>
>
> OK
>
> 0
>
> 1
>
> 10
>
> 2
>
>
>
> Thanks,
>
> Moiz
>
>
>
> *From:* Stephen Sprague [mailto:spragues@gmail.com]
> *Sent:* Wednesday, March 12, 2014 9:58 PM
>
> *To:* user@hive.apache.org
> *Subject:* Re: Hive - Sorting on the Partition Column data type Int .
> Output is Alphabetic Sort
>
>
>
> there you go.   I think you're inflicting too much of your own will onto
> hive with specifying the partition directories as 00, 01, 02.
>
> In my experience hive expects the partition name followed by an equal sign
> followed by the value.
>
> I'd stick with this kind of hdfs topology:
>
> /user/moiztcs/moiz_partition_test/partition_hr=00/
> /user/moiztcs/moiz_partition_test/partition_hr=01/
> /user/moiztcs/moiz_partition_test/partition_hr=10/
>
> By omitting the location clause on your alter table statements you should
> get above layout which can be
>
> confirmed by issuing the following command:
>
> $ hdfs dfs -ls /user/moiztc/moiz_partition_test
>
> Can you try this?
>
>
>
>
>
>
>
>
>
> On Wed, Mar 12, 2014 at 12:10 AM, Arafat, Moiz <moiz.arafat@teamaol.com>
> wrote:
>
>  Hi,
>
>
>
> Here are the steps I followed . Please let me know If I did something
> wrong.
>
>
>
> 1)      Create table
>
> hive> CREATE TABLE moiz_partition_test
>
>     > (EVENT_DT STRING) partitioned by
>
>     > (
>
>     > PARTITION_HR INT
>
>     > )
>
>     >  ROW FORMAT DELIMITED
>
>     >  FIELDS TERMINATED BY '09'
>
>     >  location '/user/moiztcs/moiz_partition_test'
>
>     > ;
>
>
>
> 2)      Add partitions
>
> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
> (partition_hr=0)  location '/user/moiztcs/moiz_partition_test/00';
>
> OK
>
> Time taken: 0.411 seconds
>
> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
> (partition_hr=1)  location '/user/moiztcs/moiz_partition_test/01';
>
> OK
>
> Time taken: 0.193 seconds
>
> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
> (partition_hr=2)  location '/user/moiztcs/moiz_partition_test/02';
>
> OK
>
> Time taken: 0.182 seconds
>
> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
> (partition_hr=10)  location '/user/moiztcs/moiz_partition_test/10';
>
> OK
>
> Time taken: 0.235 seconds
>
>
>
> 3)      Copy data into the directories
>
> hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/00
>
> hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/01
>
> hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/02
>
> hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/10
>
>
>
> 4)      Ran the sql
>
> hive> select distinct partition_hr from moiz_partition_test order by
> partition_hr;
>
> Ended Job
>
> OK
>
> 0
>
> 1
>
> 10
>
> 2
>
>
>
> Thanks,
>
> Moiz
>
> *From:* Stephen Sprague [mailto:spragues@gmail.com]
> *Sent:* Wednesday, March 12, 2014 12:55 AM
> *To:* user@hive.apache.org
> *Subject:* Re: Hive - Sorting on the Partition Column data type Int .
> Output is Alphabetic Sort
>
>
>
> that makes no sense. if the column is an int it isn't going to sort like a
> string.  I smell a user error somewhere.
>
>
>
> On Tue, Mar 11, 2014 at 6:21 AM, Arafat, Moiz <moiz.arafat@teamaol.com>
> wrote:
>
> Hi ,
>
> I have a table that has a partition column partition_hr . Data Type is int
> (partition_hr        int) . When i run a sort on this column the output is
> like this.
>
> 0
> 1
> 10
> 11
> 12
> 13
> 14
> 15
> 16
> 17
> 18
> 19
> 2
> 20
> 21
> 22
> 23
> 3
> 4
> 5
> 6
> 7
> 8
> 9
>
> I expected the output like this  .
>
> 0
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> .
> .
> and so on.
>
> It works fine for non-partition columns. Please advise.
>
> Thanks,
> Moiz
>
>
>
>
>

Mime
View raw message