hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephen Sprague <sprag...@gmail.com>
Subject Re: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort
Date Fri, 14 Mar 2014 16:14:10 GMT
i agree. back in the days of v0.7 who knows if this was a bug or not. What
i do know is i have int partitions all over the place and they sort in
numeric context just fine every day.  i'm running v0.12.  pretty sure this
worked fine at v0.8 as well - that's when i started in hive.

good luck!


On Fri, Mar 14, 2014 at 4:21 AM, Nitin Pawar <nitinpawar432@gmail.com>wrote:

> Can you first try updating hive to atleast 0.11 if you can not move to
> 0.12 ?
>
>
> On Fri, Mar 14, 2014 at 4:49 PM, Arafat, Moiz <moiz.arafat@teamaol.com>wrote:
>
>>  My comments inline
>>
>>
>>
>> *From:* Stephen Sprague [mailto:spragues@gmail.com]
>> *Sent:* Friday, March 14, 2014 12:23 AM
>>
>> *To:* user@hive.apache.org
>> *Subject:* Re: Hive - Sorting on the Partition Column data type Int .
>> Output is Alphabetic Sort
>>
>>
>>
>> wow. its still sorting based on string context.  ok, some followups.
>>
>> 1.    did you start clean?  ie. did you do a "drop table
>> moiz_partition_test" before you started?
>>
>> Yes I did
>>
>> 2.    lets see the output of "show create table moiz_partition_test"
>> (if that doesn't work [its hive v0.11 i think] lets see "desc
>> moiz_partition_test" )
>>
>> hive> desc moiz_partition_test;
>>
>> OK
>>
>> event_dt        string
>>
>> partition_hr    int
>>
>> Time taken: 1.967 seconds
>>
>>   3.    what version of hive are you running?
>>
>> 0.7.1
>>
>>   pretty bizarre.
>>
>>
>>
>>
>>
>> On Thu, Mar 13, 2014 at 3:48 AM, Arafat, Moiz <moiz.arafat@teamaol.com>
>> wrote:
>>
>>  Hi Stephen,
>>
>>
>>
>> I followed your approach and still got the same result
>>
>>
>>
>> 1) hive> CREATE TABLE moiz_partition_test
>>
>>     > (EVENT_DT STRING) partitioned by
>>
>>     > (
>>
>>     > PARTITION_HR INT
>>
>>     > )
>>
>>     > ROW FORMAT DELIMITED
>>
>>     > FIELDS TERMINATED BY '09'
>>
>>     > location '/user/moiztcs/moiz_partition_test'
>>
>>     > ;
>>
>>
>>
>> 2)
>>
>> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
>> (partition_hr=0)  ;
>>
>> OK
>>
>> Time taken: 2.421 seconds
>>
>> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
>> (partition_hr=1)  ;
>>
>> OK
>>
>> Time taken: 0.132 seconds
>>
>> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
>> (partition_hr=2)  ;
>>
>> OK
>>
>> Time taken: 0.226 seconds
>>
>> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
>> (partition_hr=10)  ;
>>
>> OK
>>
>> Time taken: 0.177 seconds
>>
>>
>>
>> 3)
>>
>> $ hadoop fs -ls /user/moiztcs/moiz_partition_test
>>
>> Found 4 items
>>
>> drwxr-xr-x   - cdidw aolmis          0 2014-03-13 06:40
>> /user/moiztcs/moiz_partition_test/partition_hr=0
>>
>> drwxr-xr-x   - cdidw aolmis          0 2014-03-13 06:41
>> /user/moiztcs/moiz_partition_test/partition_hr=1
>>
>> drwxr-xr-x   - cdidw aolmis          0 2014-03-13 06:42
>> /user/moiztcs/moiz_partition_test/partition_hr=10
>>
>> drwxr-xr-x   - cdidw aolmis          0 2014-03-13 06:41
>> /user/moiztcs/moiz_partition_test/partition_hr=2
>>
>>
>>
>> 4)
>>
>> $ hadoop fs -copyFromLocal test.dat
>> /user/moiztcs/moiz_partition_test/partition_hr=0
>>
>> $ hadoop fs -copyFromLocal test.dat
>> /user/moiztcs/moiz_partition_test/partition_hr=1
>>
>> $ hadoop fs -copyFromLocal test.dat
>> /user/moiztcs/moiz_partition_test/partition_hr=10
>>
>> $ hadoop fs -copyFromLocal test.dat
>> /user/moiztcs/moiz_partition_test/partition_hr=2
>>
>>
>>
>> 5) hive> select distinct partition_hr from moiz_partition_test order by
>> partition_hr;
>>
>>
>>
>> OK
>>
>> 0
>>
>> 1
>>
>> 10
>>
>> 2
>>
>>
>>
>> Thanks,
>>
>> Moiz
>>
>>
>>
>> *From:* Stephen Sprague [mailto:spragues@gmail.com]
>> *Sent:* Wednesday, March 12, 2014 9:58 PM
>>
>>
>> *To:* user@hive.apache.org
>> *Subject:* Re: Hive - Sorting on the Partition Column data type Int .
>> Output is Alphabetic Sort
>>
>>
>>
>> there you go.   I think you're inflicting too much of your own will onto
>> hive with specifying the partition directories as 00, 01, 02.
>>
>> In my experience hive expects the partition name followed by an equal
>> sign followed by the value.
>>
>> I'd stick with this kind of hdfs topology:
>>
>> /user/moiztcs/moiz_partition_test/partition_hr=00/
>> /user/moiztcs/moiz_partition_test/partition_hr=01/
>> /user/moiztcs/moiz_partition_test/partition_hr=10/
>>
>> By omitting the location clause on your alter table statements you should
>> get above layout which can be
>>
>> confirmed by issuing the following command:
>>
>> $ hdfs dfs -ls /user/moiztc/moiz_partition_test
>>
>> Can you try this?
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Mar 12, 2014 at 12:10 AM, Arafat, Moiz <moiz.arafat@teamaol.com>
>> wrote:
>>
>>  Hi,
>>
>>
>>
>> Here are the steps I followed . Please let me know If I did something
>> wrong.
>>
>>
>>
>> 1)      Create table
>>
>> hive> CREATE TABLE moiz_partition_test
>>
>>     > (EVENT_DT STRING) partitioned by
>>
>>     > (
>>
>>     > PARTITION_HR INT
>>
>>     > )
>>
>>     >  ROW FORMAT DELIMITED
>>
>>     >  FIELDS TERMINATED BY '09'
>>
>>     >  location '/user/moiztcs/moiz_partition_test'
>>
>>     > ;
>>
>>
>>
>> 2)      Add partitions
>>
>> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
>> (partition_hr=0)  location '/user/moiztcs/moiz_partition_test/00';
>>
>> OK
>>
>> Time taken: 0.411 seconds
>>
>> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
>> (partition_hr=1)  location '/user/moiztcs/moiz_partition_test/01';
>>
>> OK
>>
>> Time taken: 0.193 seconds
>>
>> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
>> (partition_hr=2)  location '/user/moiztcs/moiz_partition_test/02';
>>
>> OK
>>
>> Time taken: 0.182 seconds
>>
>> hive> alter table  moiz_partition_test add IF NOT EXISTS partition
>> (partition_hr=10)  location '/user/moiztcs/moiz_partition_test/10';
>>
>> OK
>>
>> Time taken: 0.235 seconds
>>
>>
>>
>> 3)      Copy data into the directories
>>
>> hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/00
>>
>> hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/01
>>
>> hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/02
>>
>> hadoop fs -copyFromLocal test.dat  /user/moiztcs/moiz_partition_test/10
>>
>>
>>
>> 4)      Ran the sql
>>
>> hive> select distinct partition_hr from moiz_partition_test order by
>> partition_hr;
>>
>> Ended Job
>>
>> OK
>>
>> 0
>>
>> 1
>>
>> 10
>>
>> 2
>>
>>
>>
>> Thanks,
>>
>> Moiz
>>
>> *From:* Stephen Sprague [mailto:spragues@gmail.com]
>> *Sent:* Wednesday, March 12, 2014 12:55 AM
>> *To:* user@hive.apache.org
>> *Subject:* Re: Hive - Sorting on the Partition Column data type Int .
>> Output is Alphabetic Sort
>>
>>
>>
>> that makes no sense. if the column is an int it isn't going to sort like
>> a string.  I smell a user error somewhere.
>>
>>
>>
>> On Tue, Mar 11, 2014 at 6:21 AM, Arafat, Moiz <moiz.arafat@teamaol.com>
>> wrote:
>>
>> Hi ,
>>
>> I have a table that has a partition column partition_hr . Data Type is
>> int (partition_hr        int) . When i run a sort on this column the output
>> is like this.
>>
>> 0
>> 1
>> 10
>> 11
>> 12
>> 13
>> 14
>> 15
>> 16
>> 17
>> 18
>> 19
>> 2
>> 20
>> 21
>> 22
>> 23
>> 3
>> 4
>> 5
>> 6
>> 7
>> 8
>> 9
>>
>> I expected the output like this  .
>>
>> 0
>> 1
>> 2
>> 3
>> 4
>> 5
>> 6
>> 7
>> 8
>> 9
>> 10
>> .
>> .
>> and so on.
>>
>> It works fine for non-partition columns. Please advise.
>>
>> Thanks,
>> Moiz
>>
>>
>>
>>
>>
>>
>>
>
>
>
> --
> Nitin Pawar
>

Mime
View raw message