Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A42E710DFD for ; Thu, 13 Mar 2014 18:53:38 +0000 (UTC) Received: (qmail 91250 invoked by uid 500); 13 Mar 2014 18:53:35 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 90416 invoked by uid 500); 13 Mar 2014 18:53:30 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 90389 invoked by uid 99); 13 Mar 2014 18:53:28 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 13 Mar 2014 18:53:28 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of spragues@gmail.com designates 209.85.216.44 as permitted sender) Received: from [209.85.216.44] (HELO mail-qa0-f44.google.com) (209.85.216.44) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 13 Mar 2014 18:53:24 +0000 Received: by mail-qa0-f44.google.com with SMTP id f11so1474124qae.31 for ; Thu, 13 Mar 2014 11:53:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=k5vwj7zJBNWuxsxsDHF01LE6MqZ81NmsCcKkapuk0/o=; b=mrqMrUwnVSVDaLeqVxOtswlwBFt6MYHxivsv5BUcdtuynC+syfTpiA5yXONliwGI/g silVV1emP/pXTR4lumIG8aJX2t0IIzJphnzcmz9UB5mjKxX8G9jvO6c29a+BPKQUcCiS NJsTvI7kogFVosmAj7VpXVtklfDl7G6FWNjU3bJ1FKev44mA5N/6vnEevdF1K9rib5H6 uskDlOdM7p6Ljs+y8u4hjjEwqbT0vOSlziPGZ5oPyUFYQ/T/VXHAFtgaWYPuSCwzI9H6 kVHcgEbSQvITWli/J6Q5Fu2rC5D85blzAvXT81WxKhWPJbLwUgOk6fzBcNe+1D3YZZ+w K9SA== X-Received: by 10.224.25.195 with SMTP id a3mr4374513qac.83.1394736783338; Thu, 13 Mar 2014 11:53:03 -0700 (PDT) MIME-Version: 1.0 Received: by 10.140.92.41 with HTTP; Thu, 13 Mar 2014 11:52:43 -0700 (PDT) In-Reply-To: <8EF10CF4A313F145AABA459CBFB81F0B3980EAB8@aoldtcmes31.ad.aol.aoltw.net> References: <73EB99C1-D9DE-4B4B-BF1D-2EE51DF157FB@teamaol.com> <8EF10CF4A313F145AABA459CBFB81F0B3980D4AA@aoldtcmes31.ad.aol.aoltw.net> <8EF10CF4A313F145AABA459CBFB81F0B3980EAB8@aoldtcmes31.ad.aol.aoltw.net> From: Stephen Sprague Date: Thu, 13 Mar 2014 11:52:43 -0700 Message-ID: Subject: Re: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort To: "user@hive.apache.org" Content-Type: multipart/alternative; boundary=047d7bdca04c86b2ca04f4817659 X-Virus-Checked: Checked by ClamAV on apache.org --047d7bdca04c86b2ca04f4817659 Content-Type: text/plain; charset=ISO-8859-1 wow. its still sorting based on string context. ok, some followups. 1. did you start clean? ie. did you do a "drop table moiz_partition_test" before you started? 2. lets see the output of "show create table moiz_partition_test" (if that doesn't work [its hive v0.11 i think] lets see "desc moiz_partition_test" ) 3. what version of hive are you running? pretty bizarre. On Thu, Mar 13, 2014 at 3:48 AM, Arafat, Moiz wrote: > Hi Stephen, > > > > I followed your approach and still got the same result > > > > 1) hive> CREATE TABLE moiz_partition_test > > > (EVENT_DT STRING) partitioned by > > > ( > > > PARTITION_HR INT > > > ) > > > ROW FORMAT DELIMITED > > > FIELDS TERMINATED BY '09' > > > location '/user/moiztcs/moiz_partition_test' > > > ; > > > > 2) > > hive> alter table moiz_partition_test add IF NOT EXISTS partition > (partition_hr=0) ; > > OK > > Time taken: 2.421 seconds > > hive> alter table moiz_partition_test add IF NOT EXISTS partition > (partition_hr=1) ; > > OK > > Time taken: 0.132 seconds > > hive> alter table moiz_partition_test add IF NOT EXISTS partition > (partition_hr=2) ; > > OK > > Time taken: 0.226 seconds > > hive> alter table moiz_partition_test add IF NOT EXISTS partition > (partition_hr=10) ; > > OK > > Time taken: 0.177 seconds > > > > 3) > > $ hadoop fs -ls /user/moiztcs/moiz_partition_test > > Found 4 items > > drwxr-xr-x - cdidw aolmis 0 2014-03-13 06:40 > /user/moiztcs/moiz_partition_test/partition_hr=0 > > drwxr-xr-x - cdidw aolmis 0 2014-03-13 06:41 > /user/moiztcs/moiz_partition_test/partition_hr=1 > > drwxr-xr-x - cdidw aolmis 0 2014-03-13 06:42 > /user/moiztcs/moiz_partition_test/partition_hr=10 > > drwxr-xr-x - cdidw aolmis 0 2014-03-13 06:41 > /user/moiztcs/moiz_partition_test/partition_hr=2 > > > > 4) > > $ hadoop fs -copyFromLocal test.dat > /user/moiztcs/moiz_partition_test/partition_hr=0 > > $ hadoop fs -copyFromLocal test.dat > /user/moiztcs/moiz_partition_test/partition_hr=1 > > $ hadoop fs -copyFromLocal test.dat > /user/moiztcs/moiz_partition_test/partition_hr=10 > > $ hadoop fs -copyFromLocal test.dat > /user/moiztcs/moiz_partition_test/partition_hr=2 > > > > 5) hive> select distinct partition_hr from moiz_partition_test order by > partition_hr; > > > > OK > > 0 > > 1 > > 10 > > 2 > > > > Thanks, > > Moiz > > > > *From:* Stephen Sprague [mailto:spragues@gmail.com] > *Sent:* Wednesday, March 12, 2014 9:58 PM > > *To:* user@hive.apache.org > *Subject:* Re: Hive - Sorting on the Partition Column data type Int . > Output is Alphabetic Sort > > > > there you go. I think you're inflicting too much of your own will onto > hive with specifying the partition directories as 00, 01, 02. > > In my experience hive expects the partition name followed by an equal sign > followed by the value. > > I'd stick with this kind of hdfs topology: > > /user/moiztcs/moiz_partition_test/partition_hr=00/ > /user/moiztcs/moiz_partition_test/partition_hr=01/ > /user/moiztcs/moiz_partition_test/partition_hr=10/ > > By omitting the location clause on your alter table statements you should > get above layout which can be > > confirmed by issuing the following command: > > $ hdfs dfs -ls /user/moiztc/moiz_partition_test > > Can you try this? > > > > > > > > > > On Wed, Mar 12, 2014 at 12:10 AM, Arafat, Moiz > wrote: > > Hi, > > > > Here are the steps I followed . Please let me know If I did something > wrong. > > > > 1) Create table > > hive> CREATE TABLE moiz_partition_test > > > (EVENT_DT STRING) partitioned by > > > ( > > > PARTITION_HR INT > > > ) > > > ROW FORMAT DELIMITED > > > FIELDS TERMINATED BY '09' > > > location '/user/moiztcs/moiz_partition_test' > > > ; > > > > 2) Add partitions > > hive> alter table moiz_partition_test add IF NOT EXISTS partition > (partition_hr=0) location '/user/moiztcs/moiz_partition_test/00'; > > OK > > Time taken: 0.411 seconds > > hive> alter table moiz_partition_test add IF NOT EXISTS partition > (partition_hr=1) location '/user/moiztcs/moiz_partition_test/01'; > > OK > > Time taken: 0.193 seconds > > hive> alter table moiz_partition_test add IF NOT EXISTS partition > (partition_hr=2) location '/user/moiztcs/moiz_partition_test/02'; > > OK > > Time taken: 0.182 seconds > > hive> alter table moiz_partition_test add IF NOT EXISTS partition > (partition_hr=10) location '/user/moiztcs/moiz_partition_test/10'; > > OK > > Time taken: 0.235 seconds > > > > 3) Copy data into the directories > > hadoop fs -copyFromLocal test.dat /user/moiztcs/moiz_partition_test/00 > > hadoop fs -copyFromLocal test.dat /user/moiztcs/moiz_partition_test/01 > > hadoop fs -copyFromLocal test.dat /user/moiztcs/moiz_partition_test/02 > > hadoop fs -copyFromLocal test.dat /user/moiztcs/moiz_partition_test/10 > > > > 4) Ran the sql > > hive> select distinct partition_hr from moiz_partition_test order by > partition_hr; > > Ended Job > > OK > > 0 > > 1 > > 10 > > 2 > > > > Thanks, > > Moiz > > *From:* Stephen Sprague [mailto:spragues@gmail.com] > *Sent:* Wednesday, March 12, 2014 12:55 AM > *To:* user@hive.apache.org > *Subject:* Re: Hive - Sorting on the Partition Column data type Int . > Output is Alphabetic Sort > > > > that makes no sense. if the column is an int it isn't going to sort like a > string. I smell a user error somewhere. > > > > On Tue, Mar 11, 2014 at 6:21 AM, Arafat, Moiz > wrote: > > Hi , > > I have a table that has a partition column partition_hr . Data Type is int > (partition_hr int) . When i run a sort on this column the output is > like this. > > 0 > 1 > 10 > 11 > 12 > 13 > 14 > 15 > 16 > 17 > 18 > 19 > 2 > 20 > 21 > 22 > 23 > 3 > 4 > 5 > 6 > 7 > 8 > 9 > > I expected the output like this . > > 0 > 1 > 2 > 3 > 4 > 5 > 6 > 7 > 8 > 9 > 10 > . > . > and so on. > > It works fine for non-partition columns. Please advise. > > Thanks, > Moiz > > > > > --047d7bdca04c86b2ca04f4817659 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
wow. its still sorting based on string context.=A0 ok, some = followups.

1.=A0 did you start clean?=A0 ie. did you do a "drop table moiz_partit= ion_test" before you started?

2.=A0 lets see the output of = "show create table moiz_partition_test"=A0 (if that doesn't w= ork [its hive v0.11 i think] lets see "desc moiz_partition_test" = )


3.=A0 what version of hive are you running?


pret= ty bizarre.




On Thu, Mar 13, 2014 at 3:48 AM, Arafat, Moiz = <moiz.arafa= t@teamaol.com> wrote:

Hi Stephen,=

=A0<= /p>

I followed your approach = and still got the same result

=A0<= /p>

1) hive> CREATE TABLE = moiz_partition_test

=A0=A0=A0 > (EVENT_DT = STRING) partitioned by

=A0=A0=A0 > (

=A0=A0=A0 > PARTITION_= HR INT

=A0=A0=A0 > )

=A0=A0=A0 > ROW FORMAT= DELIMITED

=A0=A0=A0 > FIELDS TER= MINATED BY '09'

=A0=A0=A0 > location &= #39;/user/moiztcs/moiz_partition_test'

=A0=A0=A0 > ;

=A0<= /p>

2)

hive> alter tabl= e=A0 moiz_partition_test add IF NOT EXISTS partition (partition_hr=3D0)=A0 = ;

OK

Time taken: 2.421 seconds=

hive> alter table=A0 m= oiz_partition_test add IF NOT EXISTS partition (partition_hr=3D1)=A0 ;

OK

Time taken: 0.132 seconds=

hive> alter table=A0 m= oiz_partition_test add IF NOT EXISTS partition (partition_hr=3D2)=A0 ;

OK

Time taken: 0.226 seconds=

hive> alter table=A0 m= oiz_partition_test add IF NOT EXISTS partition (partition_hr=3D10)=A0 ;<= /u>

OK

Time taken: 0.177 seconds=

=A0<= /p>

3)

$ hadoop fs -ls /user/moi= ztcs/moiz_partition_test

Found 4 items

drwxr-xr-x=A0=A0 - cdidw = aolmis=A0=A0=A0=A0=A0=A0=A0=A0=A0 0 2014-03-13 06:40 /user/moiztcs/moiz_par= tition_test/partition_hr=3D0

drwxr-xr-x=A0=A0 - cdidw = aolmis=A0=A0=A0=A0=A0=A0=A0=A0=A0 0 2014-03-13 06:41 /user/moiztcs/moiz_par= tition_test/partition_hr=3D1

drwxr-xr-x=A0=A0 - cdidw = aolmis=A0=A0=A0=A0=A0=A0=A0=A0=A0 0 2014-03-13 06:42 /user/moiztcs/moiz_par= tition_test/partition_hr=3D10

drwxr-xr-x=A0=A0 - cdidw = aolmis=A0=A0=A0=A0=A0=A0=A0=A0=A0 0 2014-03-13 06:41 /user/moiztcs/moiz_par= tition_test/partition_hr=3D2

=A0<= /p>

4)

$ hadoop fs -copyFromLoca= l test.dat=A0 /user/moiztcs/moiz_partition_test/partition_hr=3D0<= /u>

$ hadoop fs -copyFromLoca= l test.dat=A0 /user/moiztcs/moiz_partition_test/partition_hr=3D1<= /u>

$ hadoop fs -copyFromLoca= l test.dat=A0 /user/moiztcs/moiz_partition_test/partition_hr=3D10=

$ hadoop fs -copyFromLoca= l test.dat=A0 /user/moiztcs/moiz_partition_test/partition_hr=3D2<= /u>

=A0<= /p>

5) hive> select distin= ct partition_hr from moiz_partition_test order by partition_hr;

=A0<= /p>

OK

0

1

10

2

=A0<= /p>

Thanks,

Moiz=

=A0<= /p>

From: = Stephen Sprague [mailto:spragues@gmail.com]
Sent: Wednesday, March 12, 2014 9:58 PM


To: user@h= ive.apache.org
Subject: Re: Hive - Sorting on the Partition Column data type Int . = Output is Alphabetic Sort

=A0

there you go.=A0=A0 I think you're infli= cting too much of your own will onto hive with specifying the partition dir= ectories as 00, 01, 02.=A0

In my experience hive expects the partition name followed by an equal sign = followed by the value.

I'd stick with this kind of hdfs topolog= y:

/user/moiztcs/moiz_partit= ion_test/partition_hr=3D00/
/user/moiztcs/moiz_partition_test/partition_hr=3D01/

/user/moiztcs/moiz_partition_test/partiti= on_hr=3D10/

By omitting the location clause on your alter table statements you should g= et above layout which can be

confirmed by issuing the following command:<= u>

= $ hdfs dfs -ls /user/moiztc/moiz_partition_test

Can you try this?

=A0

=A0

= =A0

=A0

On Wed, Mar 12, 2014 at 12:10 AM, Arafat, Moiz <<= a href=3D"mailto:moiz.arafat@teamaol.com" target=3D"_blank">moiz.arafat@tea= maol.com> wrote:

Hi,<= /p>

=A0<= /p>

Here are the steps I foll= owed . Please let me know If I did something wrong.

=A0<= /p>

1)=A0=A0=A0=A0=A0 Create table

hive> CREATE TABLE moiz_partition_test

=A0=A0=A0 > (EVENT_DT STRING) partitioned = by

=A0=A0=A0 > (

=A0=A0=A0 > PARTITION_HR INT=

=A0 =A0=A0> )

=A0=A0=A0 >=A0 ROW FORMAT DELIMITED=

=A0=A0=A0 >=A0 FIELDS TERMINATED BY '0= 9'

=A0=A0=A0 >=A0 location '/user/moiztcs= /moiz_partition_test'

=A0=A0=A0 > ;

=A0

2)=A0=A0=A0=A0=A0 Add partitions

hive> alter table=A0 moiz_partition_test add = IF NOT EXISTS partition (partition_hr=3D0)=A0 location '/user/moiztcs/m= oiz_partition_test/00';

OK

Time taken: 0.411 seconds

hive> alter table=A0 moiz_partition_test add = IF NOT EXISTS partition (partition_hr=3D1)=A0 location '/user/moiztcs/m= oiz_partition_test/01';

OK

Time taken: 0.193 seconds

hive> alter table=A0 moiz_partition_test add = IF NOT EXISTS partition (partition_hr=3D2)=A0 location '/user/moiztcs/m= oiz_partition_test/02';

OK

Time taken: 0.182 seconds

hive> alter table=A0 moiz_partition_test add = IF NOT EXISTS partition (partition_hr=3D10)=A0 location '/user/moiztcs/= moiz_partition_test/10';

OK

Time taken: 0.235 seconds

=A0

3)=A0=A0=A0=A0=A0 Copy data into the directories<= /u>

hadoop fs -copyFromLocal test.dat=A0 /user/mo= iztcs/moiz_partition_test/00

hadoop fs -copyFromLocal test.dat=A0 /user/mo= iztcs/moiz_partition_test/01

hadoop fs -copyFromLocal test.dat=A0 /user/mo= iztcs/moiz_partition_test/02

hadoop fs -copyFromLocal test.dat=A0 /user/mo= iztcs/moiz_partition_test/10

=A0

4)=A0=A0=A0=A0=A0 Ran the sql

hive> select distinct partition_hr from mo= iz_partition_test order by partition_hr;

Ended Job

OK

0

1

10

2

=A0<= /p>

Thanks,<= /u>

Moiz=

From: Stephe= n Sprague [mailto:s= pragues@gmail.com]
Sent: Wednesday, March 12, 2014 12:55 AM
To: user@h= ive.apache.org
Subject: Re: Hive - Sorting on the Partition Column data type Int . = Output is Alphabetic Sort

=A0

= that makes no sense. if the column is an int it isn't going to sort lik= e a string.=A0 I smell a user error somewhere.

=A0

On Tue, Mar 11, 2014 at 6:21 AM, Arafat, Moiz <moiz.arafat@team= aol.com> wrote:

Hi ,

I have a table that has a partition column partition_hr . Data Type is int = (partition_hr =A0 =A0 =A0 =A0int) . When i run a sort on this column the ou= tput is like this.

0
1
10
11
12
13
14
15
16
17
18
19
2
20
21
22
23
3
4
5
6
7
8
9

I expected the output like this =A0.

0
1
2
3
4
5
6
7
8
9
10
.
.
and so on.

It works fine for non-partition columns. Please advise.

Thanks,
Moiz

=A0

=A0


--047d7bdca04c86b2ca04f4817659--