hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bejoy Ks <bejoy...@yahoo.com>
Subject Re: How to support SQL NOT IN function in Hive QL
Date Tue, 08 Mar 2011 11:53:12 GMT
Thanks Rekha for such a quick response. A few more doubts out here
If I use the comparison operators on the dates directly would they give a 
desired result ?,as the dates are stored in Hive tables as String
Also in the comparison of dates if we use the unix_timestamp() it would consider 
the time stamp as well along with date for comparison right?

unix_timestamp(field4) >= unix_timestamp(RJC.START_DATE) AND 
unix_timestamp(field4) <= unix_timestamp(RJC.END_DATE) 


Regards
Bejoy.K.S





________________________________
From: Rekha Joshi <rekhajos@yahoo-inc.com>
To: "user@hive.apache.org" <user@hive.apache.org>
Sent: Tue, March 8, 2011 4:36:08 PM
Subject: Re: How to support SQL NOT IN function in Hive QL

 Re: How to support SQL NOT IN function in Hive QL Bejoy – If timestamp of the 
dates is not of much importance to you, you can alternatively use >= 
‘start_date’ and <= ‘end_date’.
Thanks,
Rekha.


On 3/8/11 4:27 PM, "Bejoy Ks" <bejoy_ks@yahoo.com> wrote:


Thanks Rekha. I went with your first option 'LEFT OUTER JOIN' and it worked like 
a charm.
>The second one was not fitting for my case as it was popping out parse errors 
>due to multiple columns separated by comma coming under the same NOT IN clause 
> {(field1,field2,field3,field4 )NOT IN (SELECT field1,field2,field3,field4 ...}
>Guess hive doesn't support queries that way
>Now there is minor hurdle still, The SQL BETWEEN. How can we get this BETWEEN 
>AND supported in Hive QL?
>Any thoughts?
>
>My new transformed query would look like this
>INSERT OVERWRITE TABLE Table1 
>SELECT T2.field1,T2.field2,T2.field3,T2.field4 
>FROM Table2 T2 JOIN Table3 RJC ON (RJC.field3 = 'xyz') LEFT OUTER JOIN Table4 T4 
>ON 
>
>(T2.field1 = T4.field1 AND T2.field2 = T4.field2 AND T2.field3 = T4.field3 AND 
>T2.field4 = T4.field4)
>WHERE field4 BETWEEN RJC.START_DATE AND RJC.END_DATE
>GROUP BY T1_field1,T1_field2_ID,T1_field3,T1_field4;
>
>
>
>Regards
>Bejoy.K.S
>
>
>
>
________________________________
From:Rekha Joshi <rekhajos@yahoo-inc.com>
>To: "user@hive.apache.org" <user@hive.apache.org>
>Sent: Tue, March 8, 2011 3:08:08 PM
>Subject: Re: How to support SQL NOT IN function in Hive QL
>
>Re: How to support SQL NOT IN function in Hive QL @Bejoy – AFAIK, NOT IN is not 
>directly supported in current hive. Workaround, you can write an outer join 
>instead of antijoin. HIVE-1740, mentions another workaround - NOT(x LIKE p).
>Thanks,
>Rekha.
>
>On 3/8/11 2:12 PM, "Bejoy Ks" <bejoy_ks@yahoo.com> wrote:
>
>
>Hi Experts
>>    I'm facing a hurdle in transforming a SQL query to equivalent Hive QL with 
>>SQL NOT IN functionality. My SQL query would like this
>>
>>INSERT INTO Table1
>>(T1_field1,T1_field2_ID,T1_field3,T1_field4)
>>SELECT field1,field2,field3,field4 
>>FROM Table2 JOIN Table3 T3
>>WHERE (field1,field2,field3,field4 ) NOT IN
>>(SELECT field1,field2,field3,field4  FROM Table4)
>>AND field4 BETWEEN T3.START_DATE  AND T3.END_DATE
>>AND T3.field3 = 'xyz' GROUP BY T1_field1,T1_field2_ID,T1_field3,T1_field4;
>>
>>
>>I'm comfortable with the rest of the query apart from the NOT IN part. I'm using 
>>Hive 0.7 with CDH3B4. I check the availability of SQL IN functionality in my 
>>hive and it worked like a charm but NOT IN doesn't work out here. Has it been 
>>already included as a new patch in latest trunk?
>>Can some one help me out how i can re frame the query avoiding the NOT IN part 
>>to get it working on Hive?(I'm not really from a db background)
>>Also is there any JIRA tickets open to have this functionality supported in 
>>upcoming versions of hive?
>>
>>Regards
>>Bejoy.K.S
>>
>>
>> 
>>


      
Mime
View raw message