hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Furcy Pin <furcy....@flaminem.com>
Subject Re: Correlated Subqueries Workaround in Hive!
Date Mon, 15 Sep 2014 14:29:54 GMT
Hi,

what you are trying to do looks very much like what the LAG windowing
function does.
If your version of Hive is 0.11 or higher, I suggest trying it.
The hive doc for windowing function is here (but is quite poor):
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

Fortunately, as it is the same syntax as standard SQL, you can find better
doc for it:
http://www.oracle-base.com/articles/misc/lag-lead-analytic-functions.php

Hope this helps,

Furcy



2014-09-15 16:12 GMT+02:00 Nitin Pawar <nitinpawar432@gmail.com>:

> Other way I can think at this is ..
>
> 1) ignore all -1 and create a tmp table
> 2) I see there are couple of time stamps
> 3) Oder the table by timestamp
> 4) from this tmp tabel create anothe tmp table which says FK MinStartTime
> MaxEndTime Location
> 5) Now this tmp table from step 4 join with ur raw data and put where
> clause with min and max times
>
> I hope this is not confusing
>
> On Mon, Sep 15, 2014 at 6:25 PM, Viral Parikh <viral.j.parikh@gmail.com>
> wrote:
>
>> thanks!
>>
>> is there any other way than writing python UDF etc.
>>
>> any way i can leverage hive joins to get this working?
>>
>> On Mon, Sep 15, 2014 at 6:56 AM, Sreenath <sreenaths1923@gmail.com>
>> wrote:
>>
>>> How about writing a python UDF that takes input line by line
>>> and it saves the previous lines location and can replace it with that
>>> if location turns out to be '-1'
>>>
>>> On 15 September 2014 17:01, Nitin Pawar <nitinpawar432@gmail.com> wrote:
>>>
>>>> have you taken a look at lag and lead functions ?
>>>>
>>>> On Mon, Sep 15, 2014 at 4:46 PM, Viral Parikh <viral.j.parikh@gmail.com
>>>> > wrote:
>>>>
>>>>> To Whomsoever It May Concern,
>>>>>
>>>>> I posted this question last week but still haven't heard from anyone;
>>>>> I'd appreciate any reply.
>>>>>
>>>>> I've got a table that contains a LocationId field. In some cases,
>>>>> where a record shares the same foreign key, the LocationId might come
>>>>> through as -1.
>>>>>
>>>>> What I want to do is in my select query is in the case of this
>>>>> happening, the previous location.
>>>>>
>>>>> Example data:
>>>>>
>>>>> Record  FK     StartTime               EndTime          Location1   
   110  2011/01/01 12.30        2011/01/01 6.10      4562       110  2011/01/01 3.40     
   2011/01/01 4.00       -13       110  2011/01/02 1.00         2011/01/02 8.00      8914
      110  2011/01/02 5.00         2011/01/02 6.00       -15       110  2011/01/02 6.10  
      2011/01/02 6.30       -1
>>>>>
>>>>> The -1 should come out as 456 for record 2, and 891 for record 4 and
5
>>>>>
>>>>> Can someone help me do this with Hive syntax?
>>>>>
>>>>> I can do it using SQL syntax (as below) but since Hive doesnt support
>>>>> correlated subqueries in select clauses and so I am unable to get it.
>>>>>
>>>>> SELECT  T1.record,
>>>>>         T1.fk,
>>>>>         T1.start_time,
>>>>>         T1.end_time,
>>>>>         CASE WHEN T1.location != -1 THEN Location
>>>>>         ELSE
>>>>>             (
>>>>>             SELECT  TOP (1)
>>>>>                     T2.location
>>>>>             FROM    #temp1 AS T2
>>>>>             WHERE   T2.record < T1.record
>>>>>             AND     T2.fk = T1.fk
>>>>>             AND     T2.location != -1
>>>>>             ORDER   BY T2.Record DESC
>>>>>             )
>>>>>         ENDFROM    #temp1 AS T1
>>>>>
>>>>> Thank you for your help in advance!
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>>>
>>>
>>> --
>>> Sreenath S Kamath
>>> Bangalore
>>> Ph No:+91-9590989106
>>>
>>
>>
>
>
> --
> Nitin Pawar
>

Mime
View raw message