hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tim Spence <yogi.wan.ken...@gmail.com>
Subject Re: Hive Query Question
Date Mon, 13 Jun 2011 19:08:29 GMT
Praveen,
My apologies--I meant to suggest a streaming function because a UDF would
not be able to hold state either.  Look at the documentation for TRANSFORM (
http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform ).  Your
transformation script can be used to compare timestamps from successive
lines of your data.
Tim



On Sat, Jun 11, 2011 at 12:32 PM, Praveen <pk1uuu@gmail.com> wrote:

> Do you mean that my UDF would store the timestamp of the current row in a
> static field in the UDF's implementation, and when processing the next row,
> use that field to get the previous row's value ?
>
> Can anyone comment on whether that's safe, re: I'm not familiar with Hive
> internals ?
>
> Thanks,
>
> pk
>
> Sent from my iPhone
>
> On Jun 10, 2011, at 11:18 PM, Tim Spence <yogi.wan.kenobi@gmail.com>
> wrote:
>
> Praveen,
> This would be best accomplished with a UDF because Hive does not support
> cursors.
> Best of luck,
> Tim
>
>
>
>
> On Fri, Jun 10, 2011 at 10:29 PM, Praveen Kumar < <pk1u.uu@gmail.com>
> pk1u.uu@gmail.com> wrote:
>
>> If I have table timestamps:
>>
>> hive> desc timestamps;
>>
>> OK
>> ts      bigint
>>
>>
>> hive> select ts from timestamps order by ts
>> OK
>>
>> 1
>> 2
>> 3
>> 4
>> 5
>> 6
>> 7
>> 8
>> 9
>> 10
>> 30
>> 32
>> 34
>> 36
>> 38
>> 40
>> 42
>> 44
>> 46
>> 48
>> 50
>> 70
>> 74
>> 78
>> 100
>> 105
>> 110
>> 115
>>
>> and I want to make groups of the values where splits between groups
>> occur where two time-consecutive entries have a difference greater
>> than 10.
>>
>> Eg, above, the splits would be such that the numbers would be grouped
>> into these ranges:
>>
>> 0-10
>> 30-50
>> 70-78
>> 100-115
>>
>> because (30 - 10), (70 - 50), and (100 - 78) are each greater than 10.
>>
>> I'd like the query to result in the following:
>>
>> hive> select ...
>>
>> 0       7
>> 0       9
>> 0       6
>> 0       3
>> 0       10
>> 0       1
>> 0       4
>> 0       5
>> 0       8
>> 0       2
>> 30      34
>> 30      44
>> 30      40
>> 30      38
>> 30      36
>> 30      32
>> 30      46
>> 30      42
>> 30      48
>> 30      50
>> 30      30
>> 70      74
>> 70      70
>> 70      78
>> 100     100
>> 100     105
>> 100     110
>> 100     115
>>
>> What is the most efficient hive query that will do this ? Thanks,
>>
>> pk
>>
>
>

Mime
View raw message