Subject: Re: Hive Query Question
From: Tim Spence <yogi.wan.kenobi@gmail.com>
To: user@hive.apache.org
Date: Mon, 13 Jun 2011 12:08:29 -0700

Praveen,

My apologies--I meant to suggest a streaming function, because a UDF would
not be able to hold state either. Look at the documentation for TRANSFORM
(http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform). Your
transformation script can compare timestamps from successive lines of your
data.

Tim

On Sat, Jun 11, 2011 at 12:32 PM, Praveen wrote:
> Do you mean that my UDF would store the timestamp of the current row in a
> static field in the UDF's implementation and, when processing the next
> row, use that field to get the previous row's value?
>
> Can anyone comment on whether that is safe? I'm not familiar with Hive
> internals.
>
> Thanks,
>
> pk
>
> Sent from my iPhone
>
> On Jun 10, 2011, at 11:18 PM, Tim Spence <yogi.wan.kenobi@gmail.com> wrote:
>
> Praveen,
> This would be best accomplished with a UDF, because Hive does not support
> cursors.
> Best of luck,
> Tim
>
> On Fri, Jun 10, 2011 at 10:29 PM, Praveen Kumar <pk1u.uu@gmail.com> wrote:
>
>> If I have a table timestamps:
>>
>> hive> desc timestamps;
>> OK
>> ts      bigint
>>
>> hive> select ts from timestamps order by ts;
>> OK
>> 1
>> 2
>> 3
>> 4
>> 5
>> 6
>> 7
>> 8
>> 9
>> 10
>> 30
>> 32
>> 34
>> 36
>> 38
>> 40
>> 42
>> 44
>> 46
>> 48
>> 50
>> 70
>> 74
>> 78
>> 100
>> 105
>> 110
>> 115
>>
>> and I want to group the values, splitting between groups wherever two
>> consecutive entries differ by more than 10.
>>
>> In the data above, the values would be grouped into these ranges:
>>
>> 0-10
>> 30-50
>> 70-78
>> 100-115
>>
>> because (30 - 10), (70 - 50), and (100 - 78) are each greater than 10.
>>
>> I'd like the query to produce the following:
>>
>> hive> select ...
>>
>> 0       7
>> 0       9
>> 0       6
>> 0       3
>> 0       10
>> 0       1
>> 0       4
>> 0       5
>> 0       8
>> 0       2
>> 30      34
>> 30      44
>> 30      40
>> 30      38
>> 30      36
>> 30      32
>> 30      46
>> 30      42
>> 30      48
>> 30      50
>> 30      30
>> 70      74
>> 70      70
>> 70      78
>> 100     100
>> 100     105
>> 100     110
>> 100     115
>>
>> What is the most efficient Hive query that will do this? Thanks,
>>
>> pk
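[A sketch of the streaming approach Tim describes. Everything here is illustrative rather than from the thread: the script name sessionize.py, reading the gap threshold of 10 from a constant, and labeling each group by its first timestamp (so the first group comes out labeled 1 rather than the 0 shown in Praveen's example output) are all assumptions.]

```python
#!/usr/bin/env python
# sessionize.py -- rough sketch of the streaming script Tim suggests
# (name and details are illustrative, not from the thread).
import sys

GAP = 10  # split threshold from the original question


def sessionize(timestamps, gap=GAP):
    """Yield (group_start, ts) pairs for an ascending stream of ints.

    A new group starts whenever the gap to the previous timestamp
    exceeds `gap`; each group is labeled by its first timestamp.
    """
    group_start = None
    prev = None
    for ts in timestamps:
        if prev is None or ts - prev > gap:
            group_start = ts  # gap too large: open a new group
        prev = ts
        yield group_start, ts


if __name__ == "__main__" and not sys.stdin.isatty():
    # Hive's TRANSFORM pipes one row per line on stdin; emit
    # tab-separated (group_start, ts) pairs for Hive to read back.
    values = (int(line.strip()) for line in sys.stdin if line.strip())
    for start, ts in sessionize(values):
        sys.stdout.write("%d\t%d\n" % (start, ts))
```

[An invocation might look roughly like the following, in the thread's own hive> style. One caveat the thread does not address: the script only works if a single instance of it sees all rows in ascending order, so the rows must be globally sorted into one script instance (for example by forcing a single reducer); otherwise each parallel copy would sessionize only its own slice.]

  hive> add file sessionize.py;
  hive> select transform (ts) using 'python sessionize.py' as (grp, ts)
      > from (select ts from timestamps order by ts) sorted;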