Subject: Re: complex datatypes filling
From: Lefty Leverenz <leftyleverenz@gmail.com>
To: user@hive.apache.org
Date: Fri, 17 Jan 2014 02:06:12 -0800

Here's the wikidoc for transform: Transform/Map-Reduce Syntax.

-- Lefty
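For reference, a minimal sketch of a TRANSFORM query (the script name
my_script.py and the custom_value output column are illustrative, not from
this thread; the table and input columns are the ones discussed below):

    -- Ship a custom script to the cluster. Hive streams each row to the
    -- script's stdin as tab-delimited text and parses tab-delimited rows
    -- from its stdout.
    ADD FILE my_script.py;

    SELECT TRANSFORM (tag, col2, col5)
        USING 'python my_script.py'
        AS (tag STRING, col2 STRING, custom_value BIGINT)
    FROM raw_data_by_epoch;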
On Thu, Jan 16, 2014 at 10:44 PM, Bogala, Chandra Reddy
<Chandra.Bogala@gs.com> wrote:

> Thanks for the quick reply. I will take a look at stream jobs and the
> transform function.
>
> One more question:
>
> I have multiple CSV files (same structure, with each directory added as a
> partition) mapped to a Hive table. Then I run different group-by jobs on
> the same data, like the ones below. These are spawned as separate jobs,
> so multiple mappers read the same data from disk and each computes a
> different group/aggregation.
>
> Each job below fetches the same data from disk. Can this be avoided by
> reading each split only once and computing the different group-bys in the
> same mapper? That way the number of mappers would come down drastically
> and, more importantly, the repeated disk seeks over the same data would
> be avoided. Do I need to write a custom MapReduce job to do this?
>
> 1) Insert into temptable1 select TAG,col2,SUM(col5) as SUM_col5,
>    SUM(col6) as SUM_col6,SUM(col7) as SUM_col7,ts from raw_data_by_epoch
>    where ts=${hivevar:collectiontimestamp} group by TAG,col2,TS
>
> 2) Insert into temptable2 select TAG,col2,col3,SUM(col5) as SUM_col5,
>    SUM(col6) as SUM_col6,SUM(col7) as SUM_col7,ts from raw_data_by_epoch
>    where ts=${hivevar:collectiontimestamp} group by TAG,col2,col3,TS
>
> 3) Insert into temptable3 select TAG,col2,col3,col4,SUM(col5) as
>    SUM_col5,SUM(col6) as SUM_col6,SUM(col7) as SUM_col7,ts from
>    raw_data_by_epoch where ts=${hivevar:collectiontimestamp} group by
>    TAG,col2,col3,col4,TS
>
> Thanks,
>
> Chandra
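A single-scan alternative to the three queries quoted above: Hive's
multi-table insert reads the source table once and fans the rows out to
several GROUP BY aggregations, so the data is not re-fetched per job. A
sketch assembled from the quoted queries (table and column names come from
the thread; untested):

    -- One pass over raw_data_by_epoch feeds all three aggregations.
    FROM raw_data_by_epoch
    INSERT INTO TABLE temptable1
        SELECT tag, col2, SUM(col5) AS sum_col5, SUM(col6) AS sum_col6,
               SUM(col7) AS sum_col7, ts
        WHERE ts = ${hivevar:collectiontimestamp}
        GROUP BY tag, col2, ts
    INSERT INTO TABLE temptable2
        SELECT tag, col2, col3, SUM(col5) AS sum_col5,
               SUM(col6) AS sum_col6, SUM(col7) AS sum_col7, ts
        WHERE ts = ${hivevar:collectiontimestamp}
        GROUP BY tag, col2, col3, ts
    INSERT INTO TABLE temptable3
        SELECT tag, col2, col3, col4, SUM(col5) AS sum_col5,
               SUM(col6) AS sum_col6, SUM(col7) AS sum_col7, ts
        WHERE ts = ${hivevar:collectiontimestamp}
        GROUP BY tag, col2, col3, col4, ts;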
> From: Stephen Sprague [mailto:spragues@gmail.com]
> Sent: Friday, January 17, 2014 11:39 AM
> To: user@hive.apache.org
> Subject: Re: complex datatypes filling
>
> Remember, you can always set up a stream job to do any wild and crazy
> custom thing you want. See the transform() function documentation. It's
> really quite easy. Honest.
>

<= /u>=A0

> On Thu, Jan 16, 2014 at 9:39 PM, Bogala, Chandra Reddy
> <Chandra.Bogala@gs.com> wrote:
>
> Hi,
>
> I found lots of examples of mapping JSON data into Hive complex data
> types (map, array, struct, etc.), but I don't see anywhere an example of
> filling complex data types from a nested SQL query (i.e., grouping by a
> few key columns and producing an array of structs holding the result
> values), so that it is easy to map the results back into an
> embedded/nested JSON document.
>
> Thanks,
>
> Chandra
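On the original question (grouping by key columns and producing an array of
structs per group), a hedged sketch: recent Hive versions let collect_list()
aggregate complex types such as structs, while older releases restricted it
to primitives, where the third-party Brickhouse "collect" UDF was a common
workaround. The table and column names are the ones from the thread:

    -- One row per (tag, col2) group; "details" is an array of structs
    -- that maps naturally onto a nested/embedded JSON document.
    SELECT tag,
           col2,
           collect_list(named_struct('col3', col3,
                                     'col4', col4,
                                     'col5', col5)) AS details
    FROM raw_data_by_epoch
    GROUP BY tag, col2;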

= =A0


--001a11333c6619af4204f027b139--