Subject: Re: complex datatypes filling
From: Lefty Leverenz <leftyleverenz@gmail.com>
To: user@hive.apache.org
Date: Fri, 17 Jan 2014 02:06:12 -0800

Here's the wikidoc for transform: Transform/Map-Reduce Syntax.

-- Lefty
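For reference, a minimal sketch of a TRANSFORM query (the script name
my_script.py and the custom_value output column are illustrative, not from
this thread; the table and input columns are the ones discussed below):

    -- Ship a custom script to the cluster. Hive streams each row to the
    -- script's stdin as tab-delimited text and parses tab-delimited rows
    -- from its stdout.
    ADD FILE my_script.py;

    SELECT TRANSFORM (tag, col2, col5)
        USING 'python my_script.py'
        AS (tag STRING, col2 STRING, custom_value BIGINT)
    FROM raw_data_by_epoch;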
On Thu, Jan 16, 2014 at 10:44 PM, Bogala, Chandra Reddy
<Chandra.Bogala@gs.com> wrote:

> Thanks for the quick reply. I will take a look at stream jobs and the
> transform function.
>
> One more question:
>
> I have multiple CSV files (same structure, with each directory added as a
> partition) mapped to a Hive table. Then I run different group-by jobs on
> the same data, like the ones below. These are spawned as separate jobs,
> so multiple mappers read the same data from disk and each computes a
> different group/aggregation.
>
> Each job below fetches the same data from disk. Can this be avoided by
> reading each split only once and computing the different group-bys in the
> same mapper? That way the number of mappers would come down drastically
> and, more importantly, the repeated disk seeks over the same data would
> be avoided. Do I need to write a custom MapReduce job to do this?
>
> 1) Insert into temptable1 select TAG,col2,SUM(col5) as SUM_col5,
>    SUM(col6) as SUM_col6,SUM(col7) as SUM_col7,ts from raw_data_by_epoch
>    where ts=${hivevar:collectiontimestamp} group by TAG,col2,TS
>
> 2) Insert into temptable2 select TAG,col2,col3,SUM(col5) as SUM_col5,
>    SUM(col6) as SUM_col6,SUM(col7) as SUM_col7,ts from raw_data_by_epoch
>    where ts=${hivevar:collectiontimestamp} group by TAG,col2,col3,TS
>
> 3) Insert into temptable3 select TAG,col2,col3,col4,SUM(col5) as
>    SUM_col5,SUM(col6) as SUM_col6,SUM(col7) as SUM_col7,ts from
>    raw_data_by_epoch where ts=${hivevar:collectiontimestamp} group by
>    TAG,col2,col3,col4,TS
>
> Thanks,
>
> Chandra
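A single-scan alternative to the three queries quoted above: Hive's
multi-table insert reads the source table once and fans the rows out to
several GROUP BY aggregations, so the data is not re-fetched per job. A
sketch assembled from the quoted queries (table and column names come from
the thread; untested):

    -- One pass over raw_data_by_epoch feeds all three aggregations.
    FROM raw_data_by_epoch
    INSERT INTO TABLE temptable1
        SELECT tag, col2, SUM(col5) AS sum_col5, SUM(col6) AS sum_col6,
               SUM(col7) AS sum_col7, ts
        WHERE ts = ${hivevar:collectiontimestamp}
        GROUP BY tag, col2, ts
    INSERT INTO TABLE temptable2
        SELECT tag, col2, col3, SUM(col5) AS sum_col5,
               SUM(col6) AS sum_col6, SUM(col7) AS sum_col7, ts
        WHERE ts = ${hivevar:collectiontimestamp}
        GROUP BY tag, col2, col3, ts
    INSERT INTO TABLE temptable3
        SELECT tag, col2, col3, col4, SUM(col5) AS sum_col5,
               SUM(col6) AS sum_col6, SUM(col7) AS sum_col7, ts
        WHERE ts = ${hivevar:collectiontimestamp}
        GROUP BY tag, col2, col3, col4, ts;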
> From: Stephen Sprague [mailto:spragues@gmail.com]
> Sent: Friday, January 17, 2014 11:39 AM
> To: user@hive.apache.org
> Subject: Re: complex datatypes filling
>
> Remember, you can always set up a stream job to do any wild and crazy
> custom thing you want. See the transform() function documentation. It's
> really quite easy. Honest.
>

<= /u>=A0

> On Thu, Jan 16, 2014 at 9:39 PM, Bogala, Chandra Reddy
> <Chandra.Bogala@gs.com> wrote:
>
> Hi,
>
> I found lots of examples of mapping JSON data into Hive complex data
> types (map, array, struct, etc.), but I don't see anywhere an example of
> filling complex data types from a nested SQL query (i.e., grouping by a
> few key columns and producing an array of structs holding the result
> values), so that it is easy to map the results back into an
> embedded/nested JSON document.
>
> Thanks,
>
> Chandra
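On the original question (grouping by key columns and producing an array of
structs per group), a hedged sketch: recent Hive versions let collect_list()
aggregate complex types such as structs, while older releases restricted it
to primitives, where the third-party Brickhouse "collect" UDF was a common
workaround. The table and column names are the ones from the thread:

    -- One row per (tag, col2) group; "details" is an array of structs
    -- that maps naturally onto a nested/embedded JSON document.
    SELECT tag,
           col2,
           collect_list(named_struct('col3', col3,
                                     'col4', col4,
                                     'col5', col5)) AS details
    FROM raw_data_by_epoch
    GROUP BY tag, col2;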

= =A0


--001a11333c6619af4204f027b139--