Mailing-List: contact user-help@storm.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@storm.incubator.apache.org
Received-SPF: pass (athena.apache.org: domain of amocanu@verticalscope.com
 designates 207.46.163.212 as permitted sender)
From: Adrian Mocanu <amocanu@verticalscope.com>
To: "user@storm.incubator.apache.org" <user@storm.incubator.apache.org>
Subject: RE: aggregation in Trident
Thread-Topic: aggregation in Trident
Thread-Index: Ac8kKZrUGd3g2oNfSo+BXpmn0bXgigABL2+AAAMe5VA=
Date: Fri, 7 Feb 2014 19:36:43 +0000
Message-ID: 
 <e135e01a658d4cc2a262995294541a2f@CO2PR07MB522.namprd07.prod.outlook.com>
References: 
 <e1379ca116974cc28d3542cc2f583ad3@CO2PR07MB522.namprd07.prod.outlook.com>
 <CAD9ohx_oqgeOCeEmxUv5UDMfqS=oiv=6Fe2CV5co+hixSCDxGA@mail.gmail.com>
In-Reply-To: 
 <CAD9ohx_oqgeOCeEmxUv5UDMfqS=oiv=6Fe2CV5co+hixSCDxGA@mail.gmail.com>
Accept-Language: en-CA, en-US
Content-Language: en-US
Content-Type: multipart/alternative;
	boundary="_000_e135e01a658d4cc2a262995294541a2fCO2PR07MB522namprd07pro_"
MIME-Version: 1.0

--_000_e135e01a658d4cc2a262995294541a2fCO2PR07MB522namprd07pro_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hi Adam,

Thanks for your reply. Very helpful!

Follow up on Q2:
Q2.1
So if I do a .groupBy(new Fields("name")) then I use a count aggregator
and I have 3 tuples with the same name:
("name","value1","field3")
("name","value2","field3")
("name","value3","field3")
the output result tuple of the aggregation would be ("name","count").
Correct?

Q2.2
In my stream, before I do this counting, I do a
groupBy(new Fields("field3")).each( .. ). Can I then do a groupBy again,
.groupBy(new Fields("name"))?
If so, would Count() take only the last groupBy's parameter (name in this
case), or would it take the previous groupBy's params combined (field3 and
name)?
I have a feeling that it takes the last one only. Correct?


Thanks again. This is great info.
-A
From: supercargo@gmail.com [mailto:supercargo@gmail.com] On Behalf Of Adam Lewis
Sent: February-07-14 12:59 PM
To: user
Subject: Re: aggregation in Trident

Hi Adrian,

Q1: Count and Sum are different, just as in a relational DB. Count just
counts the number of tuples, while Sum sums up the values in the field you
specify. So in your example, if you had three tuples with field "b"
[[1],[2],[3]], then count would be 3 and sum would be 6. Of course, if b is
always 1, then they are the same. Also note that you are asking for the
aggregate only within the partition (see Q2).

Q2: you can specify a .groupBy(new Fields("name")) to get a different
aggregation for each unique value of name. Again, very similar to SQL group
by, you will preserve any fields which you group by and aggregate the other
fields into new fields.
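
The group-by behaviour described here can be sketched in plain Java as well. This is an analogy under stated assumptions, not Trident code: Row and countByName are hypothetical names, and the map plays the role of the output tuples, keeping the grouping field (name) and adding a new count field.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: mimics "group by name, then count".
final class GroupByCount {
    // Stand-in for a tuple with fields (name, value, field3).
    static final class Row {
        final String name;
        final String value;
        final String field3;

        Row(String name, String value, String field3) {
            this.name = name;
            this.value = value;
            this.field3 = field3;
        }
    }

    // One output entry per distinct name: the grouping field is kept,
    // the remaining fields are collapsed into the count.
    static Map<String, Long> countByName(List<Row> rows) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Row row : rows) {
            counts.merge(row.name, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("a", "value1", "x"),
                new Row("a", "value2", "x"),
                new Row("a", "value3", "x"),
                new Row("b", "value4", "y"));
        // prints "{a=3, b=1}"
        System.out.println(countByName(rows));
    }
}
```

Note that value and field3 do not appear in the result; only the field used for grouping survives next to the aggregate.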

Take a look at the Trident reach and word count tutorials to see these
concepts in action: https://github.com/nathanmarz/storm/wiki/Trident-tutorial

Adam

On Fri, Feb 7, 2014 at 12:36 PM, Adrian Mocanu <amocanu@verticalscope.com> wrote:
Hi group

Q1: What is the difference between Sum() and Count() as aggregators? I
thought they meant the same thing, i.e. you count to get the sum.
https://github.com/nathanmarz/storm/wiki/Trident-API-Overview#partitionaggregate
gives this example where both are emitted:
mystream.chainedAgg()
        .partitionAggregate(new Count(), new Fields("count"))
        .partitionAggregate(new Fields("b"), new Sum(), new Fields("sum"))
        .chainEnd()

Q2:
If you have a tuple with 3 fields like ("name","value","field3") and want
to count how many tuples have the same name, I can easily use a Count() or
Sum() (are they interchangeable? See Q1). The problem is that after
aggregation I get only the sum and not the other fields like "name" and
"field3".
Maybe the Trident API wiki page can be updated with such an example.

Thanks
-A


--_000_e135e01a658d4cc2a262995294541a2fCO2PR07MB522namprd07pro_--