From: Adrian Mocanu
To: user@storm.incubator.apache.org
Subject: RE: aggregation in Trident
Date: Fri, 7 Feb 2014 19:36:43 +0000

Hi Adam,

Thanks for your reply. Very helpful!

Follow up on Q2:

Q2.1

So if I do a .groupBy(new Fields("name")), then use a Count aggregator, and I have 3 tuples with the same name:

("name", "value1", "field3")
("name", "value2", "field3")
("name", "value3", "field3")

the output tuple of the aggregation would be ("name", "count"). Correct?

Q2.2

In my stream, before I do this counting, I do a groupBy(new Fields("field3")).each( .. ). Can I then do a groupBy again with .groupBy(new Fields("name"))?

If so, would Count() use only the last groupBy's field (name in this case), or the fields of both groupBy calls combined (field3 and name)?

I have a feeling that it takes the last one only. Correct?

Thanks again. This is great info.

-A

From: supercargo@gmail.com [mailto:supercargo@gmail.com] On Behalf Of Adam Lewis
Sent: February-07-14 12:59 PM
To: user
Subject: Re: aggregation in Trident

 

Hi Adrian,

 

Q1: Count and Sum are different, just as in a relational DB. Count will just count the number of tuples, while Sum will sum up the values in the field you specify. So in your example, if you had three tuples with field "b" [[1],[2],[3]], then count would be 3 and sum would be 6. Of course, if b is always 1, then they are the same. Also note that you are asking for the aggregate only within the partition (see Q2).
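
To make the count/sum difference concrete, here is a minimal sketch in plain Java. It is not Trident code: the class and method names are made up for illustration, and it only mirrors the arithmetic described above for field "b" with values 1, 2, 3.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; these are hypothetical helpers, not Trident classes.
final class CountVsSumSketch {
    // Count: the number of tuples, regardless of what the field holds.
    static long count(List<Integer> values) {
        return values.size();
    }

    // Sum: the total of one field's values across all tuples.
    static long sum(List<Integer> values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> b = Arrays.asList(1, 2, 3);    // field "b" of three tuples
        System.out.println("count = " + count(b));   // count = 3
        System.out.println("sum = " + sum(b));       // sum = 6

        // The two only coincide when every value is 1.
        List<Integer> ones = Arrays.asList(1, 1, 1);
        System.out.println(count(ones) == sum(ones)); // true
    }
}
```

In Trident the same two numbers would come from Count() and Sum() respectively, computed per partition when used with partitionAggregate.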

 

Q2: You can specify a .groupBy(new Fields("name")) to get a different aggregation for each unique value of name. Again, this is very similar to SQL GROUP BY: you will preserve any fields which you group by and aggregate the other fields into new fields.
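
The grouping behaviour can be sketched the same way in plain Java. This is only an analogy for groupBy on "name" followed by a count, assuming tuples of the form (name, value, field3). The class name, method name, and sample values ("alice", "bob") are invented for the example and are not part of the Trident API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogy for "group by name, then count"; not Trident code.
final class GroupByCountSketch {
    // Each input tuple is {name, value, field3}. The result keeps only the
    // grouping field (name) plus the new aggregated field (count).
    static Map<String, Long> countByName(List<String[]> tuples) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String[] tuple : tuples) {
            counts.merge(tuple[0], 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> tuples = Arrays.asList(
            new String[] {"alice", "value1", "x"},
            new String[] {"alice", "value2", "x"},
            new String[] {"bob", "value3", "y"});
        System.out.println(countByName(tuples)); // {alice=2, bob=1}
    }
}
```

Note that "value" and "field3" do not appear in the result: only the group-by field and the aggregate survive, which matches the SQL comparison above.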

 

Take a look at the Trident reach and word count tutorials to see these concepts in action: https://github.com/nathanmarz/storm/wiki/Trident-tutorial

 

Adam

 

On Fri, Feb 7, 2014 at 12:36 PM, Adrian Mocanu <amocanu@verticalscope.com> wrote:

Hi group

 

Q1: What is the difference between Sum() and Count() as aggregators? I thought they meant the same thing, i.e. you count to get the sum.

https://github.com/nathanmarz/storm/wiki/Trident-API-Overview#partitionaggregate gives this example where both are emitted:

mystream.chainedAgg()
        .partitionAggregate(new Count(), new Fields("count"))
        .partitionAggregate(new Fields("b"), new Sum(), new Fields("sum"))
        .chainEnd()

 

Q2:

If you have a tuple with 3 fields like ("name", "value", "field3") and want to count how many tuples have the same name, I can easily use a Count() or Sum() (are they interchangeable? See Q1). The problem is that after aggregation I get only the sum and not the other fields like "name" and "field3".

Maybe the Trident API wiki page can be updated with such an example.

 

Thanks

-A

 

 
