storm-user mailing list archives

From Adrian Mocanu <amoc...@verticalscope.com>
Subject RE: aggregation in Trident
Date Fri, 07 Feb 2014 19:36:43 GMT
Hi Adam,

Thanks for your reply. Very helpful!

Follow-up on Q2:
Q2.1
So if I do a .groupBy(new Fields("name")) and then use a Count aggregator, and I have 3 tuples
with the same name:
("name","value1","field3")
("name","value2","field3")
("name","value3","field3")
the output tuple of the aggregation would be ("name","count"). Correct?

Q2.2
In my stream, before I do this counting, I do a groupBy(new Fields("field3")).each( .. ).
Can I then do a groupBy again with .groupBy(new Fields("name"))?
If so, would Count() use only the last groupBy's field (name in this case), or would it use
the fields of both groupBys combined (field3 and name)?
I have a feeling that it takes the last one only. Correct?


Thanks again. This is great info.
-A
From: supercargo@gmail.com [mailto:supercargo@gmail.com] On Behalf Of Adam Lewis
Sent: February-07-14 12:59 PM
To: user
Subject: Re: aggregation in Trident

Hi Adrian,

Q1: Count and Sum are different, just as in a relational DB.  Count counts the number
of tuples, while Sum adds up the values in the field you specify.  So in your example,
if you had three tuples with field "b" [[1],[2],[3]], then count would be 3 and sum would be
6.  Of course, if b is always 1, then they are the same.  Also note that with partitionAggregate
you are asking for the aggregate only within each partition (see Q2).
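
The difference can be sketched in plain Java. This is only an illustration of the arithmetic above, not Trident code: the class CountVsSum and its two methods are made-up names, and the list stands in for the values of field "b" within one partition.

```java
import java.util.Arrays;
import java.util.List;

public class CountVsSum {
    // Counts tuples and ignores their values, like SQL COUNT(*).
    public static long count(List<Integer> b) {
        return b.size();
    }

    // Adds up the values of the field, like SQL SUM(b).
    public static long sum(List<Integer> b) {
        long total = 0;
        for (int v : b) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> b = Arrays.asList(1, 2, 3);
        // prints "count=3 sum=6"
        System.out.println("count=" + count(b) + " sum=" + sum(b));
    }
}
```

With b = [1, 1, 1] both methods return 3, which is the only case where the two aggregators agree.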

Q2: You can specify a .groupBy(new Fields("name")) to get a separate aggregation for each
unique value of name.  Again, this is very similar to SQL GROUP BY: you preserve any fields that
you group by and aggregate the other fields into new fields.
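
As a rough plain-Java analogy of that grouping behaviour (again not Trident code; GroupByCount and countByName are hypothetical names, and the sample tuples are invented), each distinct name is kept as the key and the remaining fields collapse into a count:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByCount {
    // Each input tuple is {name, value, field3}.
    // The result keeps the grouping field (name) and adds a count per name.
    public static Map<String, Long> countByName(List<String[]> tuples) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String[] tuple : tuples) {
            counts.merge(tuple[0], 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> tuples = Arrays.asList(
                new String[] {"alice", "value1", "x"},
                new String[] {"alice", "value2", "x"},
                new String[] {"bob", "value3", "y"});
        // prints "{alice=2, bob=1}"
        System.out.println(countByName(tuples));
    }
}
```

Each map entry corresponds to one output tuple of the form (name, count); "value" and "field3" do not survive because they were not part of the grouping.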

Take a look at the Trident reach and word count tutorials to see these concepts in action:
https://github.com/nathanmarz/storm/wiki/Trident-tutorial

Adam

On Fri, Feb 7, 2014 at 12:36 PM, Adrian Mocanu <amocanu@verticalscope.com> wrote:
Hi group

Q1: What is the difference between Sum() and Count() as aggregators? I thought they meant
the same thing, i.e. you count to get the sum.
https://github.com/nathanmarz/storm/wiki/Trident-API-Overview#partitionaggregate gives this
example where both are emitted:
mystream.chainedAgg()
        .partitionAggregate(new Count(), new Fields("count"))
        .partitionAggregate(new Fields("b"), new Sum(), new Fields("sum"))
        .chainEnd()

Q2:
If you have a tuple with 3 fields like ("name","value","field3") and want to count how many
tuples share the same name, you can easily use Count() or Sum() (are they interchangeable?
See Q1). The problem is that after aggregation I get only the sum and not the other fields like
"name" and "field3".
Maybe the Trident API wiki page can be updated with such an example.

Thanks
-A