storm-user mailing list archives

From Adam Lewis <m...@adamlewis.com>
Subject Re: aggregation in Trident
Date Sun, 09 Feb 2014 13:25:04 GMT
Yes on both counts. For Q2.2, though, there isn't much value in doing only
an "each" on the grouped stream before regrouping it (groupBy
repartitions the stream, so you will incur a data transfer), unless you
mean that you want two different streams, each grouped by a different field.
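
To make the Q2.1 answer concrete, here is a minimal plain-Java sketch of what "group by name, then count or sum" produces. It is only an analogy built on java.util.stream, not Trident code; the class GroupCountSketch, the Row type, and both method names are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupCountSketch {
    // Hypothetical stand-in for a three-field tuple ("name", "value", "field3").
    static final class Row {
        final String name;
        final int value;
        final String field3;

        Row(String name, int value, String field3) {
            this.name = name;
            this.value = value;
            this.field3 = field3;
        }
    }

    // Analogue of grouping by "name" and counting: one entry per distinct
    // name, holding the number of rows seen for that name.
    static Map<String, Long> countByName(List<Row> rows) {
        return rows.stream().collect(
            Collectors.groupingBy((Row r) -> r.name, TreeMap::new, Collectors.counting()));
    }

    // Analogue of grouping by "name" and summing "value": one entry per
    // distinct name, holding the total of the value field.
    static Map<String, Long> sumValueByName(List<Row> rows) {
        return rows.stream().collect(
            Collectors.groupingBy((Row r) -> r.name, TreeMap::new,
                Collectors.summingLong((Row r) -> r.value)));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("a", 1, "x"),
            new Row("a", 2, "y"),
            new Row("a", 3, "x"),
            new Row("b", 5, "x"));

        System.out.println(countByName(rows));    // prints {a=3, b=1}
        System.out.println(sumValueByName(rows)); // prints {a=6, b=5}
    }
}
```

Only the grouping key and the aggregate survive in each result, which matches the ("name","count") shape asked about in Q2.1; field3 is dropped because it is neither grouped on nor aggregated.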

At this point, I would suggest building these out as small test
topologies and running them locally to see how they behave.  One thing I
have found useful in learning the API is to print the output of
getOutputFields at different points. I went so far as to write a helper
method that does this and immediately calls System.exit, so that the output
is the last thing in my console. Another is to create debug nodes that take
the stream's entire field list, like:

Stream myStream = ... ;

myStream.each(myStream.getOutputFields(), new Debug());

This will help you understand how the different operators transform the
tuples structurally.


On Friday, February 7, 2014, Adrian Mocanu <amocanu@verticalscope.com>
wrote:

>  Hi Adam,
>
>
>
> Thanks for your reply. Very helpful!
>
>
>
> Follow up on Q2:
>
> Q2.1
>
> So if I do a .groupBy(new Fields("name")) then I use a count aggregator
> and I have 3 tuples with the same name:
>
> ("name"," value1","field3")
>
> ("name"," value2","field3")
>
> ("name"," value3","field3")
>
> the output result tuple of the aggregation would be ("name","count").
> Correct?
>
>
>
> Q2.2
>
> In my stream, before I do  this counting, I do a groupBy(new
> Fields("field3")).each( .. ) then can I do a groupBy again .groupBy(new
> Fields("name")) ?
>
> If so, would Count() take the last groupBy's parameter, name in this case,
> or would it take previous groupBy's params combined: field3, and name?
>
> I have a feeling that it takes the last one only. Correct?
>
>
>
>
>
> Thanks again. This is great info.
>
> -A
>
> *From:* supercargo@gmail.com [mailto:supercargo@gmail.com]
> *On Behalf Of *Adam Lewis
> *Sent:* February-07-14 12:59 PM
> *To:* user
> *Subject:* Re: aggregation in Trident
>
>
>
> Hi Adrian,
>
>
>
> Q1: Count and Sum are different just as in a relational DB.  Count will
> just count the number of tuples, while Sum will sum up the values in the
> field you specify.  So in your example, if you had three tuples with field
> "b" [[1],[2],[3]] then count would be 3 and sum would be 6.  Of course, if
> b is always 1, then they are the same.  Also, note, that you are asking for
> the aggregate only within the partition (see Q2)
>
>
>
> Q2: you can specify a .groupBy(new Fields("name")) to get a different
> aggregation for each unique value of name.  Again, very similar to SQL
> group by, you will preserve any fields which you group by and aggregate the
> other fields into new fields.
>
>
>
> Take a look at the trident reach and word count tutorials to see these
> concepts in action
> https://github.com/nathanmarz/storm/wiki/Trident-tutorial
>
>
>
> Adam
>
>
>
> On Fri, Feb 7, 2014 at 12:36 PM, Adrian Mocanu <amocanu@verticalscope.com>
> wrote:
>
>  Hi group
>
>
>
> Q1: What is the difference between Sum() and Count() as aggregators? I
> thought they meant the same thing ie: you count to get the sum.
>
>
> https://github.com/nathanmarz/storm/wiki/Trident-API-Overview#partitionaggregate
> gives this example where both are emitted:
>
> mystream.chainedAgg()
>
>         .partitionAggregate(new Count(), new Fields("count"))
>
>         .partitionAggregate(new Fields("b"), new Sum(), new Fields("sum"))
>
>         .chainEnd()
>
>
>
> Q2:
>
> If  you have a tuple with 3 fields like ("name","value","field3") and want
> to count how many tuples with the same name you get I can easily use a
> Count() or Sum() (are they interchangeable?- see Q1). Problem is after
> aggregation I get only the sum and not the other fields like "name" and
> "field3"
>
> Maybe Trident API wiki page can be updated with such an example
>
>
>
> Thanks
>
> -A
>
>
>
>
>
