From: Adam Lewis
Date: Wed, 5 Feb 2014 14:37:34 -0500
Subject: Re: Svend's blog - several questions
To: user@storm.incubator.apache.org

To your first two questions:


Q1: Do the db keys come only from groupBy?

Yes, that is how MapStates get their keys.


Q2: Can you do groupBy multiple keys:like .groupBy("name").groupBy("id") ?

Yes, you can specify several fields in a single groupBy, e.g. myStream.groupBy(new Fields("name","id"))
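
A small plain-Java sketch of what a multi-field groupBy means for the keys may help here. It does not use the Storm API: the class CompositeKeySketch, its tuple and countByFields helpers, and the sample values are all hypothetical, made up for illustration. The point is only that grouping on ("name", "id") yields one key per distinct pair, each key being a list with one value per grouping field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; nothing here is part of the Storm/Trident API.
final class CompositeKeySketch {

    /** Builds a hypothetical tuple with the two fields used in the example. */
    static Map<String, Object> tuple(String name, int id) {
        Map<String, Object> t = new HashMap<>();
        t.put("name", name);
        t.put("id", id);
        return t;
    }

    /**
     * Counts tuples per group. Each group key is a list holding one value
     * per grouping field, in the order the fields were given.
     */
    static Map<List<Object>, Integer> countByFields(List<Map<String, Object>> tuples,
                                                    String... fields) {
        Map<List<Object>, Integer> counts = new LinkedHashMap<>();
        for (Map<String, Object> t : tuples) {
            List<Object> key = new ArrayList<>();
            for (String field : fields) {
                key.add(t.get(field));
            }
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> tuples = new ArrayList<>();
        tuples.add(tuple("alice", 1));
        tuples.add(tuple("alice", 2));
        tuples.add(tuple("alice", 1));

        // Grouping on both fields: ("alice", 1) and ("alice", 2) are distinct keys.
        System.out.println(countByFields(tuples, "name", "id"));

        // Grouping on one field: every tuple falls under the single key ("alice").
        System.out.println(countByFields(tuples, "name"));
    }
}
```

In the sketch, chaining two single-field groupings is not modelled at all; a single grouping over both fields is what produces the two-element keys.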



On Wed, Feb 5, 2014 at 1:13 PM, Adrian Mocanu <amocanu@verticalscope.com> wrote:

Thanks

Looking forward to a reply!

 

From: P. Taylor Goetz [mailto:ptgoetz@gmail.com]
Sent: February-05-14 12:39 PM
To: user@storm.incubator.apache.org
Subject: Re: Svend's blog - several questions

 

Hi Adrian,

 

I’ll apologize up-front for not answering your questions now, but I’ll try to follow up later when I have a little more bandwidth.

In the meantime, check out the storm documentation on the new Storm website: http://storm.incubator.apache.org, which includes the latest javadoc for the 0.9.x development line.

Specifically, look for the documentation for trident, which should answer Q7/Q8.

Again, I’ll try to address your other questions when I have more time, if someone else doesn’t address them first.

 

- Taylor

 

On Feb 5, 2014, at 12:22 PM, Adrian Mocanu <amocanu@verticalscope.com> wrote:



I've read Svend's blog [http://svendvanderveken.wordpress.com/2013/07/30/scalable-real-time-state-update-with-storm/] multiple times and I have a few questions.

 

 

"Because we did a groupBy on one tuple field, each List contains here one single String: the correlationId. Note that the list we return must have exactly the same size as the list of keys, so that Storm knows what period corresponds to what key. So for any key that does not exist in DB, we simply put a null in the resulting list."
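
One way to read the quoted passage, sketched in plain Java without the Storm API (the class AlignedMultiGetSketch, the in-memory store standing in for the database, and the key and value names are all hypothetical): the result list is built by walking the keys in order, so a key that is missing from the store gets a null at its own index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this multiGet is a stand-in, not the Storm interface.
final class AlignedMultiGetSketch {

    /**
     * Returns exactly one result per key, in key order.
     * A key that is absent from the store yields null at that key's position.
     */
    static List<String> multiGet(Map<List<Object>, String> store, List<List<Object>> keys) {
        List<String> results = new ArrayList<>(keys.size());
        for (List<Object> key : keys) {
            results.add(store.get(key)); // Map.get returns null for an absent key
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical store: single-element keys, as after a groupBy on one field.
        Map<List<Object>, String> store = new HashMap<>();
        store.put(Collections.<Object>singletonList("corr-0"), "period-0");
        store.put(Collections.<Object>singletonList("corr-2"), "period-2");

        List<List<Object>> keys = new ArrayList<>();
        keys.add(Collections.<Object>singletonList("corr-0"));
        keys.add(Collections.<Object>singletonList("corr-1")); // not in the store
        keys.add(Collections.<Object>singletonList("corr-2"));

        // The null lands in the middle slot, matching the position of "corr-1".
        System.out.println(multiGet(store, keys));
    }
}
```

Under this reading, results and keys stay the same size and index i of one always corresponds to index i of the other.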

 

Q1: Do the db keys come only from groupBy?

Q2: Can you do groupBy multiple keys:like .groupBy("name").groupBy("id") ?

Q3: When we add null, we keep the size of the results list the same as the keys list, but I don't understand how we make sure that key(3) points to the correct result(3). After all, we're adding nulls at the end of the result list, not intermittently. I.e., if key(1) does not have an entry in the db and the key size is 5, we add null to the last position in results, not to results(1). This doesn't preserve consistency/order, so key(1) now gives result(1), which is not null as it should be. Is the code incorrect, or is the explanation on Svend's blog incorrect?

 

 

Moving on,

"Once this is loaded Storm will present the tuples having the same correlation ID one by one to our reducer, the PeriodBuilder"

 

Q4: Does Trident/Storm call the reducer after calling multiGet and before calling multiPut?

Q5: What params (and their types) are passed to the reducer and what parameters should it emit so they can go into multiGet?

Q6: The first time the program is run, the database is empty and multiGet will return nothing. Does the reducer need to take care to insert the first time, as opposed to updating a value? I do see that the reducer (TimelineUpdater) checks for nulls, and I'm guessing this is the reason why it does so.

 

 

Q7:

Can someone explain what these mean:

.each (I've seen this used even consecutively: .each(..).each(..) )

.newStream

.newValuesStream

.persistAggregate

I am unable to find javadocs with documentation for the method signatures.

These java docs don't help much: http://nathanmarz.github.io/storm/doc/storm/trident/Stream.html

 

 

Q8:

Storm has ack/fail; does Trident handle that automatically?

 

 

Q9: Has anyone tried Spark? http://spark.incubator.apache.org/streaming/

I'm wondering if anyone has tried it because I'm thinking of ditching storm and moving to that.

It seems much much much better documented.

 

 

Lots of questions I know. Thanks for reading!

 

-Adrian

 

