From: Adrian Mocanu
To: user@storm.incubator.apache.org
Subject: Svend's blog - several questions
Date: Wed, 5 Feb 2014 17:22:02 +0000

I've read Svend's blog [http://svendvanderveken.wordpress.com/2013/07/30/scalable-real-time-state-update-with-storm/] multiple times and I have a few questions.

 

 

"Because we did a groupBy on one tuple field, each List contains here one single String: the correlationId. Note that the list we return must have exactly the same size as the list of keys, so that Storm knows what period corresponds to what key. So for any key that does not exist in DB, we simply put a null in the resulting list."

 

Q1: Do the db keys come only from groupBy?

Q2: Can you groupBy on multiple keys, like .groupBy("name").groupBy("id")?

Q3: When we add null, we keep the results list the same size as the keys list, but I don't understand how we make sure that key(3) points to the correct result(3). After all, we're adding nulls at the end of the results list, not intermittently. I.e., if key(1) does not have an entry in the DB and the keys list has size 5, we add null at the last position in results, not at results(1). This doesn't preserve consistency/order, so key(1) now gives result(1), which is not null as it should be. Is the code incorrect, or is the explanation on Svend's blog incorrect?
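To make the positional contract in the quoted passage concrete, here is a minimal plain-Java sketch with no Storm dependency. It assumes, as the quote says, that each key is a List of groupBy field values (the two-field keys such as [name, id] below are an illustrative assumption) and that the result list must line up index-for-index with the keys. The HashMap standing in for the DB and the class name PositionalMultiGet are hypothetical; the method the blog implements is presumably Trident's IBackingMap.multiGet(List<List<Object>> keys). Because the loop walks the keys in order and adds exactly one entry per key, a missing key produces a null at its own index rather than at the end of the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Storm code: models only the positional
// contract of a multiGet over an in-memory stand-in for the DB.
class PositionalMultiGet {

    // Each key is a List<Object> of groupBy field values,
    // e.g. [correlationId] for one field or [name, id] for two.
    static <T> List<T> multiGet(Map<List<Object>, T> db, List<List<Object>> keys) {
        List<T> result = new ArrayList<>(keys.size());
        for (List<Object> key : keys) {
            // Map.get returns null for a missing key, so the null is
            // added at the same index as the key, not appended later.
            result.add(db.get(key));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<List<Object>, String> db = new HashMap<>();
        db.put(Arrays.<Object>asList("alice", 1), "period-A");
        db.put(Arrays.<Object>asList("carol", 3), "period-C");

        List<List<Object>> keys = Arrays.asList(
            Arrays.<Object>asList("alice", 1),
            Arrays.<Object>asList("bob", 2),   // not in the DB
            Arrays.<Object>asList("carol", 3));

        System.out.println(multiGet(db, keys)); // [period-A, null, period-C]
    }
}
```

The second key has no row, so result(1) is null while result(0) and result(2) still match key(0) and key(2).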

 

 

Moving on,

"Once this is loaded Storm will present the tuples having the same correlation ID one by one to our reducer, the PeriodBuilder"

 

Q4: Does Trident/Storm call the reducer after calling multiGet and before calling multiPut?

Q5: What params (and their types) are passed to the reducer, and what parameters should it emit so they can go into multiGet?

 

Q6: The first time the program runs, the database is empty and multiGet will return nothing. Does the reducer need to make sure it inserts a value the first time, as opposed to updating one? I do see that the reducer (TimelineUpdater) checks for nulls, and I'm guessing this is why it does so.
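Q4 and Q6 describe a read-reduce-write cycle. The sketch below models one batch of it in plain Java, under the assumption (not confirmed by the blog excerpt) that the map state reads every key of the batch with multiGet, folds that key's tuples into the stored value, and then writes the results back with multiPut. The counting reducer, the String keys, and the names GetReducePut, reduce and processBatch are all hypothetical. On the first run the map is empty, so db.get returns null and the reducer starts from an initial value instead of updating an existing one, which is the kind of null check Q6 mentions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one batch: multiGet -> reduce per key -> multiPut.
// Not Storm/Trident code; names and the counting logic are illustrative.
class GetReducePut {

    // Hypothetical reducer that counts tuples per key. A null stored value
    // means "no row yet", so it starts from 0 (insert); otherwise it
    // continues from the stored count (update).
    static Long reduce(Long stored, List<String> tuplesForKey) {
        long current = (stored == null) ? 0L : stored;
        return current + tuplesForKey.size();
    }

    static void processBatch(Map<String, Long> db, Map<String, List<String>> grouped) {
        List<String> keys = new ArrayList<>(grouped.keySet());

        // 1. "multiGet": one entry per key, null where the DB has no row.
        List<Long> stored = new ArrayList<>(keys.size());
        for (String key : keys) {
            stored.add(db.get(key));
        }

        // 2. The reducer runs between the read and the write.
        List<Long> updated = new ArrayList<>(keys.size());
        for (int i = 0; i < keys.size(); i++) {
            updated.add(reduce(stored.get(i), grouped.get(keys.get(i))));
        }

        // 3. "multiPut": write back, aligned with the keys by index.
        for (int i = 0; i < keys.size(); i++) {
            db.put(keys.get(i), updated.get(i));
        }
    }

    public static void main(String[] args) {
        Map<String, Long> db = new HashMap<>(); // empty on the first run

        Map<String, List<String>> batch1 = new HashMap<>();
        batch1.put("corr-1", Arrays.asList("e1", "e2"));
        processBatch(db, batch1);

        Map<String, List<String>> batch2 = new HashMap<>();
        batch2.put("corr-1", Arrays.asList("e3"));
        batch2.put("corr-2", Arrays.asList("e4"));
        processBatch(db, batch2);

        System.out.println(db.get("corr-1") + " " + db.get("corr-2")); // 3 1
    }
}
```

Running main prints 3 1: corr-1 is inserted with 2 in the first batch and updated to 3 in the second, while corr-2 is inserted with 1 in the second batch.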

 

 

Q7: Can someone explain what these mean?

.each (I've seen this used even consecutively: .each(..).each(..))

.newStream

.newValuesStream

.persistAggregate

I am unable to find Javadocs documenting the method signatures. These Javadocs don't help much: http://nathanmarz.github.io/storm/doc/storm/trident/Stream.html

 

 

Q8: Storm has ack/fail; does Trident handle that automatically?

 

 

Q9: Has anyone tried Spark? http://spark.incubator.apache.org/streaming/

I'm asking because I'm thinking of ditching Storm and moving to it. It seems much, much better documented.

 

 

Lots of questions, I know. Thanks for reading!

 

-Adrian

 
