From: Brian O'Neill <boneill42@gmail.com>
Date: Wed, 15 Jan 2014 13:07:33 -0500
Subject: Re: Storm research suggestions
To: user@storm.incubator.apache.org

Agree w/ Svend. That use case is a good one, where Cassandra is used for output.

I'd also suggest that you tackle a use case that uses a ColumnFamily as input
(perhaps a Kafka queue of row/partition keys). Then, use Svend's suggestion to
route the keys to the machines that host the data.
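A minimal sketch of that input path in Trident. The names here are illustrative,
not from a real library: a FixedBatchSpout of keys stands in for the Kafka spout,
and FetchRow stubs out the actual ColumnFamily read.

    import java.util.Collections;
    import java.util.List;

    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;
    import storm.trident.TridentTopology;
    import storm.trident.operation.BaseFunction;
    import storm.trident.operation.TridentCollector;
    import storm.trident.testing.FixedBatchSpout;
    import storm.trident.tuple.TridentTuple;

    public class ColumnFamilyInputSketch {

        // Hypothetical function: look up the row behind a partition key.
        // A real version would hold a Cassandra client and read the
        // ColumnFamily; the lookup below is a stub.
        public static class FetchRow extends BaseFunction {
            @Override
            public void execute(TridentTuple tuple, TridentCollector collector) {
                String rowKey = tuple.getString(0);
                List<Object> columns = fetchFromCassandra(rowKey);
                collector.emit(new Values(columns));
            }

            private List<Object> fetchFromCassandra(String rowKey) {
                // Stub: replace with a real ColumnFamily read for rowKey.
                return Collections.singletonList((Object) ("row-for-" + rowKey));
            }
        }

        public static TridentTopology build() {
            // Stand-in for a Kafka spout emitting row/partition keys.
            FixedBatchSpout keys = new FixedBatchSpout(new Fields("rowKey"), 10,
                    new Values("key-1"), new Values("key-2"));

            TridentTopology topology = new TridentTopology();
            topology.newStream("row-keys", keys)
                    // Route each key so the read happens near the data
                    // (see the partitioner sketch further down the thread).
                    .partitionBy(new Fields("rowKey"))
                    .each(new Fields("rowKey"), new FetchRow(), new Fields("columns"));
            return topology;
        }
    }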
-brian

---
Brian O'Neill
Chief Architect, Health Market Science
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 • healthmarketscience.com


From: Svend Vanderveken
Reply-To: <user@storm.incubator.apache.org>
Date: Friday, January 10, 2014 at 6:39 AM
To: <user@storm.incubator.apache.org>
Subject: Re: Storm research suggestions

Hi,

Cool, I hope this thing can get started then. Based on the comments from
Brian, Adam, Klausen and Michael, which I was happy to read, I feel I would
not be the only one willing to share ideas and/or code about that :D

I guess a starting point would be to dig into the details of the available
strategies for partitioning data in Cassandra:
http://www.datastax.com/docs/1.0/cluster_architecture/partitioning

Then imagine you have a bunch of Storm tuples coming in in real time,
including, say, a geo-localization, and you want to regroup all the events
happening in the same "locationId" (e.g. postal code, or rounded
latitude/longitude, whatever...) in order to have some counters for each such
group. Storm is going to partition the processing of all those tuples across
its cluster, so the idea is to tell Storm to do so in the same fashion as
Cassandra is partitioning the storage (the counters for each locationId).
Hmmm, maybe it's as simple as adding a tuple field that contains the result
of the Cassandra partitioner (by querying it or including the logic as a
functionality of the topology) and doing a partitionBy() on that. Actually,
as I understand it, groupBy()+persistentAggregate is built on a simple
partitionBy+partitionAggregate, so code-wise this whole thing might not be
huge. Go ahead, that sounds cool!

Cheers,
Svend


On Thu, Jan 9, 2014 at 9:53 PM, Tobias Pazer <tobiaspazer@gmail.com> wrote:
> This is exactly what I was looking for, as I am reading a lot about Hadoop
> at the same time. I haven't got any experience with partitioning alignment
> so far, so I would appreciate any suggestions on how to approach this topic
> efficiently. But this shouldn't be a problem, as I still have until
> October...
>
> Now I just have to convince my academic advisor.
>
> Thanks so far; I think this topic is definitely worth looking into.
>
>
> 2014/1/9 Michael Oczkowski <Michael.Oczkowski@seeq.com>
>> +1 for this idea. I heard DataStax was investigating Storm integration
>> (like they do with Hadoop) but as far as I know this isn't going to
>> happen. The need for push-down analytics is great and a very general
>> problem, and any nice solution would help many people!
>>
>> Also, to Brian's point, it would be great to use Storm in lieu of Hadoop
>> if it's performant.
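A rough sketch of Svend's partitionBy() idea above, in Trident. The tokenFor()
helper is a hypothetical placeholder: a real version would delegate to the
cluster's configured Cassandra partitioner (e.g. Murmur3Partitioner) applied
to the raw key bytes.

    import java.math.BigInteger;

    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;
    import storm.trident.Stream;
    import storm.trident.operation.BaseFunction;
    import storm.trident.operation.TridentCollector;
    import storm.trident.tuple.TridentTuple;

    public class PartitionerAlignment {

        // Hypothetical stand-in for the cluster's partitioner: a real
        // implementation would apply the configured Cassandra partitioner
        // (e.g. Murmur3Partitioner.getToken) to the raw key bytes.
        static BigInteger tokenFor(String partitionKey) {
            return BigInteger.valueOf(partitionKey.hashCode()); // placeholder hash
        }

        // Adds a "token" field computed the way Cassandra would place the key.
        public static class TokenOf extends BaseFunction {
            @Override
            public void execute(TridentTuple tuple, TridentCollector collector) {
                collector.emit(new Values(tokenFor(tuple.getString(0))));
            }
        }

        // Partition the stream the same way Cassandra partitions the storage.
        public static Stream align(Stream events) {
            return events
                    .each(new Fields("locationId"), new TokenOf(), new Fields("token"))
                    .partitionBy(new Fields("token"));
        }
    }

Note that partitionBy() only guarantees that equal tokens land on the same
task; actually steering token ranges to the tasks co-located with the owning
replicas would additionally need a custom grouping that knows the ring layout.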
>> From: supercargo@gmail.com [mailto:supercargo@gmail.com] On Behalf Of
>> Adam Lewis
>> Sent: Thursday, January 9, 2014 9:11 AM
>> To: user
>> Subject: Re: Storm research suggestions
>>
>> I love it; even if it is a premature optimization, the beauty of academic
>> work is that this should be measurable, and it is still an interesting
>> finding either way. I don't have the large-scale production experience
>> with Storm that others here have (yet), but it sounds like it would
>> really help performance, since you're going after network transfer. And
>> as you say, Svend, all the ingredients are already built into Trident.
>>
>> Adam
>>
>> On Thu, Jan 9, 2014 at 10:56 AM, Brian O'Neill <bone@alumni.brown.edu> wrote:
>>>
>>> +1, love the idea. I've wanted to play with partitioning alignment
>>> myself (with C*), but I've been too busy with the day job. =)
>>>
>>> Tobias, if you need some support -- don't hesitate to reach out.
>>>
>>> If you are able to align the partitioning, and we can add "in-place"
>>> computation within Storm, it would be great to see a speed comparison
>>> between Hadoop and Storm. (If comparable, it may drive people to abandon
>>> their Hadoop infrastructure for batch processing and run everything on
>>> Storm.)
>>>
>>> -brian
>>>
>>> ---
>>> Brian O'Neill
>>> Chief Architect, Health Market Science
>>> 2700 Horizon Drive • King of Prussia, PA • 19406
>>> M: 215.588.6024 • @boneill42 • healthmarketscience.com
>>> From: Svend Vanderveken <svend.vanderveken@gmail.com>
>>> Reply-To: <user@storm.incubator.apache.org>
>>> Date: Thursday, January 9, 2014 at 10:46 AM
>>> To: <user@storm.incubator.apache.org>
>>> Subject: Re: Storm research suggestions
>>>
>>> Hey Tobias,
>>>
>>> Nice project, I would have loved to play with something like Storm back
>>> in my university days :)
>>>
>>> Here's a topic that's been on my mind for a while (the Trident API of
>>> Storm):
>>>
>>> * One core idea of distributed map-reduce à la Hadoop was to perform as
>>> much processing as possible close to the data: you execute the "map"
>>> locally on each node where the data sits, you do a first reduce there,
>>> then you let the result travel through the network, you do one last
>>> reduce centrally, and you have a result without having your whole DB
>>> travel the network every time.
>>>
>>> * Storm's groupBy + persistentAggregate + reducer/combiner gives us a
>>> similar semantic, where we map incoming tuples, then reduce them with
>>> other tuples in the same group and with the previously reduced value
>>> stored in the DB at regular intervals.
>>>
>>> * For each group, the operation above always happens on the same Storm
>>> Task (i.e. the same "place" in the cluster) and stores its ongoing state
>>> in the "same place" in the DB, using the group value as primary key.
>>>
>>> I believe it might be worth investigating whether the following pattern
>>> would make sense:
>>>
>>> * Install a distributed state store (e.g. Cassandra) on the same nodes
>>> as the Storm workers.
>>>
>>> * Try to align the Storm partitioning triggered by the groupBy with the
>>> Cassandra partitioning, so that under usual happy circumstances (no
>>> crash) the Storm reduction happens on the node where Cassandra is
>>> storing that particular primary key, avoiding the network travel for
>>> the persistence.
>>>
>>> What do you think? Premature optimization? Does not make sense? Great
>>> idea? Let me know :)
>>>
>>> S
>>>
>>> On Thu, Jan 9, 2014 at 3:00 PM, Tobias Pazer <tobiaspazer@gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> I have recently started writing my master's thesis with a focus on
>>>> Storm, as we are planning to implement the lambda architecture at our
>>>> university.
>>>>
>>>> As it's still not very clear to me where exactly it's worth diving in,
>>>> I was hoping one of you might have some suggestions.
>>>>
>>>> I was thinking about a benchmark or something else to systematically
>>>> evaluate and improve the configuration of Storm, but I'm not sure if
>>>> this is even worth the time.
>>>>
>>>> I think the more experienced of you definitely have further ideas!
>>>>
>>>> Thanks and regards
>>>> Tobias
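For reference, the stock (non-aligned) version of the counter pattern Svend
describes is compact in Trident. A minimal sketch, assuming the incoming
stream carries a "locationId" field, with the in-memory test state standing
in for a Cassandra-backed StateFactory:

    import backtype.storm.tuple.Fields;
    import storm.trident.Stream;
    import storm.trident.TridentState;
    import storm.trident.operation.builtin.Count;
    import storm.trident.testing.MemoryMapState;

    public class LocationCounters {

        // Counts events per locationId and persists one counter per group.
        // MemoryMapState is the in-memory test state; swap in a
        // Cassandra-backed StateFactory to keep the counters in the cluster.
        public static TridentState countByLocation(Stream events) {
            return events
                    .groupBy(new Fields("locationId"))
                    .persistentAggregate(new MemoryMapState.Factory(),
                            new Count(), new Fields("count"));
        }
    }

Replacing MemoryMapState.Factory with a Cassandra-backed state factory, plus
the partition alignment sketched earlier in the thread, is essentially the
whole experiment.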