From: Klausen Schaefersinho <klaus.schaefers@gmail.com>
To: user@storm.incubator.apache.org
Date: Thu, 9 Jan 2014 17:41:33 +0100
Subject: Re: Storm research suggestions

Hi,

>* install a distributed state store (e.g. Cassandra) on the same nodes as the Storm workers
>
>* try to align the Storm partitioning triggered by the groupBy with Cassandra partitioning, so that under usual happy circumstances (no crash), the Storm reduction happens on the node where Cassandra is storing that particular primary key, avoiding the network travel for the persistence.

I think this would be a great feature of Storm. Having a reliable and fast way to store state would be great!

Cheers,

Klaus
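A rough sketch of what that alignment could look like at the grouping level, based on Storm's CustomStreamGrouping interface. The key-to-replica and task-to-host maps are assumed inputs for illustration only; a real implementation would have to derive them from the Cassandra token ring and from where the topology's tasks actually run.

import java.io.Serializable;
import java.util.Collections;
import java.util.List;
import java.util.Map;

import backtype.storm.generated.GlobalStreamId;
import backtype.storm.grouping.CustomStreamGrouping;
import backtype.storm.task.WorkerTopologyContext;

/**
 * Sketch of a locality-aware grouping: route each tuple to a Storm task that
 * runs on the same host as the Cassandra replica owning the tuple's key.
 * The two maps are stand-ins for information a real implementation would
 * have to look up from the Cassandra cluster and the topology placement.
 */
public class CassandraAwareGrouping implements CustomStreamGrouping, Serializable {

    private final Map<Object, String> keyToReplicaHost; // group key -> Cassandra host owning it (assumed known)
    private final Map<Integer, String> taskToHost;      // Storm task id -> host it runs on (assumed known)
    private List<Integer> targetTasks;

    public CassandraAwareGrouping(Map<Object, String> keyToReplicaHost,
                                  Map<Integer, String> taskToHost) {
        this.keyToReplicaHost = keyToReplicaHost;
        this.taskToHost = taskToHost;
    }

    @Override
    public void prepare(WorkerTopologyContext context, GlobalStreamId stream,
                        List<Integer> targetTasks) {
        this.targetTasks = targetTasks;
    }

    @Override
    public List<Integer> chooseTasks(int taskId, List<Object> values) {
        Object groupKey = values.get(0);
        String owningHost = keyToReplicaHost.get(groupKey);
        if (owningHost != null) {
            // Prefer a task co-located with the Cassandra replica that owns this key.
            for (Integer task : targetTasks) {
                if (owningHost.equals(taskToHost.get(task))) {
                    return Collections.singletonList(task);
                }
            }
        }
        // Fall back to a plain hash so every tuple still gets routed somewhere.
        int fallback = (groupKey.hashCode() & Integer.MAX_VALUE) % targetTasks.size();
        return Collections.singletonList(targetTasks.get(fallback));
    }
}

A bolt wired with .customGrouping(..., new CassandraAwareGrouping(...)) would then receive each key on the host that owns it whenever such a task exists, falling back to hashing otherwise.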
On Thu, Jan 9, 2014 at 5:11 PM, Adam Lewis wrote:

> I love it; even if it is a premature optimization, the beauty of academic work is that this should be measurable and is still an interesting finding either way. I don't have the large-scale production experience with Storm that others here have (yet), but it sounds like it would really help performance, since you're going after network transfer. And as you say, Svend, all the ingredients are already built into Trident.
>
> Adam
>
>
> On Thu, Jan 9, 2014 at 10:56 AM, Brian O'Neill wrote:
>
>> +1, love the idea. I've wanted to play with partitioning alignment myself (with C*), but I've been too busy with the day job. =)
>>
>> Tobias, if you need some support, don't hesitate to reach out.
>>
>> If you are able to align the partitioning, and we can add "in-place" computation within Storm, it would be great to see a speed comparison between Hadoop and Storm. (If comparable, it may drive people to abandon their Hadoop infrastructure for batch processing and run everything on Storm.)
>>
>> -brian
>>
>> ---
>> Brian O'Neill
>> Chief Architect
>> Health Market Science
>> The Science of Better Results
>> 2700 Horizon Drive • King of Prussia, PA • 19406
>> M: 215.588.6024 • @boneill42
>> healthmarketscience.com
>>
>>
>> From: Svend Vanderveken <svend.vanderveken@gmail.com>
>> Reply-To: user@storm.incubator.apache.org
>> Date: Thursday, January 9, 2014 at 10:46 AM
>> To: user@storm.incubator.apache.org
>> Subject: Re: Storm research suggestions
>>
>> Hey Tobias,
>>
>> Nice project, I would have loved to play with something like Storm back in my university days :)
>>
>> Here's a topic that's been on my mind for a while (the Trident API of Storm):
>>
>> * one core idea of distributed map-reduce à la Hadoop was to perform as much processing as possible close to the data: you execute the "map" locally on each node where the data sits, you do a first reduce there, then you let the result travel through the network, and you do one last reduce centrally, so you have a result without having all your DB travel the network every time
>>
>> * Storm groupBy + persistentAggregate + reducer/combiner lets us have similar semantics, where we map incoming tuples and reduce them with the other tuples in the same group and with the previously reduced value stored in the DB at regular intervals
>>
>> * for each group, the operation above always happens on the same Storm task (i.e. the same "place" in the cluster) and stores its ongoing state in the "same place" in the DB, using the group value as primary key
>>
>> I believe it might be worth investigating if the following pattern would make sense:
>>
>> * install a distributed state store (e.g. Cassandra) on the same nodes as the Storm workers
>>
>> * try to align the Storm partitioning triggered by the groupBy with Cassandra partitioning, so that under usual happy circumstances (no crash), the Storm reduction happens on the node where Cassandra is storing that particular primary key, avoiding the network travel for the persistence.
>>
>> What do you think? Premature optimization? Does not make sense? Great idea? Let me know :)
>>
>> S
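For reference, a minimal Trident sketch of the groupBy + persistentAggregate pattern described above. The spout and the in-memory state factory are placeholders for illustration; in the proposed setup the state factory would instead be backed by the Cassandra instance co-located with the worker.

import backtype.storm.generated.StormTopology;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Count;
import storm.trident.testing.FixedBatchSpout;
import storm.trident.testing.MemoryMapState;

public class GroupedPersistentAggregate {

    public static StormTopology buildTopology() {
        // Placeholder spout emitting tuples with a single "userId" field.
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("userId"), 3,
                new Values("alice"), new Values("bob"), new Values("alice"));
        spout.setCycle(true);

        TridentTopology topology = new TridentTopology();

        // groupBy partitions the stream by key; persistentAggregate reduces each
        // group together with the previously stored value. Every key is always
        // handled by the same task, which is what makes aligning this partitioning
        // with the state store's partitioning worth considering. MemoryMapState is
        // an in-memory stand-in for that store.
        topology.newStream("events", spout)
                .groupBy(new Fields("userId"))
                .persistentAggregate(new MemoryMapState.Factory(), new Count(),
                        new Fields("count"));

        return topology.build();
    }
}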
>> On Thu, Jan 9, 2014 at 3:00 PM, Tobias Pazer wrote:
>>
>>> Hi all,
>>>
>>> I have recently started writing my master's thesis with a focus on Storm, as we are planning to implement the lambda architecture in our university.
>>>
>>> As it's still not very clear to me where exactly it's worth diving in, I was hoping one of you might have suggestions.
>>>
>>> I was thinking about a benchmark or something else to systematically evaluate and improve the configuration of Storm, but I'm not sure if this is even worth the time.
>>>
>>> I think the more experienced of you definitely have further ideas!
>>>
>>> Thanks and regards
>>> Tobias