From: Brian O'Neill
Reply-To: user@storm.incubator.apache.org
Date: Thu, 09 Jan 2014 10:56:13 -0500
Subject: Re: Strom research suggestions

+1, love the idea. I've wanted to play with partitioning alignment myself (with C*), but I've been too busy with the day job. =)

Tobias, if you need some support, don't hesitate to reach out.

If you are able to align the partitioning, and we can add "in-place" computation within Storm, it would be great to see a speed comparison between Hadoop and Storm. (If the results are comparable, it may drive people to abandon their Hadoop infrastructure for batch processing and run everything on Storm.)

-brian

---
Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 • healthmarketscience.com

From: Svend Vanderveken <svend.vanderveken@gmail.com>
Reply-To: <user@storm.incubator.apache.org>
Date: Thursday, January 9, 2014 at 10:46 AM
To: <user@storm.incubator.apache.org>
Subject: Re: Strom research suggestions

Hey Tobias,

Nice project, I would have loved to play with something like Storm back in my university days :)

Here's a topic that's been on my mind for a while (Trident API of Storm):

* One core idea of distributed map reduce à la Hadoop was to perform as much processing as possible close to the data: you execute the "map" locally on each node where the data sits, you do a first reduce there, then you let the result travel through the network, you do one last reduce centrally, and you have a result without having all your DB travel the network every time.
* Storm groupBy + persistentAggregate + reducer/combiner gives us similar semantics: we map incoming tuples, then reduce them with other tuples in the same group and with the previously reduced value stored in the DB at regular intervals.
* For each group, the operation above always happens on the same Storm task (i.e. the same "place" in the cluster) and stores its ongoing state in the "same place" in the DB, using the group value as primary key.

I believe it might be worth investigating whether the following pattern would make sense:

* Install a distributed state store (e.g. Cassandra) on the same nodes as the Storm workers.
* Try to align the Storm partitioning triggered by the groupBy with the Cassandra partitioning, so that under usual happy circumstances (no crash) the Storm reduction happens on the node where Cassandra stores that particular primary key, avoiding the network travel for the persistence.

What do you think? Premature optimization? Does not make sense? Great idea? Let me know :)

S

On Thu, Jan 9, 2014 at 3:00 PM, Tobias Pazer <tobiaspazer@gmail.com> wrote:
>
> Hi all,
>
> I have recently started writing my master's thesis with a focus on Storm, as we
> are planning to implement the lambda architecture at our university.
>
> As it's still not very clear to me where exactly it's worth diving in, I
> was hoping one of you might have some suggestions.
>
> I was thinking about a benchmark or something else to systematically evaluate
> and improve the configuration of Storm, but I'm not sure if this is even worth
> the time.
>
> I think the more experienced of you definitely have further ideas!
>
> Thanks and regards
> Tobias

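For anyone who wants to get a feel for the alignment idea in this thread, here is a minimal toy sketch in Python. It does not use Storm or Cassandra at all. Everything in it is an assumption made for illustration: the four-node cluster, the helper names (storm_task_for, cassandra_node_for, aligned_task_for), the hash functions, and the equal-range token ring are stand-ins, not the real Storm grouping or the real Cassandra partitioner, and replication is ignored. It only estimates what fraction of state writes would stay on the local node with an independent hash-mod grouping versus a grouping that follows the store's key ownership.

```python
import hashlib

# Hypothetical cluster: node i hosts Storm task i and one Cassandra node.
NUM_NODES = 4


def storm_task_for(key: str, num_tasks: int = NUM_NODES) -> int:
    """Toy stand-in for a hash-mod grouping (NOT Storm's actual hash)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_tasks


def cassandra_node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Toy token ring: a 64-bit token space split into equal contiguous ranges.

    This is NOT Cassandra's real partitioner, and replication is ignored.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    range_size = 2**64 // num_nodes
    return min(token // range_size, num_nodes - 1)


def aligned_task_for(key: str, num_tasks: int = NUM_NODES) -> int:
    """Aligned grouping: send the tuple to the task on the node owning the key."""
    return cassandra_node_for(key, num_tasks)


def local_write_fraction(keys, task_for) -> float:
    """Fraction of keys whose reducing task sits on the node that stores them."""
    local = sum(1 for k in keys if task_for(k) == cassandra_node_for(k))
    return local / len(keys)


keys = [f"group-{i}" for i in range(10_000)]
default_fraction = local_write_fraction(keys, storm_task_for)
aligned_fraction = local_write_fraction(keys, aligned_task_for)

print(f"independent grouping: {default_fraction:.1%} of writes stay local")
print(f"aligned grouping:     {aligned_fraction:.1%} of writes stay local")
```

In this toy model the independent grouping keeps roughly one write in NUM_NODES local, while the aligned grouping keeps all of them local by construction. Whether that translates into a measurable speedup on a real cluster is exactly the open question raised above.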