incubator-s4-user mailing list archives

From Shailendra Mishra <shailend...@gmail.com>
Subject Re: Modeling a pipeline
Date Mon, 16 Jul 2012 23:20:46 GMT
Absolutely. - Shailendra

On Mon, Jul 16, 2012 at 4:13 PM, Andrea Reale <andrea.reale@unibo.it> wrote:
> Shailendra,
> thanks again for your comments.
>
> Yes, I understand what you are saying, but I guess that in such a case a
> system restart (including the re-definition of the cluster, now with five
> nodes, and the re-deployment of the application) would allow the fifth
> node to be leveraged as well. Of course one would rather not have to
> restart the system, but at least you don't have to change and rebuild any
> code.
>
> Regards,
> Andrea
>
> Shailendra Mishra <shailendrah@gmail.com> wrote:
>
>
> Not really. Let's assume that you have 30 PEs, each executing one
> transform. Now you decide to add a 5th physical host, which implies
> that you will have to start a cluster on this host. If you do that,
> then even though you deploy a PE there, it won't be recognized by the
> upstream PE, for the simple reason that the input stream to that PE was
> not registered with ZooKeeper when the upstream app (whatever it may
> be, an adaptor, a partitioner, etc.) was deployed. We need to do more
> work for that to happen.
> - Shailendra
>
> On Mon, Jul 16, 2012 at 3:03 PM, Andrea Reale <andrea.reale@unibo.it> wrote:
>> Hi Shailendra,
>>
>> thank you very much for your help. I am exploring S4 for the first time
>> these days and I am not always sure that I am using the right approach.
>>
>> About your observation: you're totally right, I could aggregate multiple
>> transformations in a single PE and that would certainly reduce the
>> amount of communication. Indeed, I think that ideally the best approach
>> is to aggregate transformations so that I have just one PE per CPU.
>> On the other hand, maybe reducing the number of PEs reduces the
>> system's ability to "automagically" scale when I add nodes to
>> the cluster. For example, if I model the pipeline with 4 PEs and I have
>> four machines (with 1 processor each) I am OK; but if I add one node,
>> then I would have to modify my application in order to use the computing
>> power of the fifth node. Am I wrong?
>>
>> Secondly, I have "made up" this scenario just to be able to experiment
>> with the system and measure its ability to "scale up", hence I need a
>> scenario that is simple and flexible.
>> Any suggestion in this direction would be very much appreciated!
>>
>> Thanks once again!
>>
>> Regards,
>> Andrea
>>
>> On Mon, 2012-07-16 at 13:20 -0700, Shailendra Mishra wrote:
>>> The approach looks OK, but since I don't know the nature of the
>>> transforms, here is the question: wouldn't you like to put multiple
>>> transforms into a single PE (pick a number, say 5), so as to decrease
>>> the communication complexity? Or are the transforms so compute
>>> intensive that you have to allocate a PE for each? -
>>> Shailendra
>>>
>>> On Mon, Jul 16, 2012 at 12:02 PM, Andrea Reale <andrea.reale@unibo.it> wrote:
>>> > Hello everyone,
>>> >
>>> > I am trying to run some tests on piper, running it in a cluster of four
>>> > nodes in a particular image processing scenario. I am writing this
>>> > message to gather opinions about the way I am currently modeling my
>>> > problem, and whether there is some better way to do it.
>>> >
>>> >
>>> > My scenario is the following:
>>> > I have a series of OpenCV transformations that can be modeled as a
>>> > pipeline of 30 stages (at each stage one transformation is applied).
>>> > My modeling approach is to model each stage as a ProcessingElement, so
>>> > that the graph is basically something like:
>>> >
>>> > .-------.     .-----.                .------.       .-----.
>>> > | Adpt. |-c0->| St1 |-c1-> ... -c29->| St30 | -c30->| Snk |
>>> > '-------'     '-----'                '------'       '-----'
>>> >
>>> > Where:
>>> > - Adpt. is an adapter that reads image samples and creates corresponding
>>> > S4 events
>>> > - St1-30 are the OpenCVPEs
>>> > - Snk is a PE that stores the results
>>> > - c0-c30 are 31 different streams
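>>> >
>>> > In code, the chain is wired in the App's onInit, roughly as in the
>>> > sketch below (only 2 of the 30 stages are shown; ImageEvent,
>>> > OpenCvStagePE and PipelineApp are placeholder names of mine, and the
>>> > createPE / createStream / createInputStream calls are what I believe
>>> > the 0.5 piper API looks like, so some details may be off). The
>>> > ConstantKeyFinder used here is shown right after.
>>> >
>>> > import org.apache.s4.base.Event;
>>> > import org.apache.s4.core.App;
>>> > import org.apache.s4.core.ProcessingElement;
>>> > import org.apache.s4.core.Stream;
>>> >
>>> > /** Event wrapping one image sample (placeholder). */
>>> > class ImageEvent extends Event {
>>> >     private byte[] frame;
>>> >     public byte[] getFrame() { return frame; }
>>> >     public void setFrame(byte[] frame) { this.frame = frame; }
>>> > }
>>> >
>>> > /** One pipeline stage: applies its transform and forwards the result. */
>>> > class OpenCvStagePE extends ProcessingElement {
>>> >     private Stream<ImageEvent> downstream;
>>> >     public void setDownstream(Stream<ImageEvent> s) { this.downstream = s; }
>>> >     public void onEvent(ImageEvent event) {
>>> >         // ... apply this stage's OpenCV transform to event.getFrame() ...
>>> >         downstream.put(event);   // hand the sample to the next stage's stream
>>> >     }
>>> >     @Override protected void onCreate() { }
>>> >     @Override protected void onRemove() { }
>>> > }
>>> >
>>> > public class PipelineApp extends App {
>>> >     @Override
>>> >     protected void onInit() {
>>> >         // Built backwards: each stage needs a reference to its downstream
>>> >         // stream. (c2..c30, St3..St30 and the sink are created the same way.)
>>> >         OpenCvStagePE stage2 = createPE(OpenCvStagePE.class);
>>> >         Stream<ImageEvent> c1 =
>>> >             createStream("c1", new ConstantKeyFinder<ImageEvent>("c1"), stage2);
>>> >
>>> >         OpenCvStagePE stage1 = createPE(OpenCvStagePE.class);
>>> >         stage1.setDownstream(c1);
>>> >
>>> >         // c0 is the external stream fed by the adapter; in my app it also
>>> >         // gets a constant KeyFinder (omitted here).
>>> >         createInputStream("c0", stage1);
>>> >     }
>>> >     @Override protected void onStart() { }
>>> >     @Override protected void onClose() { }
>>> > }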
>>> >
>>> > Since in this case I do not need events to be processed by key, I
>>> > assign to every stream an instance of a KeyFinder which returns a
>>> > constant; in other words, KeyFinder#get() returns the same value
>>> > independently of the particular event (in practice, it returns the name
>>> > of the stream).
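>>> >
>>> > The constant KeyFinder itself is trivial (again just a sketch, and
>>> > ConstantKeyFinder is a name I made up):
>>> >
>>> > import java.util.Arrays;
>>> > import java.util.List;
>>> >
>>> > import org.apache.s4.base.KeyFinder;
>>> >
>>> > /** KeyFinder that ignores the event and always returns the stream's name. */
>>> > class ConstantKeyFinder<T> implements KeyFinder<T> {
>>> >     private final String streamName;
>>> >
>>> >     ConstantKeyFinder(String streamName) {
>>> >         this.streamName = streamName;
>>> >     }
>>> >
>>> >     @Override
>>> >     public List<String> get(T event) {
>>> >         // Same key for every event, so no real partitioning happens.
>>> >         return Arrays.asList(streamName);
>>> >     }
>>> > }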
>>> >
>>> >
>>> > While I have already made some experiments with this approach (and it
>>> > appears to work), I would like to know whether you believe there is a
>>> > better way of modeling my problem in terms of S4 PEs / streams.
>>> >
>>> > Thanks a lot!
>>> > Andrea
