beam-user mailing list archives

From "Stadin, Benjamin" <>
Subject Re: Force pipe executions to run on same node
Date Sun, 22 May 2016 21:01:47 GMT
Hi JB,

None so far. I'm still thinking about how to achieve what I want to do,
and whether Beam makes sense for my usage scenario.

I'm mostly interested in just orchestrating tasks across individual machines
and service endpoints, depending on their workload. My application is not so
much about Big Data and parallelism, but local data processing and local

An example scenario:
- A user uploads a set of CAD files
- data from CAD files are extracted in parallel
- a whole bunch of native tools operate on this extracted data set in their
own pipe. Due to the amount of data generated and consumed, it doesn't
make sense at all to distribute these tasks to other machines. It's very
IO bound.
- For the same reason, it doesn't make sense to distribute data using RDDs.
It's preferable to do only some tasks (such as CAD data extraction)
in parallel, and otherwise run the other data tasks as a group on a single
node, in order to avoid IO bottlenecks.
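
A minimal sketch of the scenario above, in plain Python rather than Beam: only the extraction step fans out, and the follow-up tools then run as one group in a single process on a single node. The names `extract`, `run_local_tools`, and `process_upload` are hypothetical placeholders, not part of Beam or any runner, and the "tools" are ordinary callables standing in for the native tools.

```python
from concurrent.futures import ThreadPoolExecutor


def extract(cad_file):
    # Hypothetical stand-in for the CAD data extraction step.
    return f"{cad_file}.extracted"


def run_local_tools(extracted, tools):
    # Run every tool in sequence in this one process, so intermediate
    # data never has to leave the node.
    data = extracted
    for tool in tools:
        data = tool(data)
    return data


def process_upload(cad_files, tools, max_workers=4):
    # Only the extraction step runs in parallel; map() keeps input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        extracted = list(pool.map(extract, cad_files))
    # Everything after extraction runs as one group on the same node.
    return run_local_tools(extracted, tools)
```

For example, `process_upload(["a.dwg", "b.dwg"], [sorted])` extracts both files concurrently and then applies the single tool `sorted` locally. A real setup would swap the thread pool for whatever scheduler assigns work to machines.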

So I don't have typical Big Data processing in mind. What I'm looking
for is rather an integrated environment that provides some kind of
parallel task execution, task management and administration, as well
as a message bus and event system.

Is Beam a good choice for such a non-Big-Data scenario?


On 21.05.16, 18:59, "Jean-Baptiste Onofré" <> wrote:

>Hi Ben,
>it's not SDK related, it depends more on the runner.
>What runner are you using?
>On 05/21/2016 04:22 PM, Stadin, Benjamin wrote:
>> Hi,
>> I need to control Beam pipes/filters so that pipe executions that match
>> certain criteria are executed on the same node.
>> In Spring XD this can be controlled by defining groups
>> and then specifying deployment criteria to match a group.
>> Is this possible with Beam?
>> Best
>> Ben
>Jean-Baptiste Onofré
>Talend -
