beam-user mailing list archives

From Jesse Anderson <je...@smokinghand.com>
Subject Re: Force pipe executions to run on same node
Date Mon, 23 May 2016 19:59:38 GMT
Benjamin,

Sorry, the successes and failures are a bit too nuanced for an email.

A quick check on average CAD files says they're around 1 MB. That'd be a
poor use of HDFS.

Thanks,

Jesse

On Mon, May 23, 2016 at 11:08 AM Stadin, Benjamin <
Benjamin.Stadin@heidelberg-mobil.com> wrote:

> Hi Jesse,
>
> Yes, this is what I’m looking for. I want to deploy and run the same code,
> mostly written in Python as well as C++, on different nodes. I also want to
> benefit from the job distribution and job monitoring / administration
> capabilities. I only need parallelization to a minor degree later.
>
> However, I'm hesitant to use HDFS or any other distributed file system.
> Since I process the data on only one node, it would probably be a big
> disadvantage for this data to also be distributed to other nodes via
> HDFS.
>
> Could you maybe share some info about successful implementations and
> configurations of such a distributed job engine?
>
> Thanks
> Ben
>
> From: Jesse Anderson <jesse@smokinghand.com>
> Reply-To: "user@beam.incubator.apache.org" <user@beam.incubator.apache.org>
> Date: Monday, 23 May 2016 at 19:22
> To: "user@beam.incubator.apache.org" <user@beam.incubator.apache.org>
> Subject: Re: Force pipe executions to run on same node
>
> Benjamin,
>
> I've had a few students use Big Data frameworks as a distributed job
> engine. They work with varying degrees of success.
>
> With Beam, your success will really depend on the runner as JB said. If I
> understand your use case correctly, if you were using Hadoop MapReduce,
> you'd be using a map-only job. Beam would give you the ability to run the
> same code on several different execution engines. If that isn't your goal,
> you might look elsewhere.
>
> Thanks,
>
> Jesse
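
For reference, below is a minimal sketch of such a map-only pipeline with a
recent Beam Java SDK. The DoFn name and the input/output paths are
illustrative placeholders, not from this thread; the execution engine is
chosen through the pipeline options (e.g. --runner=...), so the same code can
run unchanged on different runners.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class MapOnlyPipeline {
  // Hypothetical per-record processing step; no GroupByKey follows,
  // so there is no shuffle -- the Beam analogue of a map-only job.
  static class ProcessCadFile extends DoFn<String, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      c.output(c.element().toUpperCase());
    }
  }

  public static void main(String[] args) {
    // The runner (Direct, Spark, Flink, ...) is picked from the args.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);
    p.apply(TextIO.read().from("/tmp/cad-manifest.txt"))
     .apply(ParDo.of(new ProcessCadFile()))
     .apply(TextIO.write().to("/tmp/processed"));
    p.run().waitUntilFinish();
  }
}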
>
> On Mon, May 23, 2016 at 6:47 AM Jean-Baptiste Onofré <jb@nanthrax.net>
> wrote:
>
>> Hi Benjamin,
>>
>> Your data processing doesn't seem to be fully big data oriented and
>> distributed.
>>
>> Maybe Apache Camel is more appropriate for such a scenario. You can always
>> delegate part of the data processing from Camel to Beam (using a Kafka
>> topic, for instance).
>>
>> Regards
>> JB
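
For reference, a minimal sketch of the hand-off JB describes, using the Camel
Java DSL to publish incoming files to a Kafka topic that a downstream Beam
pipeline (e.g. via KafkaIO) could consume. The directory, topic name, and
broker address are assumptions for illustration, and the Kafka endpoint
syntax assumes a recent Camel version.

import org.apache.camel.builder.RouteBuilder;

public class CadHandoffRoute extends RouteBuilder {
  @Override
  public void configure() {
    // Pick up uploaded CAD files (placeholder directory) and publish
    // them to a Kafka topic that a Beam pipeline can read from.
    from("file:/data/cad-uploads?noop=true")
        .to("kafka:cad-files?brokers=localhost:9092");
  }
}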
>>
>> On 05/22/2016 11:01 PM, Stadin, Benjamin wrote:
>> > Hi JB,
>> >
>> > None so far. I'm still thinking about how to achieve what I want to do,
>> > and whether Beam makes sense for my usage scenario.
>> >
>> > I'm mostly interested in just orchestrating tasks to individual machines
>> > and service endpoints, depending on their workload. My application is not
>> > so much about Big Data and parallelism, but about local data processing
>> > and local parallelization.
>> >
>> > An example scenario:
>> > - A user uploads a set of CAD files
>> > - Data from the CAD files is extracted in parallel
>> > - A whole bunch of native tools operate on this extracted data set in
>> > their own pipe. Due to the amount of data generated and consumed, it
>> > doesn't make sense at all to distribute these tasks to other machines.
>> > It's very I/O bound.
>> > - For the same reason, it doesn't make sense to distribute data using
>> > RDDs. It's rather favorable to do only some tasks (such as CAD data
>> > extraction) in parallel, and otherwise run other data tasks as a group
>> > on a single node, in order to avoid I/O bottlenecks.
>> >
>> > So I don't have typical Big Data processing in mind. What I'm looking
>> > for is rather an integrated environment that provides some kind of
>> > parallel task execution, task management and administration, as well
>> > as a message bus and event system.
>> >
>> > Is Beam a choice for such a rather non-Big-Data scenario?
>> >
>> > Regards,
>> > Ben
>> >
>> >
>> > On 21.05.16, 18:59, "Jean-Baptiste Onofré" <jb@nanthrax.net> wrote:
>> >
>> >> Hi Ben,
>> >>
>> >> It's not SDK related; it depends more on the runner.
>> >>
>> >> What runner are you using?
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On 05/21/2016 04:22 PM, Stadin, Benjamin wrote:
>> >>> Hi,
>> >>>
>> >>> I need to control Beam pipes/filters so that pipe executions that
>> >>> match certain criteria are executed on the same node.
>> >>>
>> >>> In Spring XD this can be controlled by defining groups
>> >>> (http://docs.spring.io/spring-xd/docs/1.2.0.RELEASE/reference/html/#deployment)
>> >>> and then specifying deployment criteria to match this group.
>> >>>
>> >>> Is this possible with Beam?
>> >>>
>> >>> Best
>> >>> Ben
>> >>
>> >> --
>> >> Jean-Baptiste Onofré
>> >> jbonofre@apache.org
>> >> http://blog.nanthrax.net
>> >> Talend - http://www.talend.com
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
