beam-commits mailing list archives

From "Mitar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2026) High performance direct runner
Date Thu, 20 Apr 2017 17:58:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977150#comment-15977150 ]

Mitar commented on BEAM-2026:
-----------------------------

I have not yet run any benchmarks, but I would suspect that having any extra layer in between
would make it slower, no?

To me one issue is that Spark adds the whole JVM into the mix. But I see that the current
implementation of the Beam direct runner is also based on the JVM.

For me personally, it is more about how hard it is to start using any of these distributed
technologies. The appeal of Beam to me is that I can, for now, learn the programming model and
start developing in it, and then later on, if needed, scale it by changing the execution runner,
and at that point also learn all the details of how to deploy Spark or Flink and so on. For
somebody who already knows how to run and use Spark or Flink it probably does not matter. But
not everyone does.

In some way I would just prefer to start by programming in Python, but in the Beam programming
model, using a Python runner. And then, if needed, scale it.

> High performance direct runner
> ------------------------------
>
>                 Key: BEAM-2026
>                 URL: https://issues.apache.org/jira/browse/BEAM-2026
>             Project: Beam
>          Issue Type: New Feature
>          Components: runner-direct
>            Reporter: Mitar
>            Assignee: Thomas Groh
>
> In documentation (https://beam.apache.org/documentation/runners/direct/) it is written
> that direct runner does not try to run efficiently, but it serves mostly for development and
> debugging.
> I would suggest that there should be also an efficient direct runner. If Beam tries to
> be an unified programming model, for some smaller tasks I would love to implement them in
> Beam, just to keep the code in the same model, but it would be OK to run it as a normal smaller
> program (maybe inside one Docker container), without any distribution across multiple machines.
> In the future, if usage grows, I could then replace underlying runner with something distributed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
