apex-dev mailing list archives

From Siyuan Hua <siy...@datatorrent.com>
Subject Re: Apache Beam Integration
Date Mon, 09 May 2016 20:08:32 GMT
Hey Ilya,

Since I'm working on the Java high-level API, I also looked at Apache Beam.
Some questions have come up, such as whether the high-level API could be
replaced by Apache Beam, or whether we can simply follow the Apache Beam
API, which is based on the Google Dataflow model. Here is what I found:

1. Beam provides a whole set of classes to define a DAG and the options
for how to run it. There is no easy way to extend their DAG API or to
implement those classes on your own.

2. The way to use the Beam API is to construct a DAG with whatever they
provide, get the graph data structure, convert it to an Apex DAG, and run
it with our engine. Beam follows the visitor design pattern, similar to
ASM. There are two core parts to running a Beam application in Apex. One
is the Pipeline, which is the DAG structure in Beam:
https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java
The other is the visitor interface, which defines the callback functions
invoked as you visit each node in the DAG. Here is an example, the Flink
translator (visitor):
https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkStreamingTransformTranslators.java
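
To make the visitor idea concrete, here is a minimal sketch in plain Java.
It is not the Beam or Apex API: Node, Pipeline, PipelineVisitor, and
EngineTranslator are hypothetical names invented for illustration. It only
shows the shape of the approach: the pipeline walks its own graph and calls
back into a translator, which records what the target engine's DAG would
need (operators and the streams connecting them).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class VisitorSketch {

    /** Hypothetical callback interface: invoked once per node during traversal. */
    interface PipelineVisitor {
        void visitTransform(Node node);
    }

    /** Hypothetical pipeline node: a named transform plus its downstream nodes. */
    static final class Node {
        final String name;
        final List<Node> outputs = new ArrayList<>();

        Node(String name) { this.name = name; }

        /** Connects this node to a downstream node and returns the downstream node. */
        Node to(Node next) {
            outputs.add(next);
            return next;
        }
    }

    /** Hypothetical pipeline: owns the root and drives the depth-first traversal. */
    static final class Pipeline {
        final Node root;

        Pipeline(Node root) { this.root = root; }

        void traverse(PipelineVisitor visitor) {
            walk(root, visitor, new HashSet<>());
        }

        private void walk(Node node, PipelineVisitor visitor, Set<Node> seen) {
            // Skip nodes already visited so a shared downstream node is reported once.
            if (!seen.add(node)) {
                return;
            }
            visitor.visitTransform(node);
            for (Node next : node.outputs) {
                walk(next, visitor, seen);
            }
        }
    }

    /** Translator: collects the operators and streams a target-engine DAG would need. */
    static final class EngineTranslator implements PipelineVisitor {
        final List<String> operators = new ArrayList<>();
        final List<String> streams = new ArrayList<>();

        @Override
        public void visitTransform(Node node) {
            operators.add(node.name);
            for (Node next : node.outputs) {
                streams.add(node.name + "->" + next.name);
            }
        }
    }

    /** Builds a three-step linear pipeline and translates it. */
    static EngineTranslator translateSample() {
        Node read = new Node("Read");
        read.to(new Node("Parse")).to(new Node("Write"));
        EngineTranslator translator = new EngineTranslator();
        new Pipeline(read).traverse(translator);
        return translator;
    }

    public static void main(String[] args) {
        EngineTranslator translator = translateSample();
        System.out.println(translator.operators);
        System.out.println(translator.streams);
    }
}
```

In a real runner the translator would create engine operators rather than
collect strings, but the division of labor is the same: the SDK owns the
graph and the traversal, and the engine only supplies the callbacks.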

3. Although I think the Dataflow model is a very good and complete model
for stream processing to follow, I don't think the Beam API is very
declarative or expressive. I still suggest we build our own set of APIs
that deliver the same features as the Dataflow model but are more
Stream-like (as in Java streams) and SQL-like.
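
For reference, this is the fluent style that "Stream-like" refers to, shown
with the standard java.util.stream API on an in-memory collection. It is
only an illustration of the style: it is batch code, not an Apex or Beam
API, and the class and method names (StreamStyleSketch, wordCount) are made
up for this example.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class StreamStyleSketch {

    /** Counts word occurrences across the given lines using a fluent chain. */
    static Map<String, Long> wordCount(String... lines) {
        return Arrays.stream(lines)
            // Split each line into words on runs of whitespace.
            .flatMap(line -> Arrays.stream(line.split("\\s+")))
            // Drop the empty token produced by leading whitespace.
            .filter(word -> !word.isEmpty())
            // Group identical words and count them; TreeMap keeps keys sorted.
            .collect(Collectors.groupingBy(
                word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(wordCount("a b a", "b c"));
    }
}
```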

In summary, I think the integration just means running a Beam DAG on a
different engine (Storm, Flink, Spark, or Apex). But if you want to mingle
the Beam API with other APIs, it is not very easy.

I also think we need to work not only on translation but also on
implementing operators that provide the missing Dataflow model features.
Those operators can also be used in the high-level API.

Regards,
Siyuan

On Mon, May 9, 2016 at 11:51 AM, Thomas Weise <thomas@datatorrent.com>
wrote:

> Hi Ilya,
>
> Absolutely, this has been discussed and is "on the roadmap".  A quick search
> reveals that a JIRA was already created for it:
> https://issues.apache.org/jira/browse/BEAM-261
>
> We are currently discussing the windowing semantics in the context of high
> level stream API, perhaps Siyuan can post his notes here?
>
> Thanks,
> Thomas
>
>
> On Mon, May 9, 2016 at 11:25 AM, Ganelin, Ilya <
> Ilya.Ganelin@capitalone.com>
> wrote:
>
> > Hello, all – Google has just published a new blog announcing the first
> > complete integration of an open source project (Apache Flink) with Apache
> > Beam:
> >
> >
> >
> > https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
> >
> > http://data-artisans.com/why-apache-beam/
> >
> > Apache Beam is a unifying framework that allows users to leverage
> > disparate streaming computational frameworks such as Storm, DataFlow,
> > Flink, or Spark using a single API. This integration demands that the
> > framework conform to the Beam programming model:
> > http://vldb.org/pvldb/vol8/p1792-Akidau.pdf, and provide the appropriate
> > APIs.
> >
> > While cumbersome, the benefit of integrating with Beam is tremendous.
> > Since there is no single framework that solves all streaming problems for
> > all use cases, the ability to combine frameworks at-will makes developing
> > end-state applications much more straightforward. I believe that many
> > projects will choose to leverage Apache Beam to take advantage of this,
> > and if Apex does not support Beam, it will fall behind, replaced by
> > those frameworks that fit the easy-to-use model of Beam.
> >
> > If we become early adopters, we have a unique opportunity to become part
> > of what will quite possibly become a very large community of users and to
> > capitalize on the inherent name recognition of Google to elevate the Apex
> > project and expose it to many who would otherwise not be aware of it.
> >
> > I think integration with Beam can pair with the recent work on developing
> > a high-level API for Apex and is a natural evolution towards making Apex
> > more accessible and more usable by a broader technical community.
> >
> > If there is compelling interest around making this effort a reality, I
> > would love to get this conversation started and work on translating this
> > into a concrete plan of action.
> >
>
