apex-dev mailing list archives

From Thomas Weise <tho...@datatorrent.com>
Subject Re: Apache Beam Integration
Date Wed, 11 May 2016 20:31:37 GMT
SQL -> Beam is a longer term prospect that Julian Hyde is looking at. At
this time, I see separate translations for SQL and Beam to the Apex DAG
representation.

Thanks,
Thomas

--
sent from mobile
On May 11, 2016 1:26 PM, "Bhat, Vijay (CONT)" <Vijay.Bhat@capitalone.com>
wrote:

I think it's a great idea as well, and could play well with the Calcite /
Streaming SQL discussion that’s also been going on. Brennon and I talked
about this and we could envision something like Streaming SQL -> Beam
representation -> Apex DAG, which will also buy us the trigger / watermark
capabilities of the Beam model.

On 5/11/16, 9:59 AM, "York, Brennon" <Brennon.York@capitalone.com> wrote:

>+1 to adding Beam integration. This would be huge for the Apex community and
>would make it that much easier for developers to come in and begin leveraging
>the power of Apex.
>
>On 5/9/16, 11:44 PM, "Thomas Weise" <thomas@datatorrent.com> wrote:
>
>>I spoke to Davor from the Beam team about this today at Apache Big Data.
>>
>>In the bigger picture, multiple DSLs and language-specific SDKs are
>>translated into a language-independent representation, which is then
>>translated by the runner to the execution engine. It seems possible to
>>pass hints or annotations that can be accessed at the runner level and
>>used for optimizations. There is also the notion of hierarchical
>>constructs similar to our modules.
>>
>>I have also contacted the Beam folks for a follow-up on how we can
>>collaborate on this.
>>
>>Thanks,
>>Thomas
>>
>>
>>On Mon, May 9, 2016 at 1:08 PM, Siyuan Hua <siyuan@datatorrent.com>
>>wrote:
>>
>>> Hey Ilya,
>>>
>>> Since I'm working on the Java high-level API, I also looked at Apache Beam.
>>> Questions have come up, such as whether the high-level API is replaceable
>>> by Apache Beam, or whether we can just follow the Apache Beam API, which is
>>> based on the Google Dataflow model. Here is what I found:
>>>
>>> 1. Beam provides a whole bunch of classes to define a DAG and options for
>>> how to run it. There is no easy way to extend their DAG API or implement
>>> them on your own.
>>>
>>> 2. The way to use the Beam API is to use whatever they have to construct a
>>> DAG, get the graph data structure, convert it to an Apex DAG, and run it
>>> with our engine. Beam follows the visitor design pattern, which is similar
>>> to ASM. There are 2 core parts to running a Beam application in Apex. One
>>> is the pipeline, which is the DAG structure in Beam:
>>>
>>> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java
>>>
>>> The other is the Visitor interface, which defines callback functions that
>>> are invoked when you visit each node in the DAG. Here is an example of the
>>> Flink translator (visitor):
>>>
>>> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkStreamingTransformTranslators.java
>>>
>>> 3. Although I think the Dataflow model is a very good and complete model
>>> for stream processing to follow, I don't think the Beam API is very
>>> declarative or expressive. I still suggest we build a set of APIs that
>>> deliver the same features as the Dataflow model but are more Stream-like
>>> (as in Java streams) and SQL-like.
>>>
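The translation approach described in point 2 (build a pipeline, walk it with a visitor, emit the target engine's DAG) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Beam or Apex code: `SourcePipeline`, `TransformNode`, `PipelineVisitor`, `TargetDag` and `DagTranslator` are hypothetical stand-ins invented for this example, and a real runner would map each transform to an actual operator rather than a string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: none of these types are Beam or Apex classes. */
public class VisitorTranslationSketch {

    /** A node in the hypothetical source pipeline: a named transform and its upstream inputs. */
    static final class TransformNode {
        final String name;
        final List<TransformNode> inputs;

        TransformNode(String name, TransformNode... inputs) {
            this.name = name;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Callback invoked once per node while the pipeline is walked. */
    interface PipelineVisitor {
        void visitTransform(TransformNode node);
    }

    /** Hypothetical source pipeline; nodes are kept in the order they were applied. */
    static final class SourcePipeline {
        private final List<TransformNode> nodes = new ArrayList<>();

        TransformNode apply(String name, TransformNode... inputs) {
            TransformNode node = new TransformNode(name, inputs);
            nodes.add(node);
            return node;
        }

        // A node's inputs must exist before it is applied,
        // so insertion order is already a topological order.
        void traverse(PipelineVisitor visitor) {
            for (TransformNode node : nodes) {
                visitor.visitTransform(node);
            }
        }
    }

    /** Hypothetical target DAG: operators keyed by name, plus "from->to" streams. */
    static final class TargetDag {
        final Map<String, String> operators = new LinkedHashMap<>();
        final List<String> streams = new ArrayList<>();
    }

    /** The translator is the visitor: each callback emits one operator and its input streams. */
    static final class DagTranslator implements PipelineVisitor {
        final TargetDag dag = new TargetDag();

        @Override
        public void visitTransform(TransformNode node) {
            dag.operators.put(node.name, "Operator(" + node.name + ")");
            for (TransformNode input : node.inputs) {
                dag.streams.add(input.name + "->" + node.name);
            }
        }
    }

    /** Walks the source pipeline and returns the translated DAG. */
    static TargetDag translate(SourcePipeline pipeline) {
        DagTranslator translator = new DagTranslator();
        pipeline.traverse(translator);
        return translator.dag;
    }

    public static void main(String[] args) {
        SourcePipeline pipeline = new SourcePipeline();
        TransformNode read = pipeline.apply("Read");
        TransformNode parse = pipeline.apply("Parse", read);
        pipeline.apply("Write", parse);

        TargetDag dag = translate(pipeline);
        System.out.println(dag.operators.keySet());
        System.out.println(dag.streams);
    }
}
```

The point of the pattern is that the pipeline owns the traversal while the translator owns the mapping, so supporting another engine would mean writing another visitor rather than changing how pipelines are built.
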
>>> In summary, I think the integration is just running a Beam DAG with a
>>> different engine (Storm, Flink, Spark or Apex). But if you want to mingle
>>> the Beam API with other ones, it is not very easy.
>>>
>>> I also think we need to work not only on translation but also on
>>> implementing some operators that provide the missing features in the
>>> Dataflow model. Those operators can also be used in the high-level API.
>>>
>>> Regards,
>>> Siyuan
>>>
>>> On Mon, May 9, 2016 at 11:51 AM, Thomas Weise <thomas@datatorrent.com>
>>> wrote:
>>>
>>> > Hi Ilya,
>>> >
>>> > Absolutely, this has been discussed and is "on the roadmap". A quick
>>> > search reveals that a JIRA was already created for it:
>>> > https://issues.apache.org/jira/browse/BEAM-261
>>> >
>>> > We are currently discussing the windowing semantics in the context of
>>> > the high-level stream API; perhaps Siyuan can post his notes here?
>>> >
>>> > Thanks,
>>> > Thomas
>>> >
>>> >
>>> > On Mon, May 9, 2016 at 11:25 AM, Ganelin, Ilya
>>> > <Ilya.Ganelin@capitalone.com> wrote:
>>> >
>>> > > Hello, all. Google has just published a new blog post announcing the
>>> > > first complete integration of an open source project (Apache Flink)
>>> > > with Apache Beam:
>>> > >
>>> > > https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
>>> > >
>>> > > http://data-artisans.com/why-apache-beam/
>>> > >
>>> > > Apache Beam is a unifying framework that allows users to leverage
>>> > > disparate streaming computational frameworks such as Storm, Dataflow,
>>> > > Flink, or Spark using a single API. This integration demands that the
>>> > > framework conform to the Beam programming model
>>> > > (http://vldb.org/pvldb/vol8/p1792-Akidau.pdf) and provide the
>>> > > appropriate APIs.
>>> > >
>>> > > While cumbersome, the benefit of integrating with Beam is tremendous.
>>> > > Since there is no single framework that solves all streaming problems
>>> > > for all use cases, the ability to combine frameworks at will makes
>>> > > developing end-state applications much more straightforward. I believe
>>> > > that many projects will choose to leverage Apache Beam to take
>>> > > advantage of this, and if Apex does not support Beam, it will fall
>>> > > behind, replaced by those frameworks that fit the easy-to-use model of
>>> > > Beam.
>>> > >
>>> > > If we become early adopters, we have a unique opportunity to become
>>> > > part of what will quite possibly become a very large community of users
>>> > > and to capitalize on the inherent name recognition of Google to elevate
>>> > > the Apex project and expose it to many who would otherwise not be aware
>>> > > of it.
>>> > >
>>> > > I think integration with Beam can pair with the recent work on
>>> > > developing a high-level API for Apex and is a natural evolution towards
>>> > > making Apex more accessible and more usable by a broader technical
>>> > > community.
>>> > >
>>> > > If there is compelling interest around making this effort a reality, I
>>> > > would love to get this conversation started and work on translating
>>> > > this into a concrete plan of action.
>>> > >
>>> > >
>>> > > ________________________________________________________
>>> > >
>>> > > The information contained in this e-mail is confidential and/or
>>> > > proprietary to Capital One and/or its affiliates and may only be used
>>> > > solely in performance of work or services for Capital One. The
>>> > > information transmitted herewith is intended only for use by the
>>> > > individual or entity to which it is addressed. If the reader of this
>>> > > message is not the intended recipient, you are hereby notified that any
>>> > > review, retransmission, dissemination, distribution, copying or other
>>> > > use of, or taking of any action in reliance upon this information is
>>> > > strictly prohibited. If you have received this communication in error,
>>> > > please contact the sender and delete the material from your computer.
>>> > >
>>> >
>>>
>
