apex-dev mailing list archives

From Thomas Weise <tho...@datatorrent.com>
Subject Re: Apache Beam Integration
Date Thu, 12 May 2016 16:31:07 GMT
Created proxy JIRA:

https://issues.apache.org/jira/browse/APEXMALHAR-2089


On Wed, May 11, 2016 at 1:31 PM, Thomas Weise <thomas@datatorrent.com>
wrote:

> SQL -> Beam is a longer-term prospect that Julian Hyde is looking at. At
> this time, I see separate translations for SQL and Beam to the Apex DAG
> representation.
>
> Thanks,
> Thomas
>
> --
> sent from mobile
> On May 11, 2016 1:26 PM, "Bhat, Vijay (CONT)" <Vijay.Bhat@capitalone.com>
> wrote:
>
> I think it's a great idea as well, and could play well with the Calcite /
> Streaming SQL discussion that’s also been going on. Brennon and I talked
> about this and we could envision something like Streaming SQL -> Beam
> representation -> Apex DAG, which will also buy us the trigger / watermark
> capabilities of the Beam model.
>
> On 5/11/16, 9:59 AM, "York, Brennon" <Brennon.York@capitalone.com> wrote:
>
> >+1 to add beam integration. This would be huge for the Apex community and
> >makes it that much easier for developers to come in and begin leveraging
> >the power of Apex.
> >
> >On 5/9/16, 11:44 PM, "Thomas Weise" <thomas@datatorrent.com> wrote:
> >
> >>I spoke to Davor from the Beam team about this today at Apache Big
> >>Data.
> >>
> >>In the bigger picture, multiple DSLs and language-specific SDKs are
> >>translated into a language-independent representation, which is then
> >>translated by the runner to the execution engine. It seems possible to
> >>pass hints or annotations that can be accessed at the runner level and
> >>used for optimizations. There is also the notion of hierarchical
> >>constructs similar to our modules.
> >>
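A minimal sketch of the layering described above, in which a runner reads an optional hint from an engine-neutral description of a transform. Every name here (TransformSpec, Runner, ToyEngineRunner, the "partitions" hint) is hypothetical and chosen only for illustration; none of them are Beam or Apex APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy stand-ins only; these are not Beam or Apex classes. */
public class RunnerHintSketch {

    /** Engine-neutral description of one transform, with free-form hints. */
    static final class TransformSpec {
        final String name;
        final Map<String, String> hints = new LinkedHashMap<>();

        TransformSpec(String name) {
            this.name = name;
        }

        TransformSpec withHint(String key, String value) {
            hints.put(key, value);
            return this;
        }
    }

    /** A runner maps the neutral description onto one execution engine. */
    interface Runner {
        String plan(TransformSpec spec);
    }

    /** Hypothetical runner that uses a "partitions" hint and ignores all others. */
    static final class ToyEngineRunner implements Runner {
        @Override
        public String plan(TransformSpec spec) {
            // Fall back to a single partition when no hint is attached.
            int partitions = Integer.parseInt(spec.hints.getOrDefault("partitions", "1"));
            return spec.name + " x" + partitions;
        }
    }

    public static void main(String[] args) {
        Runner runner = new ToyEngineRunner();
        System.out.println(runner.plan(new TransformSpec("parse")));
        // prints "parse x1"
        System.out.println(runner.plan(new TransformSpec("aggregate").withHint("partitions", "4")));
        // prints "aggregate x4"
    }
}
```

The point of the design is that hints are optional: a runner that does not recognize a hint can ignore it, so the same description stays portable across engines.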
> >>I have also contacted the Beam folks for a follow-up on how we can
> >>collaborate on this.
> >>
> >>Thanks,
> >>Thomas
> >>
> >>
> >>On Mon, May 9, 2016 at 1:08 PM, Siyuan Hua <siyuan@datatorrent.com>
> >>wrote:
> >>
> >>> Hey Ilya,
> >>>
> >>> Since I'm working on the Java high-level API, I also looked at Apache
> >>> Beam. Questions have come up, such as whether the high-level API could
> >>> be replaced by Apache Beam, or whether we could just follow the Apache
> >>> Beam API, which is based on the Google Dataflow model. Here is what I
> >>> found:
> >>>
> >>> 1. Beam provides a whole bunch of classes to define a DAG and the
> >>> options for how to run it. There is no easy way to extend their DAG API
> >>> or implement it on your own.
> >>>
> >>> 2. The way to use the Beam API is to construct a DAG with whatever
> >>> they provide, get the graph data structure, convert it to an Apex DAG
> >>> and run it with our engine. Beam follows the visitor design pattern,
> >>> similar to ASM. There are 2 core parts to running a Beam application in
> >>> Apex. One is the Pipeline, which is the DAG structure in Beam:
> >>>
> >>> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java
> >>> The other is the visitor interface, which defines the callback
> >>> functions invoked as each node in the DAG is visited. Here is an
> >>> example, the Flink translator (visitor):
> >>>
> >>> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkStreamingTransformTranslators.java
> >>>
> >>> 3. Although I think the Dataflow model is a very good and complete
> >>> model for stream processing to follow, I don't think the Beam API is
> >>> very declarative or expressive. I still suggest we build our own set of
> >>> APIs that deliver the same features as the Dataflow model but are more
> >>> Stream-like (as in Java streams) and SQL-like.
> >>>
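Point 2 above (build a graph, then translate it by visiting each node) can be sketched roughly as follows. This is a toy under stated assumptions, not the Beam or Apex API: ToyPipeline, TransformNode, PipelineVisitor, ToyDag and ToyDagTranslator are all hypothetical stand-ins, and the target "DAG" only records operator and stream names as strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy stand-ins only; none of these types are Beam or Apex classes. */
public class VisitorTranslationSketch {

    /** A node in a hypothetical source pipeline graph. */
    static final class TransformNode {
        final String name;
        final List<TransformNode> inputs;

        TransformNode(String name, TransformNode... inputs) {
            this.name = name;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Callback invoked once per node while the pipeline is traversed. */
    interface PipelineVisitor {
        void visitTransform(TransformNode node);
    }

    /** Hypothetical pipeline that keeps nodes in the order they were applied. */
    static final class ToyPipeline {
        private final List<TransformNode> nodes = new ArrayList<>();

        TransformNode apply(String name, TransformNode... inputs) {
            TransformNode node = new TransformNode(name, inputs);
            nodes.add(node);
            return node;
        }

        void traverse(PipelineVisitor visitor) {
            for (TransformNode node : nodes) {
                visitor.visitTransform(node);
            }
        }
    }

    /** Hypothetical target DAG: operators and streams recorded as strings. */
    static final class ToyDag {
        final List<String> operators = new ArrayList<>();
        final List<String> streams = new ArrayList<>();
    }

    /** The translator is a visitor that emits one operator per transform. */
    static final class ToyDagTranslator implements PipelineVisitor {
        final ToyDag dag = new ToyDag();

        @Override
        public void visitTransform(TransformNode node) {
            dag.operators.add(node.name);
            // One stream per edge from an upstream transform to this one.
            for (TransformNode input : node.inputs) {
                dag.streams.add(input.name + "->" + node.name);
            }
        }
    }

    static ToyDag translate(ToyPipeline pipeline) {
        ToyDagTranslator translator = new ToyDagTranslator();
        pipeline.traverse(translator);
        return translator.dag;
    }

    public static void main(String[] args) {
        ToyPipeline p = new ToyPipeline();
        TransformNode read = p.apply("read");
        TransformNode count = p.apply("count", read);
        p.apply("write", count);

        ToyDag dag = translate(p);
        System.out.println(dag.operators); // prints [read, count, write]
        System.out.println(dag.streams);   // prints [read->count, count->write]
    }
}
```

In this shape the pipeline knows nothing about the target engine; supporting another engine means writing another visitor, which is the sense in which the integration amounts to running the same graph on a different engine.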
> >>> In summary, I think the integration just means running a Beam DAG on a
> >>> different engine (Storm, Flink, Spark or Apex). But if you want to
> >>> mingle the Beam API with other ones, it is not very easy.
> >>>
> >>> I also think we need to work not only on translation but also on
> >>> implementing some operators that provide the missing Dataflow model
> >>> features. Those operators can also be used in the high-level API.
> >>>
> >>> Regards,
> >>> Siyuan
> >>>
> >>> On Mon, May 9, 2016 at 11:51 AM, Thomas Weise <thomas@datatorrent.com>
> >>> wrote:
> >>>
> >>> > Hi Ilya,
> >>> >
> >>> > Absolutely, this has been discussed and is "on the roadmap". A quick
> >>> > search reveals that a JIRA was already created for it:
> >>> > https://issues.apache.org/jira/browse/BEAM-261
> >>> >
> >>> > We are currently discussing the windowing semantics in the context of
> >>> > the high-level stream API; perhaps Siyuan can post his notes here?
> >>> >
> >>> > Thanks,
> >>> > Thomas
> >>> >
> >>> >
> >>> > On Mon, May 9, 2016 at 11:25 AM, Ganelin, Ilya <
> >>> > Ilya.Ganelin@capitalone.com>
> >>> > wrote:
> >>> >
> >>> > > Hello, all - Google has just published a new blog post announcing the
> >>> > > first complete integration of an open source project (Apache Flink)
> >>> > > with Apache Beam:
> >>> > >
> >>> > > https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
> >>> > >
> >>> > > http://data-artisans.com/why-apache-beam/
> >>> > >
> >>> > > Apache Beam is a unifying framework that allows users to leverage
> >>> > > disparate streaming computational frameworks such as Storm, Dataflow,
> >>> > > Flink, or Spark using a single API. This integration demands that the
> >>> > > framework conform to the Beam programming model
> >>> > > (http://vldb.org/pvldb/vol8/p1792-Akidau.pdf) and provide the
> >>> > > appropriate APIs.
> >>> > >
> >>> > > While the integration is cumbersome, its benefit is tremendous.
> >>> > > Since there is no single framework that solves all streaming problems
> >>> > > for all use cases, the ability to combine frameworks at will makes
> >>> > > developing end-state applications much more straightforward. I believe
> >>> > > that many projects will choose to leverage Apache Beam to take
> >>> > > advantage of this, and if Apex does not support Beam, it will fall
> >>> > > behind, replaced by those frameworks that fit the easy-to-use model of
> >>> > > Beam.
> >>> > >
> >>> > > If we become early adopters, we have a unique opportunity to become
> >>> > > part of what will quite possibly become a very large community of
> >>> > > users, and to capitalize on the inherent name recognition of Google to
> >>> > > elevate the Apex project and expose it to many who would otherwise not
> >>> > > be aware of it.
> >>> > >
> >>> > > I think integration with Beam can pair with the recent work on
> >>> > > developing a high-level API for Apex and is a natural evolution
> >>> > > towards making Apex more accessible and more usable by a broader
> >>> > > technical community.
> >>> > >
> >>> > > If there is compelling interest around making this effort a reality, I
> >>> > > would love to get this conversation started and work on translating
> >>> > > this into a concrete plan of action.
> >>> > >
> >>> > >
> >>> > > ________________________________________________________
> >>> > >
> >>> > > The information contained in this e-mail is confidential and/or
> >>> > > proprietary to Capital One and/or its affiliates and may only be used
> >>> > > solely in performance of work or services for Capital One. The
> >>> > > information transmitted herewith is intended only for use by the
> >>> > > individual or entity to which it is addressed. If the reader of this
> >>> > > message is not the intended recipient, you are hereby notified that
> >>> > > any review, retransmission, dissemination, distribution, copying or
> >>> > > other use of, or taking of any action in reliance upon this
> >>> > > information is strictly prohibited. If you have received this
> >>> > > communication in error, please contact the sender and delete the
> >>> > > material from your computer.
> >>> > >
> >>> >
> >>>
>
