apex-dev mailing list archives

From "Ganelin, Ilya" <Ilya.Gane...@capitalone.com>
Subject Re: Apache Beam Integration
Date Fri, 20 May 2016 18:54:36 GMT

Hi, all - I would like to dive into the Beam development effort in earnest and drive this.
There are already a number of complementary components being worked on in parallel:

1) Windowed operators: https://issues.apache.org/jira/browse/APEXMALHAR-2085
2) A higher-level API: https://issues.apache.org/jira/browse/APEXMALHAR-1939
3) What Siyuan highlighted as “next-phase” for the Streaming API (e.g. Watermarks, triggers,
and different windowing semantics)
 
Given the wealth of activity, I would love some help understanding how I could best focus
my energy to get us integrated with Beam as quickly as possible. If we still feel that there’s
design work that needs to happen, either on operator design or on the API, before we can move
forward, I’d be happy to help flesh that out. Alternatively, if we could begin implementation
of things like an API for windowed operators or watermarks, I could start on that.

I think the higher-level API can definitely support the Beam work, and we can likely base
our development on that API. However, it also seems that the other requirements for Beam,
specifically watermarks and triggers, could be implemented independently of that effort. The
API effort is complementary but ultimately targets community growth and ease of use for Apex,
rather than Beam support. 
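
For anyone less familiar with the terms, here is a minimal sketch of what a watermark-driven windowed operator does, independent of any API layer. It is plain Python rather than Apex or Beam code, and every name in it (TumblingWindowSum, process, advance_watermark) is hypothetical. It assumes fixed, non-overlapping event-time windows, a watermark w meaning that no further events with a timestamp below w are expected, a single trigger that fires once the watermark reaches the end of a window, and that late events are simply dropped.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TumblingWindowSum:
    """Hypothetical operator: sums values per fixed event-time window.

    A window [start, start + window_size) is emitted once the watermark
    reaches its end. Events for an already-emitted window are dropped.
    """

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        # Maps window start time -> running sum for that window.
        self.open_windows: Dict[int, int] = defaultdict(int)
        self.watermark = float("-inf")

    def process(self, event_time: int, value: int) -> None:
        """Assign the event to its window by event time, not arrival time."""
        start = event_time - (event_time % self.window_size)
        if start + self.window_size <= self.watermark:
            return  # late event: its window has already fired
        self.open_windows[start] += value

    def advance_watermark(self, watermark: int) -> List[Tuple[int, int]]:
        """Move the watermark forward and fire every window it has passed.

        Returns (window_start, sum) pairs in window order.
        """
        self.watermark = max(self.watermark, watermark)
        ready = sorted(
            start for start in self.open_windows
            if start + self.window_size <= self.watermark
        )
        return [(start, self.open_windows.pop(start)) for start in ready]
```

Out-of-order events within an open window are still counted, which is the main thing the watermark buys us over processing-time windows. Allowed lateness, accumulating versus discarding panes, and early triggers are exactly the parts this sketch leaves out.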

Lastly, to Siyuan’s point below, I think it’s perfectly reasonable to initially target
using the Beam API to create an Apex DAG, rather than worrying about how to convert generic
Apex applications into the Beam language. 
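
To make that direction concrete, here is a rough sketch of the visitor-style translation Siyuan describes below: build a pipeline, walk it with a visitor, and emit the target engine's DAG. It is plain Python with stand-in classes. Pipeline, PipelineVisitor, EngineDag, and DagTranslator are all hypothetical names, not the Beam or Apex APIs, and a real translator would also have to handle windowing, coders, and composite transforms, which this ignores.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Transform:
    """A node in the (hypothetical) pipeline graph."""
    name: str
    inputs: List[str] = field(default_factory=list)  # upstream transform names


class PipelineVisitor:
    """Callback interface invoked once per node during traversal."""

    def visit_transform(self, transform: Transform) -> None:
        raise NotImplementedError


class Pipeline:
    """Stand-in for a Beam-style pipeline: an ordered list of transforms."""

    def __init__(self) -> None:
        self.transforms: List[Transform] = []

    def apply(self, name: str, inputs: Sequence[str] = ()) -> "Pipeline":
        self.transforms.append(Transform(name, list(inputs)))
        return self

    def traverse(self, visitor: PipelineVisitor) -> None:
        # Insertion order is topological as long as every transform is
        # applied after the transforms it reads from.
        for transform in self.transforms:
            visitor.visit_transform(transform)


@dataclass
class EngineDag:
    """Stand-in for the target engine's DAG: operators and the streams between them."""
    operators: List[str] = field(default_factory=list)
    streams: List[Tuple[str, str]] = field(default_factory=list)


class DagTranslator(PipelineVisitor):
    """Builds an EngineDag by adding one operator per visited transform."""

    def __init__(self) -> None:
        self.dag = EngineDag()

    def visit_transform(self, transform: Transform) -> None:
        for upstream in transform.inputs:
            if upstream not in self.dag.operators:
                raise ValueError(f"unknown input {upstream!r} for {transform.name!r}")
            self.dag.streams.append((upstream, transform.name))
        self.dag.operators.append(transform.name)


def translate(pipeline: Pipeline) -> EngineDag:
    """Walk the pipeline with a translator and return the resulting DAG."""
    translator = DagTranslator()
    pipeline.traverse(translator)
    return translator.dag
```

For example, translate(Pipeline().apply("read").apply("parse", ["read"])) yields a DAG with two operators and one stream between them. Going the other way, from an arbitrary Apex application back to a pipeline, has no such simple walk, which is part of why I would leave it out of the initial scope.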

I look forward to hearing your thoughts!




On 5/12/16, 9:31 AM, "Thomas Weise" <thomas@datatorrent.com> wrote:

>Created proxy JIRA:
>
>https://issues.apache.org/jira/browse/APEXMALHAR-2089
>
>
>On Wed, May 11, 2016 at 1:31 PM, Thomas Weise <thomas@datatorrent.com>
>wrote:
>
>> SQL -> Beam is a longer term prospect that Julian Hyde is looking at. At
>> this time, I see separate translations for SQL and Beam to the Apex DAG
>> representation.
>>
>> Thanks,
>> Thomas
>>
>> --
>> sent from mobile
>> On May 11, 2016 1:26 PM, "Bhat, Vijay (CONT)" <Vijay.Bhat@capitalone.com>
>> wrote:
>>
>> I think it's a great idea as well, and could play well with the Calcite /
>> Streaming SQL discussion that’s also been going on. Brennon and I talked
>> about this and we could envision something like Streaming SQL -> Beam
>> representation -> Apex DAG, which will also buy us the trigger / watermark
>> capabilities of the Beam model.
>>
>> On 5/11/16, 9:59 AM, "York, Brennon" <Brennon.York@capitalone.com> wrote:
>>
>> >+1 to adding Beam integration. This would be huge for the Apex community and
>> >makes it that much easier for developers to come in and begin leveraging
>> >the power of Apex.
>> >
>> >On 5/9/16, 11:44 PM, "Thomas Weise" <thomas@datatorrent.com> wrote:
>> >
>> >>I spoke to Davor from the Beam team about this today at Apache Big
>> >>Data.
>> >>
>> >>In the bigger picture, multiple DSLs and language-specific SDKs are
>> >>translated into a language-independent representation, which is then
>> >>translated by the runner to the execution engine. It seems possible to
>> >>pass hints or annotations that can be accessed at the runner level and
>> >>used for optimizations. There is also the notion of hierarchical
>> >>constructs similar to our modules.
>> >>
>> >>I have also contacted the Beam folks for a follow-up on how we can
>> >>collaborate on this.
>> >>
>> >>Thanks,
>> >>Thomas
>> >>
>> >>
>> >>On Mon, May 9, 2016 at 1:08 PM, Siyuan Hua <siyuan@datatorrent.com>
>> >>wrote:
>> >>
>> >>> Hey Ilya,
>> >>>
>> >>> Since I'm working on the Java high-level API, I also looked at Apache Beam.
>> >>> Some questions have come up, such as whether the high-level API is
>> >>> replaceable by Apache Beam, or whether we can just follow the Apache Beam
>> >>> API, which is based on the Google Dataflow model. Here is what I found:
>> >>>
>> >>> 1. Beam provides a whole set of classes to define a DAG and the options
>> >>> for how to run it. There is no easy way to extend their DAG API or
>> >>> implement them on your own.
>> >>>
>> >>> 2. The way to use the Beam API is to use whatever they provide to
>> >>> construct a DAG, get the graph data structure, convert it to an Apex DAG,
>> >>> and run it with our engine. Beam follows the visitor design pattern,
>> >>> similar to ASM. There are 2 core parts to running a Beam application in
>> >>> Apex. One is the pipeline, which is the DAG structure in Beam:
>> >>>
>> >>> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java
>> >>>
>> >>> The other is the Visitor interface, which defines the callback functions
>> >>> invoked as you visit each node in the DAG. Here is an example of the Flink
>> >>> translator (visitor):
>> >>>
>> >>> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkStreamingTransformTranslators.java
>> >>>
>> >>> 3. Although I think the Dataflow model is a very good and complete model
>> >>> for stream processing to follow, I don't think the Beam API is very
>> >>> declarative or expressive. I still suggest we build our own set of APIs
>> >>> that deliver the same features as the Dataflow model but are more
>> >>> Stream-like (as in Java streams) and SQL-like.
>> >>>
>> >>> In summary, I think the integration just means running a Beam DAG on a
>> >>> different engine (Storm, Flink, Spark, or Apex). But if you want to
>> >>> mingle the Beam API with other ones, that is not very easy.
>> >>>
>> >>> I also think we need to work not only on translation but also on
>> >>> implementing some operators that provide the features of the Dataflow
>> >>> model that we are missing. Those operators can also be used in the
>> >>> high-level API.
>> >>>
>> >>> Regards,
>> >>> Siyuan
>> >>>
>> >>> On Mon, May 9, 2016 at 11:51 AM, Thomas Weise <thomas@datatorrent.com>
>> >>> wrote:
>> >>>
>> >>> > Hi Ilya,
>> >>> >
>> >>> > Absolutely, this has been discussed and is "on the roadmap". A quick
>> >>> > search reveals that a JIRA was already created for it:
>> >>> > https://issues.apache.org/jira/browse/BEAM-261
>> >>> >
>> >>> > We are currently discussing the windowing semantics in the context of
>> >>> > the high-level stream API; perhaps Siyuan can post his notes here?
>> >>> >
>> >>> > Thanks,
>> >>> > Thomas
>> >>> >
>> >>> >
>> >>> > On Mon, May 9, 2016 at 11:25 AM, Ganelin, Ilya <
>> >>> > Ilya.Ganelin@capitalone.com>
>> >>> > wrote:
>> >>> >
>> >>> > > Hello, all - Google has just published a new blog post announcing the
>> >>> > > first complete integration of an open source project (Apache Flink)
>> >>> > > with Apache Beam:
>> >>> > >
>> >>> > > https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
>> >>> > >
>> >>> > > http://data-artisans.com/why-apache-beam/
>> >>> > >
>> >>> > > Apache Beam is a unifying framework that allows users to leverage
>> >>> > > disparate streaming computational frameworks such as Storm, Dataflow,
>> >>> > > Flink, or Spark using a single API. This integration demands that the
>> >>> > > framework conform to the Beam programming model
>> >>> > > (http://vldb.org/pvldb/vol8/p1792-Akidau.pdf) and provide the
>> >>> > > appropriate APIs.
>> >>> > >
>> >>> > > While the work is cumbersome, the benefit of integrating with Beam is
>> >>> > > tremendous. Since there is no single framework that solves all
>> >>> > > streaming problems for all use cases, the ability to combine
>> >>> > > frameworks at will makes developing end-state applications much more
>> >>> > > straightforward. I believe that many projects will choose to leverage
>> >>> > > Apache Beam to take advantage of this, and if Apex does not support
>> >>> > > Beam, it will fall behind, replaced by those frameworks that fit the
>> >>> > > easy-to-use model of Beam.
>> >>> > >
>> >>> > > If we become early adopters, we have a unique opportunity to become
>> >>> > > part of what will quite possibly become a very large community of
>> >>> > > users and to capitalize on the inherent name recognition of Google to
>> >>> > > elevate the Apex project and expose it to many who would otherwise
>> >>> > > not be aware of it.
>> >>> > >
>> >>> > > I think integration with Beam can pair with the recent work on
>> >>> > > developing a high-level API for Apex and is a natural evolution
>> >>> > > towards making Apex more accessible and more usable by a broader
>> >>> > > technical community.
>> >>> > >
>> >>> > > If there is compelling interest around making this effort a reality,
>> >>> > > I would love to get this conversation started and work on translating
>> >>> > > this into a concrete plan of action.
>> >>> > >
>> >>> > >
>> >>> > > ________________________________________________________
>> >>> > >
>> >>> > > The information contained in this e-mail is confidential and/or
>> >>> > > proprietary to Capital One and/or its affiliates and may only be used
>> >>> > > solely in performance of work or services for Capital One. The
>> >>> > > information transmitted herewith is intended only for use by the
>> >>> > > individual or entity to which it is addressed. If the reader of this
>> >>> > > message is not the intended recipient, you are hereby notified that
>> >>> > > any review, retransmission, dissemination, distribution, copying or
>> >>> > > other use of, or taking of any action in reliance upon this
>> >>> > > information is strictly prohibited. If you have received this
>> >>> > > communication in error, please contact the sender and delete the
>> >>> > > material from your computer.
>> >>> > >
>> >>> >
>> >>>
>> >
>> >
>>
>>
>>