incubator-general mailing list archives

From Jean-Baptiste Onofré <...@nanthrax.net>
Subject Re: [DISCUSS] Apache Dataflow Incubator Proposal
Date Thu, 21 Jan 2016 11:42:39 GMT
Perfect: done, you are on the proposal.

Thanks !
Regards
JB

On 01/21/2016 11:55 AM, chatz wrote:
> Charitha Elvitigala
>
> On 21 January 2016 at 16:17, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
>
>> Hi Chatz,
>>
>> sure, what name should I use on the proposal, Charitha ?
>>
>> Regards
>> JB
>>
>>
>> On 01/21/2016 11:32 AM, chatz wrote:
>>
>>> Hi Jean,
>>>
>>> I’d be interested in contributing as well.
>>>
>>> Thanks,
>>>
>>> Chatz
>>>
>>>
>>> On 21 January 2016 at 14:22, Jean-Baptiste Onofré <jb@nanthrax.net>
>>> wrote:
>>>
>>>> Sweet: you are on the proposal ;)
>>>>
>>>> Thanks !
>>>> Regards
>>>> JB
>>>>
>>>>
>>>> On 01/21/2016 08:55 AM, Byung-Gon Chun wrote:
>>>>
>>>>> This looks very interesting. I'm interested in contributing.
>>>>>
>>>>> Thanks.
>>>>> -Gon
>>>>>
>>>>> ---
>>>>> Byung-Gon Chun
>>>>>
>>>>>
>>>>> On Thu, Jan 21, 2016 at 1:32 AM, James Malone <
>>>>> jamesmalone@google.com.invalid> wrote:
>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> Attached to this message is a proposed new project - Apache Dataflow, a
>>>>>> unified programming model for data processing and integration.
>>>>>>
>>>>>> The text of the proposal is included below. Additionally, the proposal
>>>>>> is
>>>>>> in draft form on the wiki where we will make any required changes:
>>>>>>
>>>>>> https://wiki.apache.org/incubator/DataflowProposal
>>>>>>
>>>>>> We look forward to your feedback and input.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> James
>>>>>>
>>>>>> ----
>>>>>>
>>>>>> = Apache Dataflow =
>>>>>>
>>>>>> == Abstract ==
>>>>>>
>>>>>> Dataflow is an open source, unified model and set of language-specific
>>>>>> SDKs
>>>>>> for defining and executing data processing workflows, and also data
>>>>>> ingestion and integration flows, supporting Enterprise Integration
>>>>>> Patterns
>>>>>> (EIPs) and Domain Specific Languages (DSLs). Dataflow pipelines simplify
>>>>>> the mechanics of large-scale batch and streaming data processing and can
>>>>>> run on a number of runtimes like Apache Flink, Apache Spark, and Google
>>>>>> Cloud Dataflow (a cloud service). Dataflow also provides DSLs in different
>>>>>> languages, allowing users to easily implement their data integration
>>>>>> processes.
>>>>>>
>>>>>> == Proposal ==
>>>>>>
>>>>>> Dataflow is a simple, flexible, and powerful system for distributed
>>>>>> data
>>>>>> processing at any scale. Dataflow provides a unified programming
>>>>>> model, a
>>>>>> software development kit to define and construct data processing
>>>>>> pipelines,
>>>>>> and runners to execute Dataflow pipelines in several runtime engines,
>>>>>> like
>>>>>> Apache Spark, Apache Flink, or Google Cloud Dataflow. Dataflow can be used
>>>>>> for a variety of streaming or batch data processing goals including ETL,
>>>>>> stream analysis, and aggregate computation. The underlying programming
>>>>>> model for Dataflow provides MapReduce-like parallelism, combined with
>>>>>> support for powerful data windowing and fine-grained correctness control.
>>>>>>
>>>>>> == Background ==
>>>>>>
>>>>>> Dataflow started as a set of Google projects focused on making data
>>>>>> processing easier, faster, and less costly. The Dataflow model is a
>>>>>> successor to MapReduce, FlumeJava, and MillWheel inside Google and is
>>>>>> focused on providing a unified solution for batch and stream processing.
>>>>>> These projects on which Dataflow is based have been published in several
>>>>>> papers made available to the public:
>>>>>>
>>>>>> * MapReduce - http://research.google.com/archive/mapreduce.html
>>>>>>
>>>>>> * Dataflow model  - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
>>>>>>
>>>>>> * FlumeJava - http://notes.stephenholiday.com/FlumeJava.pdf
>>>>>>
>>>>>> * MillWheel - http://research.google.com/pubs/pub41378.html
>>>>>>
>>>>>> Dataflow was designed from the start to provide a portable programming
>>>>>> layer. When you define a data processing pipeline with the Dataflow model,
>>>>>> you are creating a job which is capable of being processed by any number
>>>>>> of Dataflow processing engines. Several engines have been developed to run
>>>>>> Dataflow pipelines in other open source runtimes, including Dataflow
>>>>>> runners for Apache Flink and Apache Spark. There is also a “direct runner”
>>>>>> for execution on the developer machine (mainly for dev/debug purposes).
>>>>>> Another runner allows a Dataflow program to run on a managed service,
>>>>>> Google Cloud Dataflow, in Google Cloud Platform. The Dataflow Java SDK is
>>>>>> already available on GitHub, and is independent of the Google Cloud
>>>>>> Dataflow service. A Python SDK is currently in active development.
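As a rough illustration of that portability, the sketch below selects an execution engine purely through pipeline options. It is a minimal sketch assuming the classes of the currently published Dataflow Java SDK (com.google.cloud.dataflow.sdk.*); the Flink and Spark runner projects expose their own runner classes in an analogous way.

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.options.PipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner;

    public class RunnerSelection {
      public static void main(String[] args) {
        // Options can also be populated from command-line flags such as --runner=... .
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();

        // Pin the "direct runner" for local dev/debug execution; substituting a
        // Flink or Spark runner class targets those engines without touching
        // the pipeline definition itself.
        options.setRunner(DirectPipelineRunner.class);

        Pipeline p = Pipeline.create(options);
        // ... apply reads, transforms, and writes here ...
        p.run();
      }
    }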
>>>>>>
>>>>>> In this proposal, the Dataflow SDKs, model, and a set of runners will be
>>>>>> submitted as an OSS project under the ASF. The runners which are a part of
>>>>>> this proposal include those for Spark (from Cloudera), Flink (from data
>>>>>> Artisans), and local development (from Google); the Google Cloud Dataflow
>>>>>> service runner is not included in this proposal. Further references to
>>>>>> Dataflow will refer to the Dataflow model, SDKs, and runners which are a
>>>>>> part of this proposal (Apache Dataflow) only. The initial submission will
>>>>>> contain the already-released Java SDK; Google intends to submit the Python
>>>>>> SDK later in the incubation process. The Google Cloud Dataflow service will
>>>>>> continue to be one of many runners for Dataflow, built on Google Cloud
>>>>>> Platform, to run Dataflow pipelines. Necessarily, Cloud Dataflow will
>>>>>> develop against the Apache project's additions, updates, and changes.
>>>>>> Google Cloud Dataflow will become one user of Apache Dataflow and will
>>>>>> participate in the project openly and publicly.
>>>>>>
>>>>>> The Dataflow programming model has been designed with simplicity,
>>>>>> scalability, and speed as key tenets. In the Dataflow model, you only need
>>>>>> to think about four top-level concepts when constructing your data
>>>>>> processing job (a short sketch follows the list below):
>>>>>>
>>>>>> * Pipelines - The data processing job made of a series of computations
>>>>>> including input, processing, and output
>>>>>>
>>>>>> * PCollections - Bounded (or unbounded) datasets which represent the
>>>>>> input, intermediate, and output data in pipelines
>>>>>>
>>>>>> * PTransforms - A data processing step in a pipeline in which one or more
>>>>>> PCollections are used as input and output
>>>>>>
>>>>>> * I/O Sources and Sinks - APIs for reading and writing data which are the
>>>>>> roots and endpoints of the pipeline
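To make these four concepts concrete, here is a minimal word-count sketch, assuming the package layout of the currently published Dataflow Java SDK (com.google.cloud.dataflow.sdk.*); the input and output paths are placeholders, and names may shift as the code moves to Apache.

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.io.TextIO;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.transforms.Count;
    import com.google.cloud.dataflow.sdk.transforms.DoFn;
    import com.google.cloud.dataflow.sdk.transforms.ParDo;
    import com.google.cloud.dataflow.sdk.values.KV;
    import com.google.cloud.dataflow.sdk.values.PCollection;

    public class MinimalWordCount {
      public static void main(String[] args) {
        // Pipeline: the overall data processing job.
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // I/O source -> PCollection: read lines of text (a bounded dataset here).
        PCollection<String> lines = p.apply(TextIO.Read.from("/tmp/input-*.txt"));

        // PTransforms: split lines into words, then count occurrences per word.
        PCollection<KV<String, Long>> counts = lines
            .apply(ParDo.of(new DoFn<String, String>() {
              @Override
              public void processElement(ProcessContext c) {
                for (String word : c.element().split("\\s+")) {
                  if (!word.isEmpty()) {
                    c.output(word);
                  }
                }
              }
            }))
            .apply(Count.<String>perElement());

        // PTransform + I/O sink: format the counts and write them out.
        counts
            .apply(ParDo.of(new DoFn<KV<String, Long>, String>() {
              @Override
              public void processElement(ProcessContext c) {
                c.output(c.element().getKey() + ": " + c.element().getValue());
              }
            }))
            .apply(TextIO.Write.to("/tmp/word-counts"));

        p.run();
      }
    }

The same job definition can then be handed to the Flink, Spark, or direct runner mentioned above without changing any of the transforms.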
>>>>>>
>>>>>> == Rationale ==
>>>>>>
>>>>>> With Dataflow, Google intended to develop a framework which allowed
>>>>>> developers to be maximally productive in defining the processing, and then
>>>>>> be able to execute the program at various levels of
>>>>>> latency/cost/completeness without re-architecting or re-writing it. This
>>>>>> goal was informed by Google’s past experience developing several models,
>>>>>> frameworks, and tools useful for large-scale and distributed data
>>>>>> processing. While Google has previously published papers describing some of
>>>>>> its technologies, Google decided to take a different approach with
>>>>>> Dataflow. Google open-sourced the SDK and model alongside commercialization
>>>>>> of the idea and ahead of publishing papers on the topic. As a result, a
>>>>>> number of open source runtimes exist for Dataflow, such as the Apache Flink
>>>>>> and Apache Spark runners.
>>>>>>
>>>>>> We believe that submitting Dataflow as an Apache project will provide an
>>>>>> immediate, worthwhile, and substantial contribution to the open source
>>>>>> community. As an incubating project, we believe Dataflow will have a better
>>>>>> opportunity to provide a meaningful contribution to OSS and also integrate
>>>>>> with other Apache projects.
>>>>>>
>>>>>> In the long term, we believe Dataflow can be a powerful abstraction
>>>>>> layer
>>>>>> for data processing. By providing an abstraction layer for data
>>>>>> pipelines
>>>>>> and processing, data workflows can be increasingly portable, resilient
>>>>>> to
>>>>>> breaking changes in tooling, and compatible across many execution
>>>>>> engines,
>>>>>> runtimes, and open source projects.
>>>>>>
>>>>>> == Initial Goals ==
>>>>>>
>>>>>> We are breaking our initial goals into immediate (< 2 months),
>>>>>> short-term
>>>>>> (2-4 months), and intermediate-term (> 4 months).
>>>>>>
>>>>>> Our immediate goals include the following:
>>>>>>
>>>>>> * Plan for reconciling the Dataflow Java SDK and various runners into one
>>>>>> project
>>>>>>
>>>>>> * Plan for refactoring the existing Java SDK for better extensibility
>>>>>> by
>>>>>> SDK and runner writers
>>>>>>
>>>>>> * Validating all dependencies are ASL 2.0 or compatible
>>>>>>
>>>>>> * Understanding and adapting to the Apache development process
>>>>>>
>>>>>> Our short-term goals include:
>>>>>>
>>>>>> * Moving the newly-merged lists and build utilities to Apache
>>>>>>
>>>>>> * Start refactoring codebase and move code to Apache Git repo
>>>>>>
>>>>>> * Continue development of new features, functions, and fixes in the
>>>>>> Dataflow Java SDK, and Dataflow runners
>>>>>>
>>>>>> * Cleaning up the Dataflow SDK sources and crafting a roadmap and plan for
>>>>>> how to include new major ideas, modules, and runtimes
>>>>>>
>>>>>> * Establishment of an easy and clear build/test framework for Dataflow and
>>>>>> associated runtimes; creation of testing, rollback, and validation policy
>>>>>>
>>>>>> * Analysis and design for work needed to make Dataflow a better data
>>>>>> processing abstraction layer for multiple open source frameworks and
>>>>>> environments
>>>>>>
>>>>>> Finally, we have a number of intermediate-term goals:
>>>>>>
>>>>>> * Roadmapping, planning, and execution of integrations with other OSS and
>>>>>> non-OSS projects/products
>>>>>>
>>>>>> * Inclusion of an additional SDK for Python, which is under active
>>>>>> development
>>>>>>
>>>>>> == Current Status ==
>>>>>>
>>>>>> === Meritocracy ===
>>>>>>
>>>>>> Dataflow was initially developed based on ideas from many employees
>>>>>> within
>>>>>> Google. As an ASL OSS project on GitHub, the Dataflow SDK has received
>>>>>> contributions from data Artisans, Cloudera Labs, and other individual
>>>>>> developers. As a project under incubation, we are committed to expanding
>>>>>> our effort to build an environment which supports a meritocracy. We are
>>>>>> focused on engaging the community and other related projects for support
>>>>>> and contributions. Moreover, we are committed to ensuring contributors and
>>>>>> committers to Dataflow come from a broad mix of organizations through a
>>>>>> merit-based decision process during incubation. We believe strongly in the
>>>>>> Dataflow model and are committed to growing an inclusive community of
>>>>>> Dataflow contributors.
>>>>>>
>>>>>> === Community ===
>>>>>>
>>>>>> The core of the Dataflow Java SDK has been developed by Google for use
>>>>>> with Google Cloud Dataflow. Google has active community engagement in the
>>>>>> SDK GitHub repository
>>>>>> (https://github.com/GoogleCloudPlatform/DataflowJavaSDK) and on Stack
>>>>>> Overflow (http://stackoverflow.com/questions/tagged/google-cloud-dataflow),
>>>>>> and has had contributions from a number of organizations and individuals.
>>>>>>
>>>>>> Every day, Cloud Dataflow is actively used by a number of organizations
>>>>>> and institutions for batch and stream processing of data. We believe
>>>>>> acceptance will allow us to consolidate existing Dataflow-related work,
>>>>>> grow the Dataflow community, and deepen connections between Dataflow and
>>>>>> other open source projects.
>>>>>>
>>>>>> === Core Developers ===
>>>>>>
>>>>>> The core developers for Dataflow and the Dataflow runners are:
>>>>>>
>>>>>> * Frances Perry
>>>>>>
>>>>>> * Tyler Akidau
>>>>>>
>>>>>> * Davor Bonaci
>>>>>>
>>>>>> * Luke Cwik
>>>>>>
>>>>>> * Ben Chambers
>>>>>>
>>>>>> * Kenn Knowles
>>>>>>
>>>>>> * Dan Halperin
>>>>>>
>>>>>> * Daniel Mills
>>>>>>
>>>>>> * Mark Shields
>>>>>>
>>>>>> * Craig Chambers
>>>>>>
>>>>>> * Maximilian Michels
>>>>>>
>>>>>> * Tom White
>>>>>>
>>>>>> * Josh Wills
>>>>>>
>>>>>> === Alignment ===
>>>>>>
>>>>>> The Dataflow SDK can be used to create Dataflow pipelines which can be
>>>>>> executed on Apache Spark or Apache Flink. Dataflow is also related to other
>>>>>> Apache projects, such as Apache Crunch. We plan on expanding functionality
>>>>>> for Dataflow runners, support for additional domain specific languages, and
>>>>>> increased portability so Dataflow is a powerful abstraction layer for data
>>>>>> processing.
>>>>>>
>>>>>> == Known Risks ==
>>>>>>
>>>>>> === Orphaned Products ===
>>>>>>
>>>>>> The Dataflow SDK is presently used by several organizations, from small
>>>>>> startups to Fortune 100 companies, to construct production pipelines which
>>>>>> are executed in Google Cloud Dataflow. Google has a long-term commitment to
>>>>>> advance the Dataflow SDK; moreover, Dataflow is seeing increasing interest,
>>>>>> development, and adoption from organizations outside of Google.
>>>>>>
>>>>>> === Inexperience with Open Source ===
>>>>>>
>>>>>> Google believes strongly in open source and the exchange of information
>>>>>> to
>>>>>> advance new ideas and work. Examples of this commitment are active OSS
>>>>>> projects such as Chromium (https://www.chromium.org) and Kubernetes
>>>>>> (http://kubernetes.io/). With Dataflow, we have tried to be increasingly
>>>>>> open and forward-looking; we have published a paper at the VLDB conference
>>>>>> describing the Dataflow model
>>>>>> (http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf) and were quick to
>>>>>> release
>>>>>> the Dataflow SDK as open source software with the launch of Cloud
>>>>>> Dataflow.
>>>>>> Our submission to the Apache Software Foundation is a logical extension
>>>>>> of
>>>>>> our commitment to open source software.
>>>>>>
>>>>>> === Homogeneous Developers ===
>>>>>>
>>>>>> The majority of committers in this proposal belong to Google because
>>>>>> Dataflow has emerged from several internal Google projects. This proposal
>>>>>> also includes committers outside of Google who are actively involved with
>>>>>> other Apache projects, such as Hadoop, Flink, and Spark. We expect our
>>>>>> entry into incubation will allow us to expand the number of individuals
>>>>>> and organizations participating in Dataflow development. Additionally,
>>>>>> separation of the Dataflow SDK from Google Cloud Dataflow allows us to
>>>>>> focus on the open source SDK and model and do what is best for this
>>>>>> project.
>>>>>>
>>>>>> === Reliance on Salaried Developers ===
>>>>>>
>>>>>> The Dataflow SDK and Dataflow runners have been developed primarily by
>>>>>> salaried developers supporting the Google Cloud Dataflow project. While
>>>>>> the Dataflow SDK and Cloud Dataflow have been developed by different teams
>>>>>> (and this proposal would reinforce that separation), we expect our initial
>>>>>> set of developers will still primarily be salaried. Contribution has not
>>>>>> been exclusively from salaried developers, however. For example, the
>>>>>> contrib directory of the Dataflow SDK
>>>>>> (https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/master/contrib)
>>>>>> contains items from free-time contributors. Moreover, separate projects,
>>>>>> such as ScalaFlow (https://github.com/darkjh/scalaflow), have been created
>>>>>> around the Dataflow model and SDK. We expect our reliance on salaried
>>>>>> developers will decrease over time during incubation.
>>>>>>
>>>>>> === Relationship with other Apache products ===
>>>>>>
>>>>>> Dataflow directly interoperates with or utilizes several existing
>>>>>> Apache
>>>>>> projects.
>>>>>>
>>>>>> * Build
>>>>>>
>>>>>> ** Apache Maven
>>>>>>
>>>>>> * Data I/O, Libraries
>>>>>>
>>>>>> ** Apache Avro
>>>>>>
>>>>>> ** Apache Commons
>>>>>>
>>>>>> * Dataflow runners
>>>>>>
>>>>>> ** Apache Flink
>>>>>>
>>>>>> ** Apache Spark
>>>>>>
>>>>>> Dataflow, when used in batch mode, shares similarities with Apache Crunch;
>>>>>> however, Dataflow is focused on a model, SDK, and abstraction layer beyond
>>>>>> Spark and Hadoop (MapReduce). One key goal of Dataflow is to provide an
>>>>>> intermediate abstraction layer which can easily be implemented and utilized
>>>>>> across several different processing frameworks.
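As one concrete touch point with the Data I/O entries above, here is a minimal sketch of reading and writing Apache Avro files with the AvroIO connector in the current Dataflow Java SDK; MyRecord stands in for a hypothetical Avro-generated record class and the paths are placeholders.

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.io.AvroIO;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.values.PCollection;

    public class AvroRoundTrip {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Read Avro records into a PCollection, then write them back out
        // unchanged; MyRecord is a hypothetical Avro-generated class.
        PCollection<MyRecord> records = p.apply(
            AvroIO.Read.from("/tmp/input-*.avro").withSchema(MyRecord.class));
        records.apply(AvroIO.Write.to("/tmp/avro-output").withSchema(MyRecord.class));

        p.run();
      }
    }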
>>>>>>
>>>>>> === An excessive fascination with the Apache brand ===
>>>>>>
>>>>>> With this proposal we are not seeking attention or publicity. Rather,
>>>>>> we
>>>>>> firmly believe in the Dataflow model, SDK, and the ability to make
>>>>>> Dataflow
>>>>>> a powerful yet simple framework for data processing. While the Dataflow
>>>>>> SDK
>>>>>> and model have been open source, we believe putting code on GitHub can
>>>>>> only go so far. We see the Apache community, processes, and mission as
>>>>>> critical for ensuring the Dataflow SDK and model are truly
>>>>>> community-driven, positively impactful, and innovative open source
>>>>>> software. While Google has taken a number of steps to advance its various
>>>>>> open source projects, we believe Dataflow is a great fit for the Apache
>>>>>> Software Foundation due to its focus on data processing and its
>>>>>> relationships to existing ASF projects.
>>>>>>
>>>>>> == Documentation ==
>>>>>>
>>>>>> The following documentation is relevant to this proposal. Relevant
>>>>>> portions of the documentation will be contributed to the Apache Dataflow
>>>>>> project.
>>>>>>
>>>>>> * Dataflow website: https://cloud.google.com/dataflow
>>>>>>
>>>>>> * Dataflow programming model:
>>>>>> https://cloud.google.com/dataflow/model/programming-model
>>>>>>
>>>>>> * Codebases
>>>>>>
>>>>>> ** Dataflow Java SDK:
>>>>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK
>>>>>>
>>>>>> ** Flink Dataflow runner:
>>>>>> https://github.com/dataArtisans/flink-dataflow
>>>>>>
>>>>>> ** Spark Dataflow runner: https://github.com/cloudera/spark-dataflow
>>>>>>
>>>>>> * Dataflow Java SDK issue tracker:
>>>>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues
>>>>>>
>>>>>> * google-cloud-dataflow tag on Stack Overflow:
>>>>>> http://stackoverflow.com/questions/tagged/google-cloud-dataflow
>>>>>>
>>>>>> == Initial Source ==
>>>>>>
>>>>>> The initial source for Dataflow which we will submit to the Apache
>>>>>> Software Foundation will include several related projects which are
>>>>>> currently hosted in the following GitHub repositories:
>>>>>>
>>>>>> * Dataflow Java SDK (
>>>>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK)
>>>>>>
>>>>>> * Flink Dataflow runner (
>>>>>> https://github.com/dataArtisans/flink-dataflow)
>>>>>>
>>>>>> * Spark Dataflow runner (https://github.com/cloudera/spark-dataflow)
>>>>>>
>>>>>> These projects have always been Apache 2.0 licensed. We intend to bundle
>>>>>> all of these repositories since they are all complementary and should be
>>>>>> maintained in one project. Prior to our submission, we will combine all of
>>>>>> these projects into a new git repository.
>>>>>>
>>>>>> == Source and Intellectual Property Submission Plan ==
>>>>>>
>>>>>> The source for the Dataflow SDK and the three runners (Spark, Flink,
>>>>>> Google Cloud Dataflow) is already licensed under an Apache 2 license.
>>>>>>
>>>>>> * Dataflow SDK -
>>>>>>
>>>>>>
>>>>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/LICENSE
>>>>>>
>>>>>> * Flink runner -
>>>>>> https://github.com/dataArtisans/flink-dataflow/blob/master/LICENSE
>>>>>>
>>>>>> * Spark runner -
>>>>>> https://github.com/cloudera/spark-dataflow/blob/master/LICENSE
>>>>>>
>>>>>> Contributors to the Dataflow SDK have also signed the Google Individual
>>>>>> Contributor License Agreement
>>>>>> (https://cla.developers.google.com/about/google-individual) in order to
>>>>>> contribute to the project.
>>>>>>
>>>>>> With respect to trademark rights, Google does not hold a trademark on the
>>>>>> phrase “Dataflow.” Based on feedback and guidance we receive during the
>>>>>> incubation process, we are open to renaming the project if necessary for
>>>>>> trademark or other concerns.
>>>>>>
>>>>>> == External Dependencies ==
>>>>>>
>>>>>> All external dependencies are licensed under an Apache 2.0 or
>>>>>> Apache-compatible license. As we grow the Dataflow community, we will
>>>>>> configure our build process to require and validate that all contributions
>>>>>> and dependencies are licensed under the Apache 2.0 license or are under an
>>>>>> Apache-compatible license.
>>>>>>
>>>>>> == Required Resources ==
>>>>>>
>>>>>> === Mailing Lists ===
>>>>>>
>>>>>> We currently use a mix of mailing lists. We will migrate our existing
>>>>>> mailing lists to the following:
>>>>>>
>>>>>> * dev@dataflow.incubator.apache.org
>>>>>>
>>>>>> * user@dataflow.incubator.apache.org
>>>>>>
>>>>>> * private@dataflow.incubator.apache.org
>>>>>>
>>>>>> * commits@dataflow.incubator.apache.org
>>>>>>
>>>>>> === Source Control ===
>>>>>>
>>>>>> The Dataflow team currently uses Git and would like to continue to do so.
>>>>>> We request a Git repository for Dataflow with mirroring to GitHub enabled.
>>>>>>
>>>>>> === Issue Tracking ===
>>>>>>
>>>>>> We request the creation of an Apache-hosted JIRA. The Dataflow project is
>>>>>> currently using both a public GitHub issue tracker and internal Google
>>>>>> issue tracking. We will migrate and combine issues from these two sources
>>>>>> into the Apache JIRA.
>>>>>>
>>>>>> == Initial Committers ==
>>>>>>
>>>>>> * Aljoscha Krettek     [aljoscha@apache.org]
>>>>>>
>>>>>> * Amit Sela            [amitsela33@gmail.com]
>>>>>>
>>>>>> * Ben Chambers         [bchambers@google.com]
>>>>>>
>>>>>> * Craig Chambers       [chambers@google.com]
>>>>>>
>>>>>> * Dan Halperin         [dhalperi@google.com]
>>>>>>
>>>>>> * Davor Bonaci         [davor@google.com]
>>>>>>
>>>>>> * Frances Perry        [fjp@google.com]
>>>>>>
>>>>>> * James Malone         [jamesmalone@google.com]
>>>>>>
>>>>>> * Jean-Baptiste Onofré [jbonofre@apache.org]
>>>>>>
>>>>>> * Josh Wills           [jwills@apache.org]
>>>>>>
>>>>>> * Kostas Tzoumas       [kostas@data-artisans.com]
>>>>>>
>>>>>> * Kenneth Knowles      [klk@google.com]
>>>>>>
>>>>>> * Luke Cwik            [lcwik@google.com]
>>>>>>
>>>>>> * Maximilian Michels   [mxm@apache.org]
>>>>>>
>>>>>> * Stephan Ewen         [stephan@data-artisans.com]
>>>>>>
>>>>>> * Tom White            [tom@cloudera.com]
>>>>>>
>>>>>> * Tyler Akidau         [takidau@google.com]
>>>>>>
>>>>>> == Affiliations ==
>>>>>>
>>>>>> The initial committers are from six organizations. Google developed
>>>>>> Dataflow and the Dataflow SDK, data Artisans developed the Flink
>>>>>> runner,
>>>>>> and Cloudera (Labs) developed the Spark runner.
>>>>>>
>>>>>> * Cloudera
>>>>>>
>>>>>> ** Tom White
>>>>>>
>>>>>> * data Artisans
>>>>>>
>>>>>> ** Aljoscha Krettek
>>>>>>
>>>>>> ** Kostas Tzoumas
>>>>>>
>>>>>> ** Maximilian Michels
>>>>>>
>>>>>> ** Stephan Ewen
>>>>>>
>>>>>> * Google
>>>>>>
>>>>>> ** Ben Chambers
>>>>>>
>>>>>> ** Dan Halperin
>>>>>>
>>>>>> ** Davor Bonaci
>>>>>>
>>>>>> ** Frances Perry
>>>>>>
>>>>>> ** James Malone
>>>>>>
>>>>>> ** Kenneth Knowles
>>>>>>
>>>>>> ** Luke Cwik
>>>>>>
>>>>>> ** Tyler Akidau
>>>>>>
>>>>>> * PayPal
>>>>>>
>>>>>> ** Amit Sela
>>>>>>
>>>>>> * Slack
>>>>>>
>>>>>> ** Josh Wills
>>>>>>
>>>>>> * Talend
>>>>>>
>>>>>> ** Jean-Baptiste Onofré
>>>>>>
>>>>>> == Sponsors ==
>>>>>>
>>>>>> === Champion ===
>>>>>>
>>>>>> * Jean-Baptiste Onofré      [jbonofre@apache.org]
>>>>>>
>>>>>> === Nominated Mentors ===
>>>>>>
>>>>>> * Jim Jagielski           [jim@apache.org]
>>>>>>
>>>>>> * Venkatesh Seetharam     [venkatesh@apache.org]
>>>>>>
>>>>>> * Bertrand Delacretaz     [bdelacretaz@apache.org]
>>>>>>
>>>>>> * Ted Dunning             [tdunning@apache.org]
>>>>>>
>>>>>> === Sponsoring Entity ===
>>>>>>
>>>>>> The Apache Incubator
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbonofre@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>>> For additional commands, e-mail: general-help@incubator.apache.org
>>>>
>>>>
>>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>>
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

