incubator-general mailing list archives

From Jean-Baptiste Onofré ...@nanthrax.net>
Subject Re: [DISCUSS] Apache Dataflow Incubator Proposal
Date Fri, 22 Jan 2016 10:08:52 GMT
Hi Mayank,

sure: you are in.

Thanks !
Regards
JB

On 01/22/2016 12:29 AM, Mayank Bansal wrote:
> Hi Jean,
>
> Nice Proposal.
>
> I wanted to contribute to this project. Can you please add me too?
>
> Thanks a lot for the help
>
> Thanks,
> Mayank
>
> On Thu, Jan 21, 2016 at 8:07 AM, Jean-Baptiste Onofré <jb@nanthrax.net>
> wrote:
>
>     Hey Alex,
>
>     awesome: I added you on the proposal.
>
>     Thanks,
>     Regards
>     JB
>
>
>     On 01/21/2016 05:03 PM, Alexander Bezzubov wrote:
>
>         Hi,
>
>         it's great to see Dataflow becoming part of the Apache ecosystem;
>         thank you for bringing it in.
>         I would be happy to get involved and help.
>
>         --
>         Alex
>
>         On Thu, Jan 21, 2016 at 8:42 PM, Jean-Baptiste Onofré
>         <jb@nanthrax.net> wrote:
>
>             Perfect: done, you are on the proposal.
>
>             Thanks !
>             Regards
>             JB
>
>
>             On 01/21/2016 11:55 AM, chatz wrote:
>
>                 Charitha Elvitigala
>
>                 On 21 January 2016 at 16:17, Jean-Baptiste Onofré
>                 <jb@nanthrax.net> wrote:
>
>                     Hi Chatz,
>
>                     sure, what name should I use on the proposal, Charitha ?
>
>                     Regards
>                     JB
>
>
>                     On 01/21/2016 11:32 AM, chatz wrote:
>
>                         Hi Jean,
>
>                         I’d be interested in contributing as well.
>
>                         Thanks,
>
>                         Chatz
>
>
>                         On 21 January 2016 at 14:22, Jean-Baptiste
>                         Onofré <jb@nanthrax.net> wrote:
>
>                             Sweet: you are on the proposal ;)
>
>                             Thanks !
>                             Regards
>                             JB
>
>
>                             On 01/21/2016 08:55 AM, Byung-Gon Chun wrote:
>
>                                 This looks very interesting. I'm interested
>                                 in contributing.
>
>                                 Thanks.
>                                 -Gon
>
>                                 ---
>                                 Byung-Gon Chun
>
>
>                                 On Thu, Jan 21, 2016 at 1:32 AM, James Malone
>                                 <jamesmalone@google.com.invalid> wrote:
>
>                                     Hello everyone,
>
>                                     Attached to this message is a proposed new
>                                     project, Apache Dataflow, a unified
>                                     programming model for data processing and
>                                     integration.
>
>                                     The text of the proposal is included
>                                     below. Additionally, the proposal is in
>                                     draft form on the wiki, where we will make
>                                     any required changes:
>
>                                     https://wiki.apache.org/incubator/DataflowProposal
>
>                                     We look forward to your feedback and input.
>
>                                     Best,
>
>                                     James
>
>                                     ----
>
>                                     = Apache Dataflow =
>
>                                     == Abstract ==
>
>                                     Dataflow is an open source, unified model
>                                     and set of language-specific SDKs for
>                                     defining and executing data processing
>                                     workflows, and also data ingestion and
>                                     integration flows, supporting Enterprise
>                                     Integration Patterns (EIPs) and Domain
>                                     Specific Languages (DSLs). Dataflow
>                                     pipelines simplify the mechanics of
>                                     large-scale batch and streaming data
>                                     processing and can run on a number of
>                                     runtimes such as Apache Flink, Apache
>                                     Spark, and Google Cloud Dataflow (a cloud
>                                     service). Dataflow also brings DSLs in
>                                     different languages, allowing users to
>                                     easily implement their data integration
>                                     processes.
>
>                                     == Proposal ==
>
>                                     Dataflow is a simple, flexible, and
>                                     powerful system for distributed data
>                                     processing at any scale. Dataflow provides
>                                     a unified programming model, a software
>                                     development kit to define and construct
>                                     data processing pipelines, and runners to
>                                     execute Dataflow pipelines in several
>                                     runtime engines, such as Apache Spark,
>                                     Apache Flink, or Google Cloud Dataflow.
>                                     Dataflow can be used for a variety of
>                                     streaming or batch data processing goals,
>                                     including ETL, stream analysis, and
>                                     aggregate computation. The underlying
>                                     programming model for Dataflow provides
>                                     MapReduce-like parallelism combined with
>                                     support for powerful data windowing and
>                                     fine-grained correctness control.
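The combination of MapReduce-like grouping with windowing described above can be illustrated without any Dataflow API. The following is a minimal pure-Python sketch, assuming fixed (tumbling) windows over integer event timestamps in seconds. `fixed_window_start` and `sum_per_key_per_window` are hypothetical names, not part of the Dataflow SDK, and the sketch does not model triggers or late-arriving data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (key, value, event_time_in_seconds).
Event = Tuple[str, int, int]


def fixed_window_start(timestamp: int, size: int) -> int:
    """Start of the fixed (tumbling) window of `size` seconds containing `timestamp`."""
    return timestamp - (timestamp % size)


def sum_per_key_per_window(events: Iterable[Event], size: int) -> Dict[Tuple[str, int], int]:
    """Group by (key, window start) and sum the values: a windowed 'reduce' step."""
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    for key, value, ts in events:
        totals[(key, fixed_window_start(ts, size))] += value
    return dict(totals)


events = [
    ("user1", 3, 5),
    ("user2", 1, 42),
    ("user1", 4, 59),
    ("user1", 2, 61),
]
print(sum_per_key_per_window(events, size=60))
# {('user1', 0): 7, ('user2', 0): 1, ('user1', 60): 2}
```

Because each element's window is derived from its own timestamp, the same grouping applies whether the input is a bounded batch or the portion of an unbounded stream seen so far.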
>
>                                     == Background ==
>
>                                     Dataflow started as a set of Google
>                                     projects focused on making data processing
>                                     easier, faster, and less costly. The
>                                     Dataflow model is a successor to
>                                     MapReduce, FlumeJava, and MillWheel inside
>                                     Google and is focused on providing a
>                                     unified solution for batch and stream
>                                     processing. The projects on which Dataflow
>                                     is based have been described in several
>                                     publicly available papers:
>
>                                     * MapReduce -
>                                     http://research.google.com/archive/mapreduce.html
>
>                                     * Dataflow model -
>                                     http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
>
>                                     * FlumeJava -
>                                     http://notes.stephenholiday.com/FlumeJava.pdf
>
>                                     * MillWheel -
>                                     http://research.google.com/pubs/pub41378.html
>
>                                     Dataflow was designed from the start to
>                                     provide a portable programming layer. When
>                                     you define a data processing pipeline with
>                                     the Dataflow model, you are creating a job
>                                     which is capable of being processed by any
>                                     number of Dataflow processing engines.
>                                     Several engines have been developed to run
>                                     Dataflow pipelines in other open source
>                                     runtimes, including Dataflow runners for
>                                     Apache Flink and Apache Spark. There is
>                                     also a “direct runner” for execution on
>                                     the developer's machine (mainly for
>                                     development and debugging). Another runner
>                                     allows a Dataflow program to run on a
>                                     managed service, Google Cloud Dataflow, in
>                                     Google Cloud Platform. The Dataflow Java
>                                     SDK is already available on GitHub and is
>                                     independent of the Google Cloud Dataflow
>                                     service. A Python SDK is currently in
>                                     active development.
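The portability idea, one pipeline definition executed by interchangeable runners, can be sketched in plain Python. This is a hypothetical illustration only: `PIPELINE`, `EagerRunner`, and `StreamingRunner` are invented names, the step kinds (`map`, `filter`) are assumptions, and real runners such as the Flink, Spark, and direct runners translate a much richer pipeline graph for their engines.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# The pipeline is plain data (a list of named steps), independent of how it runs.
# The step kinds "map" and "filter" are hypothetical.
Step = Tuple[str, Callable]

PIPELINE: List[Step] = [
    ("map", lambda line: line.lower()),
    ("filter", lambda line: "error" in line),
]


class EagerRunner:
    """Batch-style: materializes the whole intermediate list after every step."""

    def run(self, steps: List[Step], data: Iterable[str]) -> List[str]:
        current = list(data)
        for kind, fn in steps:
            if kind == "map":
                current = [fn(x) for x in current]
            elif kind == "filter":
                current = [x for x in current if fn(x)]
            else:
                raise ValueError(f"unknown step kind: {kind}")
        return current


class StreamingRunner:
    """Element-at-a-time: pushes each input through all steps via a generator."""

    def run(self, steps: List[Step], data: Iterable[str]) -> List[str]:
        def process(stream: Iterator[str]) -> Iterator[str]:
            for x in stream:
                keep = True
                for kind, fn in steps:
                    if kind == "map":
                        x = fn(x)
                    elif kind == "filter":
                        if not fn(x):
                            keep = False
                            break
                    else:
                        raise ValueError(f"unknown step kind: {kind}")
                if keep:
                    yield x

        return list(process(iter(data)))


logs = ["INFO start", "ERROR disk full", "WARN slow", "Error retry"]
for runner in (EagerRunner(), StreamingRunner()):
    print(type(runner).__name__, runner.run(PIPELINE, logs))
# EagerRunner ['error disk full', 'error retry']
# StreamingRunner ['error disk full', 'error retry']
```

Because the pipeline is data, swapping runners changes how the steps execute (materialized lists versus element-at-a-time generators) but not what the pipeline computes.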
>
>                                     In this proposal, the Dataflow SDKs,
>                                     model, and a set of runners will be
>                                     submitted as an OSS project under the ASF.
>                                     The runners that are part of this proposal
>                                     include those for Spark (from Cloudera),
>                                     Flink (from data Artisans), and local
>                                     development (from Google); the Google
>                                     Cloud Dataflow service runner is not
>                                     included in this proposal. Further
>                                     references to Dataflow refer only to the
>                                     Dataflow model, SDKs, and runners that are
>                                     part of this proposal (Apache Dataflow).
>                                     The initial submission will contain the
>                                     already-released Java SDK; Google intends
>                                     to submit the Python SDK later in the
>                                     incubation process. The Google Cloud
>                                     Dataflow service, built on Google Cloud
>                                     Platform, will continue to be one of many
>                                     runners for Dataflow pipelines. Cloud
>                                     Dataflow will necessarily be developed
>                                     against the Apache project's additions,
>                                     updates, and changes. Google Cloud Dataflow
>                                     will become one user of Apache Dataflow and
>                                     will participate in the project openly and
>                                     publicly.
>
>                                     The Dataflow programming model has been
>                                     designed with simplicity, scalability, and
>                                     speed as key tenets. In the Dataflow model,
>                                     you only need to think about four
>                                     top-level concepts when constructing your
>                                     data processing job:
>
>                                     * Pipelines - The data processing job, made
>                                     of a series of computations including
>                                     input, processing, and output
>
>                                     * PCollections - Bounded (or unbounded)
>                                     datasets which represent the input,
>                                     intermediate, and output data in pipelines
>
>                                     * PTransforms - A data processing step in a
>                                     pipeline that takes one or more
>                                     PCollections as input and produces one or
>                                     more PCollections as output
>
>                                     * I/O Sources and Sinks - APIs for reading
>                                     and writing data, which are the roots and
>                                     endpoints of the pipeline
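To make the four concepts concrete, here is a toy in-memory model in Python. It is a sketch under stated assumptions, not the Dataflow SDK API: the class names echo the concepts above, but `FlatMap`, `CountPerElement`, `read`, and `write` are hypothetical. Unlike the real SDK, which builds a deferred pipeline graph and executes it when the pipeline is run, this toy evaluates each transform immediately.

```python
from typing import Callable, Generic, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class PCollection(Generic[T]):
    """Toy bounded PCollection: an immutable sequence of elements."""

    def __init__(self, elements: Iterable[T]) -> None:
        self.elements = tuple(elements)

    def apply(self, transform: "PTransform[T, U]") -> "PCollection[U]":
        # Applying a transform never mutates this collection; it yields a new one.
        return transform.expand(self)


class PTransform(Generic[T, U]):
    """A processing step: one PCollection in, one PCollection out."""

    def expand(self, pcoll: PCollection[T]) -> PCollection[U]:
        raise NotImplementedError


class FlatMap(PTransform[T, U]):
    """Hypothetical element-wise step, roughly the role ParDo plays in the SDK."""

    def __init__(self, fn: Callable[[T], Iterable[U]]) -> None:
        self.fn = fn

    def expand(self, pcoll: PCollection[T]) -> PCollection[U]:
        return PCollection(out for elem in pcoll.elements for out in self.fn(elem))


class CountPerElement(PTransform[T, Tuple[T, int]]):
    """Hypothetical aggregating step: count occurrences of each element."""

    def expand(self, pcoll: PCollection[T]) -> PCollection[Tuple[T, int]]:
        counts = {}
        for elem in pcoll.elements:
            counts[elem] = counts.get(elem, 0) + 1
        return PCollection(sorted(counts.items()))


class Pipeline:
    """Toy pipeline: a source produces the root PCollection, a sink consumes one."""

    def read(self, source: Callable[[], Iterable[T]]) -> PCollection[T]:
        return PCollection(source())

    def write(self, pcoll: PCollection[T], sink: Callable[[T], None]) -> None:
        for elem in pcoll.elements:
            sink(elem)


# Usage: a word count built from the four concepts.
lines = ["to be or", "not to be"]
output: List[Tuple[str, int]] = []

p = Pipeline()
words_in = p.read(lambda: lines)                 # I/O source: root of the pipeline
counts = words_in.apply(FlatMap(str.split)).apply(CountPerElement())
p.write(counts, output.append)                   # I/O sink: endpoint of the pipeline
print(output)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```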
>
>                                     == Rationale ==
>
>                                     With Dataflow, Google intended to develop
>                                     a framework that allows developers to be
>                                     maximally productive in defining the
>                                     processing, and then to execute the
>                                     program at various levels of
>                                     latency/cost/completeness without
>                                     re-architecting or rewriting it. This goal
>                                     was informed by Google’s past experience
>                                     developing several models, frameworks, and
>                                     tools useful for large-scale and
>                                     distributed data processing. While Google
>                                     has previously published papers describing
>                                     some of its technologies, Google decided
>                                     to take a different approach with
>                                     Dataflow. Google open-sourced the SDK and
>                                     model alongside commercialization of the
>                                     idea and ahead of publishing papers on the
>                                     topic. As a result, a number of open
>                                     source runtimes exist for Dataflow, such
>                                     as the Apache Flink and Apache Spark
>                                     runners.
>
>                                     We believe that submitting Dataflow as an
>                                     Apache project will provide an immediate,
>                                     worthwhile, and substantial contribution
>                                     to the open source community. We believe
>                                     that, as an incubating project, Dataflow
>                                     will have a better opportunity to provide
>                                     a meaningful contribution to OSS and to
>                                     integrate with other Apache projects.
>
>                                     In the long term, we believe Dataflow can
>                                     be a powerful abstraction layer for data
>                                     processing. With such a layer for data
>                                     pipelines and processing, data workflows
>                                     can become increasingly portable,
>                                     resilient to breaking changes in tooling,
>                                     and compatible across many execution
>                                     engines, runtimes, and open source
>                                     projects.
>
>                                     == Initial Goals ==
>
>                                     We are breaking our initial goals
>                                     into immediate (< 2 months),
>                                     short-term
>                                     (2-4 months), and intermediate-term
>                                     (> 4 months).
>
>                                     Our immediate goals include the
>                                     following:
>
>                                     * Plan for reconciling the Dataflow Java
>                                     SDK and various runners into one project
>
>                                     * Plan for refactoring the existing Java
>                                     SDK for better extensibility by SDK and
>                                     runner writers
>
>                                     * Validating that all dependencies are ASL
>                                     2.0 or compatible
>
>                                     * Understanding and adapting to the
>                                     Apache development process
>
>                                     Our short-term goals include:
>
>                                     * Moving the newly merged lists and build
>                                     utilities to Apache
>
>                                     * Starting to refactor the codebase and
>                                     moving code to the Apache Git repo
>
>                                     * Continuing development of new features,
>                                     functions, and fixes in the Dataflow Java
>                                     SDK and Dataflow runners
>
>                                     * Cleaning up the Dataflow SDK sources and
>                                     crafting a roadmap and plan for how to
>                                     include new major ideas, modules, and
>                                     runtimes
>
>                                     * Establishing an easy and clear
>                                     build/test framework for Dataflow and
>                                     associated runtimes; creating a testing,
>                                     rollback, and validation policy
>
>                                     * Analysis and design of the work needed to
>                                     make Dataflow a better data processing
>                                     abstraction layer for multiple open source
>                                     frameworks and environments
>
>                                     Finally, we have a number of
>                                     intermediate-term goals:
>
>                                     * Roadmapping, planning, and execution of
>                                     integrations with other OSS and non-OSS
>                                     projects/products
>
>                                     * Inclusion of an additional SDK for
>                                     Python, which is under active development
>
>                                     == Current Status ==
>
>                                     === Meritocracy ===
>
>                                     Dataflow was initially developed based on
>                                     ideas from many employees within Google.
>                                     As an ASL OSS project on GitHub, the
>                                     Dataflow SDK has received contributions
>                                     from data Artisans, Cloudera Labs, and
>                                     other individual developers. As a project
>                                     under incubation, we are committed to
>                                     expanding our effort to build an
>                                     environment that supports a meritocracy.
>                                     We are focused on engaging the community
>                                     and other related projects for support and
>                                     contributions. Moreover, we are committed
>                                     to ensuring that contributors and
>                                     committers to Dataflow come from a broad
>                                     mix of organizations through a merit-based
>                                     decision process during incubation. We
>                                     believe strongly in the Dataflow model and
>                                     are committed to growing an inclusive
>                                     community of Dataflow contributors.
>
>                                     === Community ===
>
>                                     The core of the Dataflow Java SDK has been
>                                     developed by Google for use with Google
>                                     Cloud Dataflow. Google has active
>                                     community engagement in the SDK GitHub
>                                     repository
>                                     (https://github.com/GoogleCloudPlatform/DataflowJavaSDK)
>                                     and on Stack Overflow
>                                     (http://stackoverflow.com/questions/tagged/google-cloud-dataflow),
>                                     and the SDK has had contributions from a
>                                     number of organizations and individuals.
>
>                                     Every day, Cloud Dataflow is actively used
>                                     by a number of organizations and
>                                     institutions for batch and stream
>                                     processing of data. We believe acceptance
>                                     will allow us to consolidate existing
>                                     Dataflow-related work, grow the Dataflow
>                                     community, and deepen connections between
>                                     Dataflow and other open source projects.
>
>                                     === Core Developers ===
>
>                                     The core developers for Dataflow and
>                                     the Dataflow runners are:
>
>                                     * Frances Perry
>
>                                     * Tyler Akidau
>
>                                     * Davor Bonaci
>
>                                     * Luke Cwik
>
>                                     * Ben Chambers
>
>                                     * Kenn Knowles
>
>                                     * Dan Halperin
>
>                                     * Daniel Mills
>
>                                     * Mark Shields
>
>                                     * Craig Chambers
>
>                                     * Maximilian Michels
>
>                                     * Tom White
>
>                                     * Josh Wills
>
>                                     === Alignment ===
>
>                                     The Dataflow SDK can be used to create
>                                     Dataflow pipelines that can be executed on
>                                     Apache Spark or Apache Flink. Dataflow is
>                                     also related to other Apache projects,
>                                     such as Apache Crunch. We plan to expand
>                                     functionality for Dataflow runners, add
>                                     support for additional domain-specific
>                                     languages, and increase portability so
>                                     that Dataflow becomes a powerful
>                                     abstraction layer for data processing.
>
>                                     == Known Risks ==
>
>                                     === Orphaned Products ===
>
>                                     The Dataflow SDK is presently used by
>                                     several organizations, from small startups
>                                     to Fortune 100 companies, to construct
>                                     production pipelines which are executed in
>                                     Google Cloud Dataflow. Google has a
>                                     long-term commitment to advance the
>                                     Dataflow SDK; moreover, Dataflow is seeing
>                                     increasing interest, development, and
>                                     adoption from organizations outside of
>                                     Google.
>
>                                     === Inexperience with Open Source ===
>
>                                     Google believes strongly in open source
>                                     and the exchange of information to advance
>                                     new ideas and work. Examples of this
>                                     commitment are active OSS projects such as
>                                     Chromium (https://www.chromium.org) and
>                                     Kubernetes (http://kubernetes.io/). With
>                                     Dataflow, we have tried to be increasingly
>                                     open and forward-looking; we have
>                                     published a paper at the VLDB conference
>                                     describing the Dataflow model
>                                     (http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf)
>                                     and were quick to release the Dataflow SDK
>                                     as open source software with the launch of
>                                     Cloud Dataflow. Our submission to the
>                                     Apache Software Foundation is a logical
>                                     extension of our commitment to open source
>                                     software.
>
>                                     === Homogeneous Developers ===
>
>                                     The majority of committers in this
>                                     proposal belong to Google because Dataflow
>                                     emerged from several internal Google
>                                     projects. This proposal also includes
>                                     committers outside of Google who are
>                                     actively involved with other Apache
>                                     projects, such as Hadoop, Flink, and
>                                     Spark. We expect our entry into incubation
>                                     will allow us to expand the number of
>                                     individuals and organizations
>                                     participating in Dataflow development.
>                                     Additionally, separation of the Dataflow
>                                     SDK from Google Cloud Dataflow allows us
>                                     to focus on the open source SDK and model
>                                     and do what is best for this project.
>
>                                     === Reliance on Salaried Developers ===
>
>                                     The Dataflow SDK and Dataflow runners have been developed
>                                     primarily by salaried developers supporting the Google Cloud
>                                     Dataflow project. While the Dataflow SDK and Cloud Dataflow have
>                                     been developed by different teams (and this proposal would
>                                     reinforce that separation), we expect our initial set of
>                                     developers will still primarily be salaried. Contribution has not
>                                     been exclusively from salaried developers, however. For example,
>                                     the contrib directory of the Dataflow SDK
>                                     (https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/master/contrib)
>                                     contains items from free-time contributors. Moreover, separate
>                                     projects, such as ScalaFlow (https://github.com/darkjh/scalaflow),
>                                     have been created around the Dataflow model and SDK. We expect our
>                                     reliance on salaried developers will decrease over time during
>                                     incubation.
>
>                                     === Relationship with other Apache products ===
>
>                                     Dataflow directly interoperates with or utilizes several
>                                     existing Apache projects.
>
>                                     * Build
>
>                                     ** Apache Maven
>
>                                     * Data I/O, Libraries
>
>                                     ** Apache Avro
>
>                                     ** Apache Commons
>
>                                     * Dataflow runners
>
>                                     ** Apache Flink
>
>                                     ** Apache Spark
>
>                                     When used in batch mode, Dataflow shares similarities with Apache
>                                     Crunch; however, Dataflow is focused on a model, SDK, and
>                                     abstraction layer beyond Spark and Hadoop (MapReduce). One key
>                                     goal of Dataflow is to provide an intermediate abstraction layer
>                                     that can easily be implemented and utilized across several
>                                     different processing frameworks.
>
>                                     === An excessive fascination with the Apache brand ===
>
>                                     With this proposal we are not seeking attention or publicity.
>                                     Rather, we firmly believe in the Dataflow model, SDK, and the
>                                     ability to make Dataflow a powerful yet simple framework for data
>                                     processing. While the Dataflow SDK and model have been open
>                                     source, we believe putting code on GitHub can only go so far. We
>                                     see the Apache community, processes, and mission as critical for
>                                     ensuring the Dataflow SDK and model are truly community-driven,
>                                     positively impactful, and innovative open source software. While
>                                     Google has taken a number of steps to advance its various open
>                                     source projects, we believe Dataflow is a great fit for the Apache
>                                     Software Foundation due to its focus on data processing and its
>                                     relationships to existing ASF projects.
>
>                                     == Documentation ==
>
>                                     The following documentation is relevant to this proposal.
>                                     Relevant portions of the documentation will be contributed to the
>                                     Apache Dataflow project.
>
>                                     * Dataflow website:
>                                     https://cloud.google.com/dataflow
>
>                                     * Dataflow programming model:
>                                     https://cloud.google.com/dataflow/model/programming-model
>
>                                     * Codebases
>
>                                     ** Dataflow Java SDK:
>                                     https://github.com/GoogleCloudPlatform/DataflowJavaSDK
>
>                                     ** Flink Dataflow runner:
>                                     https://github.com/dataArtisans/flink-dataflow
>
>                                     ** Spark Dataflow runner:
>                                     https://github.com/cloudera/spark-dataflow
>
>                                     * Dataflow Java SDK issue tracker:
>                                     https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues
>
>                                     * google-cloud-dataflow tag on Stack
>                                     Overflow:
>                                     http://stackoverflow.com/questions/tagged/google-cloud-dataflow
>
>                                     == Initial Source ==
>
>                                     The initial source for Dataflow, which we will submit to the
>                                     Apache Software Foundation, will include several related projects
>                                     that are currently hosted in the following GitHub repositories:
>
>                                     * Dataflow Java SDK
>                                     (https://github.com/GoogleCloudPlatform/DataflowJavaSDK)
>
>                                     * Flink Dataflow runner
>                                     (https://github.com/dataArtisans/flink-dataflow)
>
>                                     * Spark Dataflow runner
>                                     (https://github.com/cloudera/spark-dataflow)
>
>                                     These projects have always been Apache 2.0 licensed. We intend to
>                                     bundle all of these repositories, since they are complementary and
>                                     should be maintained in one project. Prior to our submission, we
>                                     will combine all of these projects into a new git repository.
>
>                                     == Source and Intellectual Property Submission Plan ==
>
>                                     The source for the Dataflow SDK and the three runners (Spark,
>                                     Flink, Google Cloud Dataflow) is already licensed under an Apache
>                                     2 license.
>
>                                     * Dataflow SDK -
>                                     https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/LICENSE
>
>                                     * Flink runner -
>                                     https://github.com/dataArtisans/flink-dataflow/blob/master/LICENSE
>
>                                     * Spark runner -
>                                     https://github.com/cloudera/spark-dataflow/blob/master/LICENSE
>
>                                     Contributors to the Dataflow SDK have also signed the Google
>                                     Individual Contributor License Agreement
>                                     (https://cla.developers.google.com/about/google-individual) in
>                                     order to contribute to the project.
>
>                                     With respect to trademark rights, Google does not hold a trademark
>                                     on the phrase “Dataflow.” Based on feedback and guidance we receive
>                                     during the incubation process, we are open to renaming the project
>                                     if necessary for trademark or other concerns.
>
>                                     == External Dependencies ==
>
>                                     All external dependencies are licensed under an Apache 2.0 or
>                                     Apache-compatible license. As we grow the Dataflow community, we
>                                     will configure our build process to require and validate that all
>                                     contributions and dependencies are licensed under the Apache 2.0
>                                     license or an Apache-compatible license.
>
>                                     == Required Resources ==
>
>                                     === Mailing Lists ===
>
>                                     We currently use a mix of mailing
>                                     lists. We will migrate our existing
>                                     mailing lists to the following:
>
>                                     * dev@dataflow.incubator.apache.org
>                                     <mailto:dev@dataflow.incubator.apache.org>
>
>                                     * user@dataflow.incubator.apache.org
>                                     <mailto:user@dataflow.incubator.apache.org>
>
>                                     * private@dataflow.incubator.apache.org
>                                     <mailto:private@dataflow.incubator.apache.org>
>
>                                     * commits@dataflow.incubator.apache.org
>                                     <mailto:commits@dataflow.incubator.apache.org>
>
>                                     === Source Control ===
>
>                                     The Dataflow team currently uses Git and would like to continue
>                                     to do so. We request a Git repository for Dataflow with mirroring
>                                     to GitHub enabled.
>
>                                     === Issue Tracking ===
>
>                                     We request the creation of an Apache-hosted JIRA. The Dataflow
>                                     project is currently using both a public GitHub issue tracker and
>                                     internal Google issue tracking. We will migrate issues from these
>                                     two sources and combine them in the Apache JIRA.
>
>                                     == Initial Committers ==
>
>                                     * Aljoscha Krettek
>                                       [aljoscha@apache.org
>                                     <mailto:aljoscha@apache.org>]
>
>                                     * Amit Sela
>                                     [amitsela33@gmail.com
>                                     <mailto:amitsela33@gmail.com>]
>
>                                     * Ben Chambers
>                                       [bchambers@google.com
>                                     <mailto:bchambers@google.com>]
>
>                                     * Craig Chambers
>                                       [chambers@google.com
>                                     <mailto:chambers@google.com>]
>
>                                     * Dan Halperin
>                                       [dhalperi@google.com
>                                     <mailto:dhalperi@google.com>]
>
>                                     * Davor Bonaci
>                                       [davor@google.com
>                                     <mailto:davor@google.com>]
>
>                                     * Frances Perry
>                                     [fjp@google.com <mailto:fjp@google.com>]
>
>                                     * James Malone
>                                       [jamesmalone@google.com
>                                     <mailto:jamesmalone@google.com>]
>
>                                     * Jean-Baptiste Onofré
>                                     [jbonofre@apache.org
>                                     <mailto:jbonofre@apache.org>]
>
>                                     * Josh Wills
>                                       [jwills@apache.org
>                                     <mailto:jwills@apache.org>]
>
>                                     * Kostas Tzoumas
>                                       [kostas@data-artisans.com
>                                     <mailto:kostas@data-artisans.com>]
>
>                                     * Kenneth Knowles
>                                     [klk@google.com <mailto:klk@google.com>]
>
>                                     * Luke Cwik
>                                     [lcwik@google.com
>                                     <mailto:lcwik@google.com>]
>
>                                     * Maximilian Michels
>                                       [mxm@apache.org
>                                     <mailto:mxm@apache.org>]
>
>                                     * Stephan Ewen
>                                       [stephan@data-artisans.com
>                                     <mailto:stephan@data-artisans.com>]
>
>                                     * Tom White
>                                     [tom@cloudera.com
>                                     <mailto:tom@cloudera.com>]
>
>                                     * Tyler Akidau
>                                       [takidau@google.com
>                                     <mailto:takidau@google.com>]
>
>                                     == Affiliations ==
>
>                                     The initial committers are from six
>                                     organizations. Google developed
>                                     Dataflow and the Dataflow SDK, data
>                                     Artisans developed the Flink
>                                     runner,
>                                     and Cloudera (Labs) developed the
>                                     Spark runner.
>
>                                     * Cloudera
>
>                                     ** Tom White
>
>                                     * Data Artisans
>
>                                     ** Aljoscha Krettek
>
>                                     ** Kostas Tzoumas
>
>                                     ** Maximilian Michels
>
>                                     ** Stephan Ewen
>
>                                     * Google
>
>                                     ** Ben Chambers
>
>                                     ** Dan Halperin
>
>                                     ** Davor Bonaci
>
>                                     ** Frances Perry
>
>                                     ** James Malone
>
>                                     ** Kenneth Knowles
>
>                                     ** Luke Cwik
>
>                                     ** Tyler Akidau
>
>                                     * PayPal
>
>                                     ** Amit Sela
>
>                                     * Slack
>
>                                     ** Josh Wills
>
>                                     * Talend
>
>                                     ** Jean-Baptiste Onofré
>
>                                     == Sponsors ==
>
>                                     === Champion ===
>
>                                     * Jean-Baptiste Onofré
>                                     [jbonofre@apache.org
>                                     <mailto:jbonofre@apache.org>]
>
>                                     === Nominated Mentors ===
>
>                                     * Jim Jagielski
>                                       [jim@apache.org
>                                     <mailto:jim@apache.org>]
>
>                                     * Venkatesh Seetharam
>                                       [venkatesh@apache.org
>                                     <mailto:venkatesh@apache.org>]
>
>                                     * Bertrand Delacretaz
>                                       [bdelacretaz@apache.org
>                                     <mailto:bdelacretaz@apache.org>]
>
>                                     * Ted Dunning
>                                       [tdunning@apache.org
>                                     <mailto:tdunning@apache.org>]
>
>                                     === Sponsoring Entity ===
>
>                                     The Apache Incubator
>
>
>
>
>
>                                 --
>
>                             Jean-Baptiste Onofré
>                             jbonofre@apache.org <mailto:jbonofre@apache.org>
>                             http://blog.nanthrax.net
>                             Talend - http://www.talend.com
>
>                             ---------------------------------------------------------------------
>                             To unsubscribe, e-mail:
>                             general-unsubscribe@incubator.apache.org
>                             <mailto:general-unsubscribe@incubator.apache.org>
>                             For additional commands, e-mail:
>                             general-help@incubator.apache.org
>                             <mailto:general-help@incubator.apache.org>
>
>
> --
> Thanks and Regards,
> Mayank
> Cell: 408-718-9370

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


