incubator-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kevin Ratnasekera <djkevincr1...@gmail.com>
Subject Re: [VOTE] Accept DolphinScheduler(was EasyScheduler) into Apache Incubator
Date Fri, 23 Aug 2019 09:14:04 GMT
+1

On Fri, Aug 23, 2019 at 7:09 AM Sheng Wu <wusheng@apache.org> wrote:

> Hi all,
>
> After the discussion of DolphinScheduler(was EasyScheduler) proposal
> (discussion thread:
>
> https://lists.apache.org/thread.html/d3ac53bddf91391e54f63d042a0b3d60f2aecfbb99780bcc00b4db6e@%3Cgeneral.incubator.apache.org%3E
> ),
> I would like to call a VOTE to accept it into the Apache Incubator.
>
> Please cast your vote:
>
>   [ ] +1, bring DolphinScheduler into Incubator
>   [ ] +0, I don't care either way
>   [ ] -1, do not bring DolphinScheduler into Incubator, because...
>
> The vote will open at least for 72 hours and only votes from the Incubator
> PMC are binding.
>
> ======
> Abstract
>
> DolphinScheduler is a distributed ETL scheduling engine with powerful DAG
> visualization interface. DolphinScheduler focuses on solving the problem of
> 'complex task dependencies & triggers' in data processing. Just like its
> name, we dedicated to making the scheduling system out of the box.
>
> *Current project name of DolphinScheduler is EasyScheduler, will change it
> after it is accepted by Incubator.*
> Proposal
>
> DolphinScheduler provides many easy-to-use features to accelerate
> the engineering efficiency on data ETL workflow job. We propose a new
> concept of 'instance of process' and 'instance of task' to let developers
> to tuning their jobs on the running state of workflow instead of changing
> the task's template. Its main objectives are as follows:
>
>    - Define the complex tasks' dependencies & triggers in a DAG graph by
>    dragging and dropping.
>    - Support cluster HA.
>    - Support multi-tenant and parallel or serial backfilling data.
>    - Support automatical failure job retry and recovery.
>    - Support many data task types and process priority, task priority and
>    relative task timeout alarm.
>
> For now, DolphinScheduler has a fairly huge community in China. It is also
> widely adopted by many companies and organizations
> <https://github.com/analysys/EasyScheduler/issues/57> as its ETL
> scheduling
> tool.
>
> We believe that bringing DolphinScheduler into ASF could advance
> development of a much more stronger and more diverse open source community.
>
> Analysys submits this proposal to donate DolphinScheduler's source codes
> and all related documentations to Apache Software Foundation. The codes are
> already under Apache License Version 2.0.
>
>    - Code base: https://www.github.com/analysys/easyscheduler
>    - English Documentations: https://analysys.github.io/easyscheduler_docs
>    - Chinese Documentations:
>    https://analysys.github.io/easyscheduler_docs_cn
>
> Background
>
> We want to find a data processing tool with the following features:
>
>    - Easy to use,developers can build a ETL process with a very simple drag
>    and drop operation. not only for ETL developers,people who can't write
> code
>    also can use this tool for ETL operation such as system adminitrator.
>    - Solving the problem of "complex task dependencies" , and it can
>    monitor the ETL running status.
>    - Support multi-tenant.
>    - Support many task types: Shell, MR, Spark, SQL (mysql, postgresql,
>    hive, sparksql), Python, Sub_Process, Procedure, etc.
>    - Support HA and linear scalability.
>
> For the above reasons, we realized that no existing product met our
> requirements, so we decided to develop this tool ourselves. We designed
> DolphinScheduler at the end of 2017. The first internal use version was
> completed in May 2018. We then iterated several internal versions and the
> system gradually became stabilized.
>
> Then we open the source code of DolphinScheduler on March 2019. It soon
> gained lot's of ETL developers interest and stars on github.
> Rationale
>
> Many organizations (>30) (refer to Who is using DolphinScheduler
> <https://github.com/analysys/EasyScheduler/issues/57> ) already benefit
> from running DolphinScheduler to make data process pipelines more easier.
> More than 100 feature ideas
> <https://github.com/analysys/EasyScheduler/projects/1> come from
> DolphinScheduler community. Some 3rd-party projects also plan to integrate
> with DolphinScheduler through task plugin, such as Scriptis
> <https://github.com/WeBankFinTech/Scriptis>, waterdrop
> <https://github.com/InterestingLab/waterdrop>. These will strengthen the
> features of DolphinScheduler.
> Current StatusMeritocracy
>
> DolphinScheduler was incubated at Analysys in 2017 and open sourced on
> GitHub in March 2019. Once open sourced, we have been quickly adopted by
> multiple organizations,DolphinScheduler has contributors and users from
> many companies; we have set up the Committer Team. New contributors are
> guided and reviewed by existed committer members. Contributions are always
> welcomed and highly valued.
> Community
>
> Now we have set development teams for DolphinScheduler in Analysys, and we
> already have external developers who contributed the code. We already have
> a user group of more than 1,000 people. We hope to grow the base of
> contributors by inviting all those who offer contributions through The
> Apache Way. Right now, we make use of github as code hosting as well as
> gitter for community communication.
> Core Developers
>
> The core developers, including experienced senior developers, are often
> guided by mentors.
> Known RisksOrphaned products
>
> DolphinScheduler is widely adopted in China by many companies and
> organizations <https://github.com/analysys/EasyScheduler/issues/57>. The
> core developers of DolphinScheduler team plan to work full time on this
> project. Currently there are 10 use cases with more that 1000 activity
> tasks per day using DolphinScheduler in the user's production environment.
> There is very little risk of DolphinScheduler getting orphaned as at least
> two large companies (xueqiu、fengjr) are widely using it in their
> production, and developers from these companies have also joined Easy
> Scheduler's team of contributors, DolphinScheduler has eight major releases
> so far, and and received 373 pull requests from contributors, which further
> demonstrates DolphinScheduler as a very active project. We also plan to
> extend and diversify this community further through Apache.
>
> Thus, it is very unlikely that DolphinScheduler becomes orphaned.
> Inexperience with Open Source
>
> DolphinScheduler's core developers have been running it as a
> community-oriented open source project for some time, several of them
> already have experience working with open source communities, they are also
> active in presto, alluxio and other projects. At the same time, we will
> learn more open source experiences by following the Apache way in our
> incubator journey.
> Homogenous Developers
>
> The current developers work across a variety of organizations including
> Analysys, guandata and hydee; some individual developers are accepted as
> developers of DolphinScheduler as well. Considering that fengjr and
> sefonsoft have shown great interests in DolphinScheduler, we plan to
> encourage them to contribute and invite them as contributors to work
> together.
> Reliance on Salaried Developers
>
> At present, eight of the core developers are paid by their employer to
> contribute to DolphinScheduler project. we also have some other developers
> and researchers taking part in the project, and we will make efforts to
> increase the diversity of the contributors and actively lobby for Domain
> experts in the workflow space to contribute.
> Relationships with Other Apache Products
>
> DolphinScheduler integrates Apache Zookeeper as one of the service
> registration/discovery mechanisms. DolphinScheduler is deeply integrated
> with Apache products. It currently support many task types like Apache
> Hive, Apache Spark, Apache Hadoop, and so on
> A Excessive Fascination with the Apache Brand
>
> We recognize the value and reputation that the Apache brand will bring to
> DolphinScheduler. However, we prefer that the community provided by the
> Apache Software Foundation will enable the project to achieve long-term
> stable development. so DolphinScheduler is proposing to enter incubation at
> Apache in order to help efforts to diversify the community, not so much to
> capitalize on the Apache brand.
> Documentation
>
> A complete set of DolphinScheduler documentations is provided on github in
> both English and Simplified Chinese.
>
>    - English <https://github.com/analysys/easyscheduler_docs>
>    - Chinese <https://github.com/analysys/easyscheduler_docs_cn>
>
> Initial Source
>
> The project consists of three distinct codebases: core and document. The
> address of two existed git repositories are as follows:
>
>    - https://github.com/analysys/easyscheduler
>    - https://github.com/analysys/easyscheduler_docs
>    - https://github.com/analysys/easyscheduler_docs_cn
>
> Source and Intellectual Property Submission Plan
>
> As soon as DolphinScheduler is approved to join Apache Incubator, Analysys
> will provide the Software Grant Agreement(SGA) and intial committers will
> submit ICLA(s). The code is already licensed under the Apache Software
> License, version 2.0.
>
>
> External Dependencies
>
> As all backend code dependencies are managed using Apache Maven, none of
> the external libraries need to be packaged in a source distribution.
>
> Most of dependencies have Apache compatible licenses,and the detail as
> follows:
>
> Most of dependencies have Apache compatible licenses,and the core
> dependencies are as follows:
> Backend Dependency
> Dependency
> License
> Comments
> bonecp-0.8.0.RELEASE.jar Apache v2.0
> byte-buddy-1.9.10.jar Apache V2.0
> c3p0-0.9.1.1.jar GNU LESSER GENERAL PUBLIC LICENSE will remove
> curator-*-2.12.0.jar Apache V2.0
> druid-1.1.14.jar Apache V2.0
> fastjson-1.2.29.jar Apache V2.0
> fastutil-6.5.6.jar Apache V2.0
> grpc-*-1.9.0.jar Apache V2.0
> gson-2.8.5.jar Apache V2.0
> guava-20.0.jar Apache V2.0
> guice-*3.0.jar Apache V2.0
> hadoop-*-2.7.3.jar Apache V2.0
> hbase-*-1.1.1.jar Apache V2.0
> hive-*-2.1.0.jar Apache V2.0
> instrumentation-api-0.4.3.jar Apache V2.0
> jackson-*-2.9.8.jar Apache V2.0
> jackson-jaxrs-1.8.3.jar LGPL Version 2.1 Apache V2.0 will remove
> jackson-xc-1.8.3.jar LGPL Version 2.1 Apache V2.0 will remove
> javax.activation-api-1.2.0.jar CDDL/GPLv2+CE will remove
> javax.annotation-api-1.3.2.jar CDDL + GPLv2 with classpath exception will
> remove
> javax.servlet-api-3.1.0.jar CDDL + GPLv2 with classpath exception will
> remove
> jaxb-*.jar (CDDL 1.1) (GPL2 w/ CPE) will remove
> jersey-*-1.9.jar CDDL+GPLv2 will remove
> jetty-*-9.4.14.v20181114.jar Apache V2.0,EPL 1.0
> jna-4.5.2.jar Apache V2.0,LGPL 2.1 will remove
> jna-platform-4.5.2.jar Apache V2.0,LGPL 2.1 will remove
> jsp-api-2.x.jar CDDL,GPL 2.0 will remove
> log4j-1.2.17.jar Apache V2.0
> log4j-*-2.11.2.jar Apache V2.0
> logback-x.jar dual-license EPL 1.0,LGPL 2.1
> mail-1.4.5.jar CDDL+GPLv2 will remove
> mybatis-3.5.1.jar Apache V2.0
> mybatis-spring-*2.0.1.jar Apache V2.0
> mysql-connector-java-5.1.34.jar GPL 2.0 will remove
> netty-*-4.1.33.Final.jar Apache V2.0
> oshi-core-3.5.0.jar EPL 1.0
> parquet-hadoop-bundle-1.8.1.jar Apache V2.0
> postgresql-42.1.4.jar BSD 2-clause
> protobuf-java-*3.5.1.jar BSD 3-clause
> quartz-2.2.3.jar Apache V2.0
> quartz-jobs-2.2.3.jar Apache V2.0
> slf4j-api-1.7.5.jar MIT
> spring-*-5.1.5.RELEASE.jar Apache V2.0
> spring-beans-5.1.5.RELEASE.jar Apache V2.0
> spring-boot-*2.1.3.RELEASE.jar Apache V2.0
> springfox-*-2.9.2.jar Apache V2.0
> stringtemplate-3.2.1.jar BSD
> swagger-annotations-1.5.20.jar Apache V2.0
> swagger-bootstrap-ui-1.9.3.jar Apache V2.0
> swagger-models-1.5.20.jar Apache V2.0
> zookeeper-3.4.8.jar Apache
>
> The front-end UI currently relies on many components, and the core
> dependencies are as follows:
> UI Dependency
> Dependency
> License
> Comments
> autoprefixer MIT
> babel-core MIT
> babel-eslint MIT
> babel-helper-* MIT
> babel-helpers MIT
> babel-loader MIT
> babel-plugin-syntax-* MIT
> babel-plugin-transform-* MIT
> babel-preset-env MIT
> babel-runtime MIT
> bootstrap MIT
> canvg MIT
> clipboard MIT
> codemirror MIT
> copy-webpack-plugin MIT
> cross-env MIT
> css-loader MIT
> cssnano MIT
> cyclist MIT
> d3 BSD-3-Clause
> dayjs MIT
> echarts Apache V2.0
> env-parse ISC
> extract-text-webpack-plugin MIT
> file-loader MIT
> globby MIT
> html-loader MIT
> html-webpack-ext-plugin MIT
> html-webpack-plugin MIT
> html2canvas MIT
> jsplumb (MIT OR GPL-2.0)
> lodash MIT
> node-sass MIT
> optimize-css-assets-webpack-plugin MIT
> postcss-loader MIT
> rimraf ISC
> sass-loader MIT
> uglifyjs-webpack-plugin MIT
> url-loader MIT
> util.promisify MIT
> vue MIT
> vue-loader MIT
> vue-style-loader MIT
> vue-template-compiler MIT
> vuex-router-sync MIT
> watchpack MIT
> webpack MIT
> webpack-dev-server MIT
> webpack-merge MIT
> xmldom MIT,LGPL will remove
>
> Required ResourcesGit Repositories
>
>    - https://github.com/analysys/EasyScheduler.git
>    - https://github.com/analysys/easyscheduler_docs.git
>    - https://github.com/analysys/easyscheduler_docs_cn.git
>
> Issue Tracking
>
> The community would like to continue using GitHub Issues.
> Continuous Integration tool
>
> Jenkins
> Mailing Lists
>
>    - DolphinScheduler-dev: for development discussions
>    - DolphinScheduler-private: for PPMC discussions
>    - DolphinScheduler-notifications: for users notifications
>
>
> Initial Committers
>
>    - William-GuoWei(guowei20m@outlook.com)
>    - Lidong Dai(lidong.dai@outlook.com)
>    - Zhanwei Qiao(qiaozhanwei@outlook.com)
>    - Liang Bao(baoliang.leon@gmail.com)
>    - Gang Li(lgcareer2019@outlook.com)
>    - Zijian Gong(quanquansy@gmail.com)
>    - Jun Gao(gaojun2048@gmail.com)
>    - Baoqi Wu(wubaoqi@gmail.com)
>
> Affiliations
>
>    -
>
>    Analysys Inc: William-GuoWei,Zhanwei Qiao,Liang Bao,Gang Li,Jun
>    Gao,Lidong Dai
>    -
>
>    Hydee Inc: Zijian Gong
>    -
>
>    Guandata Inc: Baoqi Wu
>
> SponsorsChampion
>
>    - Sheng Wu ( Apache Incubator PMC, wusheng@apache.org)
>
> Mentors
>
>    -
>
>    Sheng Wu ( Apache Incubator PMC, wusheng@apache.org)
>    -
>
>    ShaoFeng Shi ( Apache Incubator PMC, shaofengshi@apache.org
>    <wusheng@apache.org>)
>    -
>
>    Liang Chen ( Apache Incubator PMC, Apache member,
> chenliang613@apache.org
>    )
>    - Furkan KAMACI ( Apache Incubator PMC, kamaci@apache.org)
>    - Kevin Ratnasekera ( Apache Incubator PMC, Apache member,
>    djkevincr@apache.org)
>
> Sponsoring Entity
>
> We are expecting the Apache Incubator could sponsor this project.
>
> Sheng Wu 吴晟
>
> Apache SkyWalking, ShardingSphere, Zipkin
> Twitter, wusheng1108
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message