incubator-general mailing list archives

From Dave Fisher <dave2w...@comcast.net>
Subject Re: [VOTE] Accept Doris into the Apache Incubator
Date Thu, 05 Jul 2018 19:23:17 GMT
Here is my +1 (binding)

> On Jul 5, 2018, at 12:22 PM, Dave Fisher <dave2wave@comcast.net> wrote:
> 
> Hi All,
> 
> I would like to start a VOTE to bring the Doris project into the Apache Incubator as a podling.
> 
> The ASF voting rules are described here:
> 
> https://www.apache.org/foundation/voting.html
> 
> A vote to accept a new Apache Incubator podling is a majority vote in which only Incubator PMC member votes are binding.
> 
> This vote will run for at least 72 hours. Please VOTE as follows:
> [] +1 Accept Doris into the Apache Incubator
> [] +0 Abstain.
> [] -1 Do not accept Doris into the Apache Incubator because ...
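
For anyone tallying along, here is a minimal sketch of the rule stated above: a majority vote in which only Incubator PMC member votes are binding. The function name `tally`, the tuple layout, and the voter names are hypothetical, and the sketch assumes a simple majority of binding votes with no quorum check; the linked ASF voting page remains the authority.

```python
from collections import Counter


def tally(votes):
    """Tally (voter, value, is_binding) tuples, where value is +1, 0, or -1.

    Assumption: the vote passes when binding +1 votes outnumber binding -1
    votes. Non-binding votes are counted for information only.
    """
    binding = Counter()
    non_binding = Counter()
    for _voter, value, is_binding in votes:
        if value not in (1, 0, -1):
            raise ValueError(f"unexpected vote value: {value!r}")
        if is_binding:
            binding[value] += 1
        else:
            non_binding[value] += 1
    passed = binding[1] > binding[-1]
    return binding, non_binding, passed


if __name__ == "__main__":
    # Hypothetical voters, for illustration only.
    example = [
        ("ipmc-member-a", 1, True),
        ("ipmc-member-b", 1, True),
        ("ipmc-member-c", 0, True),
        ("contributor-d", 1, False),
    ]
    binding, non_binding, passed = tally(example)
    print(dict(binding), dict(non_binding), passed)
```

Non-binding votes are still recorded separately, since they are a useful signal of community interest even though they do not decide the outcome.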
> 
> The proposal is listed below, but you can also access it on the wiki:
> 
> https://wiki.apache.org/incubator/DorisProposal
> 
> Best regards,
> Dave
> 
> = Apache Doris =
> 
> == Abstract ==
> 
> Doris is an MPP-based interactive SQL data warehouse for reporting and analysis.
> 
> == Proposal ==
> 
> We propose to contribute the Doris codebase and associated artifacts (e.g. documentation, web-site content, etc.) to the Apache Software Foundation, and aim to build an open community around Doris’s continued development in the ‘Apache Way’.
> 
> === Overview of Doris ===
> 
> Doris’s implementation consists of two daemons: Frontend (FE) and Backend (BE).
> 
> **Frontend daemon** consists of a query coordinator and a catalog manager. The query coordinator is responsible for receiving users’ SQL queries, compiling them, and managing query execution. The catalog manager is responsible for managing metadata such as databases, tables, partitions, and replicas. Several frontend daemons can be deployed to provide fault tolerance and load balancing.
> 
> **Backend daemon** stores the data and executes the query fragments. Multiple backend daemons can be deployed to provide scalability and fault tolerance.
> 
> A typical Doris cluster consists of several frontend daemons and dozens to hundreds of backend daemons.
> 
> Users can use MySQL client tools to connect to any frontend daemon and submit SQL queries. The Frontend receives a query and compiles it into query plans executable by the Backend. The Frontend then sends the query plan fragments to the Backends. Each Backend builds a query execution DAG, and data is fetched and pipelined into the DAG. The final result is sent to the client via the Frontend. The distribution of query fragment execution has minimizing data movement and maximizing scan locality as its main goals.
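
To make the scan-locality goal concrete, here is a small illustrative sketch of how a coordinator might place scan fragments on backends that already hold the data. It is not Doris's actual scheduler: the function `assign_fragments`, the tablet and backend names, and the least-loaded tie-breaking are all assumptions made for the example.

```python
def assign_fragments(tablet_replicas, backends):
    """Assign each tablet scan to a backend, preferring one that holds a replica.

    tablet_replicas: dict mapping a tablet id to the backends storing a replica.
    backends: backends currently alive.
    Among the candidates, the least-loaded backend is chosen (ties broken by
    name), so scans stay local while work is spread across the cluster.
    """
    if not backends:
        raise ValueError("no backends available")
    load = {be: 0 for be in backends}
    assignment = {}
    for tablet in sorted(tablet_replicas):
        local = [be for be in tablet_replicas[tablet] if be in load]
        # Fall back to any live backend if no replica holder is alive;
        # in that case the data would have to be moved over the network.
        candidates = local or list(load)
        chosen = min(candidates, key=lambda be: (load[be], be))
        assignment[tablet] = chosen
        load[chosen] += 1
    return assignment


if __name__ == "__main__":
    # Hypothetical layout: three tablets, each replicated on two backends.
    replicas = {
        "tablet-1": ["be1", "be2"],
        "tablet-2": ["be1", "be3"],
        "tablet-3": ["be2", "be3"],
    }
    print(assign_fragments(replicas, ["be1", "be2", "be3"]))
```

In this toy layout every scan runs on a backend that stores the tablet, so no data needs to move before the scan starts.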
> 
> == Background ==
> 
> At Baidu, prior to Doris, different tools were deployed to meet diverse requirements. When a use case required capabilities that no single tool could provide, users were forced to build hybrid architectures that stitched multiple tools together. We believe users shouldn’t need to accept such inherent complexity. A storage system built to provide great performance across a broad range of workloads offers a more elegant solution to the problems that hybrid architectures aim to solve. Doris is that solution.
> 
> Doris is designed to be a simple, single, tightly coupled system that does not depend on other systems. Doris provides high-concurrency, low-latency point queries as well as high-throughput ad-hoc analytical queries. Doris supports bulk batch data loading as well as near real-time mini-batch data loading. Doris also provides high availability, reliability, fault tolerance, and scalability.
> 
> == Rationale ==
> 
> Doris mainly integrates the technology of Google Mesa and Apache Impala.
> 
> Mesa is a highly scalable analytic data storage system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and system requirements, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes.
> 
> Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. By virtue of its performance and rich functionality, Impala is now comparable to many commercial MPP database query engines. Mesa can satisfy many of our storage requirements, but Mesa itself does not provide a SQL query engine; Impala is a very good MPP SQL query engine, but it lacks a perfect distributed storage engine. So in the end we chose to combine these two technologies.
> 
> Learning from Mesa’s data model, we developed a distributed storage engine. Unlike Mesa, this storage engine does not rely on any distributed file system. We then deeply integrated this storage engine with the Impala query engine. Query compilation, query execution coordination, and catalog management for the storage engine are integrated into the frontend daemon; query execution and data storage are integrated into the backend daemon. With this integration, we implemented a single, full-featured, high-performance, state-of-the-art MPP database while maintaining simplicity.
> 
> == Current Status ==
> 
> Doris is an open source project on GitHub (https://github.com/baidu/palo).
> 
> === Meritocracy ===
> 
> Doris has been deployed in production at Baidu and serves more than 200 lines of business. It has demonstrated great performance benefits and has proved to be a better way to do reporting and analysis on big data. Still, we look forward to growing a rich user and developer community.
> 
> === Community ===
> 
> Doris seeks to develop developer and user communities during incubation.
> 
> Doris makes use of Apache Impala. It was identified during early review of the proposal that the Doris community will need to work with Impala to define a suitable API.
> 
> === Core Developers ===
> 
>  * Ruyue Ma (https://github.com/maruyue, maruyue@baidu dot com)
>  * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail dot com)
>  * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu dot com)
>  * De Li (https://github.com/lide-reed, mailtolide@sina dot com)
>  * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu dot com)
>  * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu dot com)
>  * Bin Lin (https://github.com/lingbin, lingbinlb@gmail dot com)
> 
> === Alignment ===
> 
> Doris is related to several other Apache projects:
> 
>  * Doris can also read data stored in Apache Hadoop clusters powered by the HDFS filesystem.
>  * Doris is closely integrated with Impala, which has graduated from the Apache Incubator.
>  * Doris uses Apache Thrift as its RPC and serialization framework of choice.
> 
> == Known Risks ==
> 
> === Orphaned Products ===
> 
> The core developers of the Doris team plan to work full time on this project. There is very little risk of Doris being orphaned, since at least one large company (Baidu) uses it extensively in production. For example, there are currently more than 200 use cases running on Doris in production. Furthermore, since Doris was open sourced at the beginning of October 2017, it has received more than 660 stars and has been forked nearly 170 times. We plan to extend and diversify this community further through Apache.
> 
> === Inexperience with Open Source ===
> 
> The core developers are all active users and followers of open source. They are already committers and contributors to the Doris GitHub project. All have been involved with source code that has been released under an open source license, and several of them also have experience developing code in an open source environment. Though the core developers do not have Apache open source experience, there are plans to onboard individuals with Apache open source experience onto the project.
> 
> === Homogenous Developers ===
> 
> Most of the core developers are from Baidu, but since Doris was open sourced, it has received many bug fixes and enhancements from developers not working at Baidu.
> 
> === Reliance on Salaried Developers ===
> 
> Baidu has invested in Doris as its OLAP solution, and some of its key engineers are working full time on the project. In addition, since there is a growing need in Big Data for scalable OLAP solutions, we look forward to other Apache developers and researchers contributing to the project. Key to addressing the risk of relying on salaried developers from a single entity is increasing the diversity of contributors and actively encouraging domain experts in the BI space to contribute. Apache Doris intends to do this.
> 
> === An Excessive Fascination with the Apache Brand ===
> 
> Doris is proposing to enter incubation at Apache in order to help efforts to diversify the committer base, not so much to capitalize on the Apache brand. The Doris project is already in production use inside Baidu, but is not expected to be a Baidu product for external customers. As such, the Doris project is not seeking to use the Apache brand as a marketing tool.
> 
> == Documentation ==
> 
> Information about Doris can be found at https://github.com/baidu/palo. The following links provide more information about Doris in open source:
> 
>  * Doris wiki site: https://github.com/baidu/palo/wiki
>  * Codebase at GitHub: https://github.com/baidu/palo
>  * Issue Tracking: https://github.com/baidu/palo/issues
>  * Overview: https://github.com/baidu/Doris/wiki/palo-Overview
>  * FAQ: https://github.com/baidu/palo/wiki/palo-FAQ
> 
> == Initial Source ==
> 
> Doris has been under development since 2017 by a team of engineers at Baidu Inc. It is currently hosted on GitHub.com under an Apache license at https://github.com/baidu/palo.
> 
> == External Dependencies ==
> 
> Doris has the following external dependencies.
> 
>  * Google gflags (BSD)
>  * Google glog (BSD)
>  * Apache Thrift (Apache Software License v2.0)
>  * Apache Commons (Apache Software License v2.0)
>  * Boost (Boost Software License)
>  * rapidjson (Tencent)
>  * Google RE2 (BSD-style)
>  * lz4 (BSD)
>  * snappy (BSD)
>  * Twitter Bootstrap (Apache Software License v2.0)
>  * d3 (BSD)
>  * LLVM (BSD-like)
> 
> Build and test dependencies:
> 
>  * Apache Ant (Apache Software License v2.0)
>  * Apache Maven (Apache Software License v2.0)
>  * cmake (BSD)
>  * clang (BSD)
>  * Google gtest (Apache Software License v2.0)
> 
> == Required Resources ==
> 
> === Mailing List ===
> 
> There are currently no mailing lists. The usual mailing lists are expected to be set up when entering incubation:
> 
>  * private@doris.incubator.apache.org
>  * dev@doris.incubator.apache.org
>  * commits@doris.incubator.apache.org
> 
> === Subversion Directory ===
> 
> Upon entering incubation, we want to move (or copy) the existing repo from https://github.com/baidu/palo to Apache infrastructure at https://github.com/apache/incubator-doris.
> 
> === Issue Tracking ===
> 
> Doris currently uses GitHub to track issues. We would like to continue to do so while we discuss migration possibilities with the ASF Infra committee.
> 
> === Other Resources ===
> 
> The existing code already has unit tests, so we will make use of the existing Apache continuous testing infrastructure. The resulting load should not be very large.
> 
> == Initial Committers ==
> 
>  * Ruyue Ma (https://github.com/maruyue, maruyue@baidu dot com)
>  * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail dot com)
>  * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu dot com)
>  * De Li (https://github.com/lide-reed, mailtolide@sina dot com)
>  * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu dot com)
>  * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu dot com)
>  * Bin Lin (https://github.com/lingbin, lingbinlb@gmail dot com)
>  * Sijie Guo (guosijie@gmail dot com)
>  * Zheng Shao (zshao@apache.org)
> 
> == Affiliations ==
> 
> The initial committers are employees of Baidu Inc.
> 
> == Sponsors ==
> 
> === Champion ===
> 
>  * Dave Fisher, wave@apache.org
> 
> === Nominated Mentors ===
> 
>  * Luke Han, lukehan@apache.org
>  * Dave Fisher, wave@apache.org
>  * Willem Jiang, ningjiang@apache.org
> 
> === Sponsoring Entity ===
> 
> We are requesting the Incubator to sponsor this project.
> 

