incubator-general mailing list archives

From Willem Jiang <willem.ji...@gmail.com>
Subject Re: Looking for Champion
Date Fri, 08 Jun 2018 15:03:23 GMT
Hi,

I'm willing to be the Mentor.
Please count me in.



Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <dave2wave@comcast.net> wrote:

> Hi -
>
> I’m willing to Champion and Mentor. I have a couple of comments inline.
> I’ll look at dependency licenses later today. It’s early for me.
>
>
> > On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <lide@baidu.com> wrote:
> >
> > Hi all,
> >
> > I am Reed, a developer on the team behind Palo (an MPP-based
> > interactive SQL data warehouse).
> > https://github.com/baidu/palo/wiki/Palo-Overview
> >
> > We propose to contribute Palo as an Apache Incubator project, and
> > we are still looking for a Champion, if anyone would like to
> > volunteer. Thanks a lot.
> >
> > Best Regards,
> > Reed
> >
> > ===================
> > The draft of the proposal is below:
> >
> > #Apache Palo
> >
> > ##Abstract
> >
> > Palo is an MPP-based interactive SQL data warehouse for reporting and
> > analysis.
> >
> > ##Proposal
> >
> > We propose to contribute the Palo codebase and associated artifacts
> (e.g. documentation, web-site content etc.) to the Apache Software
> Foundation with the intent of forming a productive, meritocratic and open
> community around Palo’s continued development, according to the ‘Apache
> Way’.
> >
> > Baidu owns several trademarks regarding Palo, and proposes to transfer
> ownership of those trademarks in full to the ASF.
> >
> > ###Overview of Palo
> >
> > Palo’s implementation consists of two daemons: Frontend (FE) and Backend
> (BE).
> >
> > **Frontend daemon** consists of a query coordinator and a catalog manager.
> > The query coordinator receives users’ SQL queries, compiles them, and
> > manages their execution. The catalog manager manages metadata such as
> > databases, tables, partitions, and replicas. Several frontend daemons
> > can be deployed to provide fault tolerance and load balancing.
> >
> > **Backend daemon** stores the data and executes the query fragments.
> > Many backend daemons can be deployed to provide scalability and
> > fault tolerance.
> >
> > A typical Palo cluster consists of several frontend daemons
> > and dozens to hundreds of backend daemons.
> >
> > Users can connect to any frontend daemon with MySQL client tools to
> > submit SQL queries. The Frontend receives a query and compiles it into
> > query plans executable by the Backend, then sends the query plan
> > fragments to the Backend. The Backend builds a query execution DAG; data
> > is fetched and pipelined into the DAG. The final result is returned to
> > the client via the Frontend. Query fragments are distributed with two
> > main goals: minimizing data movement and maximizing scan locality.
> >
> > ##Background
> >
> > At Baidu, prior to Palo, different tools were deployed to meet diverse
> > requirements. When a use case required capabilities that no single tool
> > could provide at once, users were forced to build hybrid architectures
> > that stitch multiple tools together. We believe they shouldn’t need to
> > accept such inherent complexity. A storage system built to provide great
> > performance across a broad range of workloads is a more elegant solution
> > to the problems that hybrid architectures aim to solve. Palo is that
> > solution.
> >
> > Palo is designed to be a simple, single, tightly coupled system that
> > does not depend on other systems. Palo provides high-concurrency,
> > low-latency point queries as well as high-throughput ad-hoc analytical
> > queries. It supports bulk batch data loading as well as near real-time
> > mini-batch data loading. Palo also provides high availability,
> > reliability, fault tolerance, and scalability.
> >
> > ##Rationale
> >
> > Palo mainly integrates the technology of Google Mesa and Apache Impala.
> >
> > Mesa is a highly scalable analytic data storage system that stores
> critical measurement data related to Google's Internet advertising
> business. Mesa is designed to satisfy a complex and challenging set of users’
> and systems’ requirements, including near real-time data ingestion and
> query ability, as well as high availability, reliability, fault tolerance,
> and scalability for large data and query volumes.
> >
> > Impala is a modern, open-source MPP SQL engine architected from the
> > ground up for the Hadoop data processing environment. By virtue of its
> > superior performance and rich functionality, Impala is now comparable
> > to many commercial MPP database query engines. Mesa can satisfy many of
> > our storage requirements, but Mesa itself does not provide a SQL query
> > engine; Impala is a very good MPP SQL query engine, but it lacks a
> > perfect distributed storage engine. So in the end we chose to combine
> > these two technologies.
> >
> > Learning from Mesa’s data model, we developed a distributed storage
> > engine. Unlike Mesa, this storage engine does not rely on any distributed
> > file system. We then deeply integrated this storage engine with the Impala
> > query engine. Query compilation, query execution coordination, and catalog
> > management of the storage engine are integrated into the frontend daemon;
> > query execution and data storage are integrated into the backend daemon.
> > With this integration, we implemented a single, full-featured,
> > high-performance, state-of-the-art MPP database while keeping it simple.
> >
> > ##Current Status
> >
> > Palo is an open source project on GitHub
> > (https://github.com/baidu/palo).
> >
> > ###Meritocracy
> >
> > Palo has been deployed in production at Baidu and serves more than
> > 200 lines of business. It has demonstrated great performance benefits and
> > has proved to be a better way to do reporting and analysis on big data.
> > We still look forward to growing a rich user and developer community.
> >
> > ###Community
> >
> > Palo seeks to develop developer and user communities during incubation.
> >
> > ###Core Developers
> >
> > * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> > * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> > * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> > * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> > * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> > * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> > * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
> >
> > ###Alignment
> >
> > Palo is related to several other Apache projects:
> >
> > * Palo can also read data stored in Apache Hadoop clusters powered by
> the HDFS filesystem.
> > * Palo is closely integrated with Impala, which is also being proposed
> to the Incubator.
>
> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>
> > * Palo uses Apache Thrift as its RPC and serialization framework of
> choice.
> >
> > ##Known Risks
> >
> > ###Orphaned Products
> >
> > The core developers of the Palo team plan to work full time on this project.
> > There is very little risk of Palo getting orphaned, since at least one large
> > company (Baidu) uses it extensively in production. For example,
> > there are currently more than 200 use cases running on Palo in production.
> > Furthermore, since Palo was open sourced at the beginning of October 2017,
> > it has received more than 660 stars and been forked nearly 170 times. We
> > plan to extend and diversify this community further through Apache.
> >
> > ###Inexperience with Open Source
> >
> > The core developers are all active users and followers of open source.
> > They are already committers and contributors to the Palo GitHub project.
> > All have worked on source code that has been released under an open
> > source license, and several of them also have experience developing
> > code in an open source environment. Though the core developers do
> > not have Apache open source experience, there are plans to onboard
> > individuals with Apache open source experience onto the project.
> >
> > ###Homogenous Developers
> >
> > Most of the core developers are from Baidu, but since Palo was open
> > sourced, it has received many bug fixes and enhancements from
> > developers outside Baidu.
> >
> > ###Reliance on Salaried Developers
> >
> > Baidu has invested in Palo as its OLAP solution, and some of its key
> > engineers are working full time on the project. In addition, since there is
> > a growing Big Data need for scalable OLAP solutions, we look forward to
> > other Apache developers and researchers contributing to the project. The
> > key to addressing the risk of relying on salaried developers from a
> > single entity is to increase the diversity of the contributors and to
> > actively encourage domain experts in the BI space to contribute. Apache
> > Palo intends to do this.
> >
> > ###An Excessive Fascination with the Apache Brand
> >
> > Palo is proposing to enter incubation at Apache in order to help efforts
> to diversify the committer-base, not so much to capitalize on the Apache
> brand. The Palo project is in production use already inside Baidu, but is
> not expected to be a Baidu product for external customers. As such, the
> Palo project is not seeking to use the Apache brand as a marketing tool.
> >
> > ##Documentation
> >
> > Information about Palo can be found at https://github.com/baidu/palo.
> The following links provide more information about Palo in open source:
> >
> > * Palo wiki site: https://github.com/baidu/palo/wiki
> > * Codebase at Github: https://github.com/baidu/palo
> > * Issue Tracking: https://github.com/baidu/palo/issues
> > * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
> > * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
> >
> > ##Initial Source
> >
> > Palo has been under development since 2017 by a team of engineers at
> Baidu Inc. It is currently hosted on Github.com under an Apache license at
> https://github.com/baidu/palo.
> >
> > ##External Dependencies
> >
> > Palo has the following external dependencies.
> >
> > * Google gflags (BSD)
> > * Google glog (BSD)
> > * Apache Thrift (Apache Software License v2.0)
> > * Apache Commons (Apache Software License v2.0)
> > * Boost (Boost Software License)
> > * OpenLdap (OpenLDAP Software License)
> > * rapidjson (Tencent)
> > * Google RE2 (BSD-style)
> > * lz4 (BSD)
> > * snappy (BSD)
> > * cyrus-sasl (CMU License)
> > * Twitter Bootstrap (Apache Software License v2.0)
> > * d3 (BSD)
> > * LLVM (BSD-like)
> >
> > Build and test dependencies:
> >
> > * ant (Apache Software License v2.0)
> > * Apache Maven (Apache Software License v2.0)
> > * cmake (BSD)
> > * clang (BSD)
> > * Google gtest (Apache Software License v2.0)
> >
> > ##Required Resources
> >
> > ###Mailing List
> >
> > There are currently no mailing lists. The usual mailing lists are
> expected to be set up when entering incubation:
> >
> > private@palo.incubator.apache.org
> > dev@palo.incubator.apache.org
> > commits@palo.incubator.apache.org
> >
> > ###Subversion Directory
> >
> > Upon entering incubation: https://github.com/baidu/palo.
> > After incubation, we want to move the existing repo from
> https://github.com/baidu/palo to Apache infrastructure.
> >
> > ###Issue Tracking
> >
> > Palo currently uses GitHub to track issues. We would like to continue to do
> > so while we discuss migration possibilities with the ASF Infra committee.
> >
> > ###Other Resources
> >
> > The existing code already has unit tests so we will make use of existing
> Apache continuous testing infrastructure. The resulting load should not be
> very large.
> >
> > ##Initial Committers
> >
> > * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> > * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> > * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> > * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> > * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> > * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> > * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
> >
> > ##Affiliations
> >
> > The initial committers are employees of Baidu Inc. The nominated
> > mentors are employees of TODO.
> >
> > ##Sponsors
> >
> > ###Champion
> >
> > TODO
> >
> > ###Nominated Mentors
> >
> > * Sijie Guo, guosijie@gmail.com
> > * Luke Han, lukehan@apache.org
> > * Zheng Shao, zshao@apache.org
>
> Mentors must be members of the IPMC and almost always Members of the ASF.
>
> At this moment only Luke Han is qualified.
>
> Regards,
> Dave
>
> >
> > ###Sponsoring Entity
> >
> > We are requesting the Incubator to sponsor this project.
>
>
