incubator-general mailing list archives

From "Li,De(BDG)" <l...@baidu.com>
Subject Re: Looking for Champion
Date Tue, 19 Jun 2018 10:54:46 GMT
Hi Dave,

Thank you for your summary.

For #1, understood. We will find a new name as soon as possible, within the next several days.

For #2, understood. Regarding licensing, we are rechecking the licenses of all components in
Palo, and we have fixed most of the issues we found, as I wrote in my last email. We will
continue this work carefully.

For #3, we have reflected on Jim's suggestion. We will try to find or define a clean
interface between Palo and Impala, and determine which parts should stay in Palo and which
parts should become patches for Impala. The details and roadmap are still to be worked out.

For #4, I accept your suggestion and will update the proposal.

Once I have a new name, I will send you the updated proposal.

Best Regards,
Reed

From: Dave Fisher <dave2wave@comcast.net>
Reply-To: <general@incubator.apache.org>
Date: Tuesday, June 19, 2018, 2:08 AM
To: <general@incubator.apache.org>
Subject: Re: Looking for Champion

Hi Li,De -

Since I agreed to champion this project I think that we need a summary of what the Incubator
PMC cares about in order to accept a podling, and of what the prospective project needs to address.
We also need to be clear what should happen during Incubation and at what time. I think that
many of the questions that came up in this thread had to do with assessing how much effort
it will take to Incubate Palo (or whatever the name will be).

(1) The name Palo. Since there seems to be an issue with that name we should have a new name.
It is not unknown for a podling to change its name, but that does generate extra work for
Infrastructure to change the name after podling start up. It would be our preference for Palo
to find a new name prior to VOTING on the proposal. Please do this elsewhere and come back
to me with the new name so that I can help with the updated proposal.

(2) Licensing of the software. Several bits came up as questionable. Regardless of cleanup
that has already occurred we have identified that we will need to be very careful. It will
be important to discuss and carefully handle the Software Grant Agreement to make sure that
the source listed is correct. I think that the SGA must come early during incubation.

(3) Relationship with Impala. Palo has apparently forked portions of Impala. This means that
some are concerned that there is a missed synergy with the Apache Impala project. Is there
a clean interface that can be built between the projects? It would help if the Palo developers
would explore this with Impala at dev@impala.apache.org.

That said, part of the Incubation process is to learn the Apache Way. IMHO it is ok for the
relationship between Impala PMC and a podling PPMC to be a work in progress.

(4) Currently, Willem, Luke Han and Dave Fisher are qualified to officially mentor. I suggest
that Sijie Guo and Zheng Shao be included as Initial Committers in order to help from within
the PPMC.

On Jun 14, 2018, at 11:03 AM, Jim Apple <jbapple@cloudera.com.INVALID> wrote:

I don't want to be a stickler, but I don't think "For issues mentioned by
Jim, Todd and Tim, I have replied on last Saturday."

To my email about Palo being an ASF project as a storage system without a
query engine, you replied only, "We will seriously consider this proposal."

I see no response to Tim's concern that "The code isn't owned by any
individual, I contributed it to Apache and it's
free for anyone to do what they want to do with it, but pulling in
improvements from other projects without any attempt to attribute it or
contribute improvements back seems contrary to the Apache way.”

Jim - do you need answers to these concerns prior to agreeing to accept this project into
the Incubator?

Regards,
Dave


On Thu, Jun 14, 2018 at 12:48 AM, Li,De(BDG) <lide@baidu.com> wrote:

Hi all,

About Palo, we have fixed the following issues.

1. Relationship with Impala
For issues mentioned by Jim, Todd and Tim, I have replied on last Saturday.

2. License issues
For the issues mentioned by Todd and Ted:
1) be/aes/* comes from mysql-5.6, GPL v2 license
Fixed: removed the AES-related code.
https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7
https://github.com/baidu/palo/commit/3c9f2ae6695ffebe41e39b6bf6544077698f1ced

2) be/util/mysql_dtoa.cpp was copied from MySQL, GPL license
Fixed: removed the mysql_dtoa-related code.
https://github.com/baidu/palo/commit/bfe1bc7cf39e165a7c52b2c941550975b1f841a1

3) be/http/mongoose.h, Copyright (c) 2004-2012 Sergey Lyubka
Fixed: restored the original license; we are looking for another HTTP server
to replace it.
https://github.com/baidu/palo/commit/81baef34f48a2dbe7401712c5e0a50f59f04a831

4) be/rpc/*
Fixed: We have replaced it with brpc, and we will remove Hypertable in a
few weeks, after users have had time to upgrade to brpc.
https://github.com/baidu/palo/tree/master/be/src/rpc

3. Dependency licenses
For the issue mentioned by Dave: it looks like Palo does not depend on
OpenLDAP and cyrus-sasl directly,
but some third-party libraries need them to compile, libcurl and gperftools
for instance.
For rapidjson, we are looking for an alternative.

4. About the name of Palo
For the issue mentioned by Julian:
we are working on finding a better one.

Best Regards,
Reed



On 2018/6/13, 8:54 AM, "Li,De(BDG)" <lide@baidu.com> wrote:

Hi Julian,

Thank you.

It looks like we have to find another one.
If anyone has a good name, please feel free to let me know.

Best Regards,
Reed

On 2018/6/13, 4:20 AM, "Julian Hyde" <jhyde@apache.org> wrote:

Note that there is an existing database product called Palo - an open
source OLAP engine by German company Jedox[1]. There is a high
likelihood that Palo would have to change its name during incubation, if
accepted.

Julian

[1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)



On Jun 10, 2018, at 3:49 AM, Han Luke <luke.hq@gmail.com> wrote:

Cool Dave, it’s great to have you as the Champion.


________________________________
From: Tan,Zhongyi <tanzhongyi@baidu.com>
Sent: Saturday, June 9, 2018 8:16:28 AM
To: general@incubator.apache.org
Subject: Re: Looking for Champion

Thanks, Willem.

We really appreciate it.

On June 8, 2018, at 23:03, Willem Jiang <willem.jiang@gmail.com> wrote:

Hi,

I'm willing to be the Mentor.
Please count me in.



Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <dave2wave@comcast.net> wrote:

Hi -

I’m willing to Champion and Mentor. I have a couple of comments
inline.
I’ll look at dependency licenses later today. It’s early for me.


On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <lide@baidu.com> wrote:

Hi all,

I am Reed, a developer on the team behind Palo (an MPP-based
interactive SQL data warehouse).
https://github.com/baidu/palo/wiki/Palo-Overview

We propose to contribute Palo as an Apache Incubator project, and
we are still looking for a possible Champion, if anyone would like to
volunteer. Thanks a lot.

Best Regards,
Reed

===================
The draft of the proposal is below:

#Apache Palo

##Abstract

Palo is an MPP-based interactive SQL data warehouse for reporting and
analysis.

##Proposal

We propose to contribute the Palo codebase and associated artifacts
(e.g. documentation, web-site content etc.) to the Apache Software
Foundation with the intent of forming a productive, meritocratic and open
community around Palo’s continued development, according to the
‘Apache Way’.

Baidu owns several trademarks regarding Palo, and proposes to transfer
ownership of those trademarks in full to the ASF.

###Overview of Palo

Palo’s implementation consists of two daemons: Frontend (FE) and
Backend (BE).

**Frontend daemon** consists of a query coordinator and a catalog
manager. The query coordinator is responsible for receiving users’ SQL
queries, compiling queries and managing query execution. The catalog
manager is responsible for managing metadata such as databases, tables,
partitions and replicas. Several frontend daemons can be deployed to
provide fault tolerance and load balancing.

**Backend daemon** stores the data and executes the query fragments.
Many backend daemons can be deployed to provide scalability and
fault tolerance.

A typical Palo cluster is generally composed of several frontend daemons
and dozens to hundreds of backend daemons.

Users can use MySQL client tools to connect to any frontend daemon and
submit SQL queries. The Frontend receives a query and compiles it into
query plans executable by the Backend. The Frontend then sends the query
plan fragments to the Backends. Each Backend builds a query execution
DAG; data is fetched and pipelined into the DAG. The final result is sent
to the client via the Frontend. The distribution of query fragment
execution takes minimizing data movement and maximizing scan locality as
its main goal.
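
The locality goal described above can be illustrated with a minimal
sketch. This is not taken from the Palo codebase: the function name
`assign_fragments`, the tablet and backend names, and the simple
least-loaded tie-breaking rule are all assumptions made for illustration.
It only shows the idea of sending each scan fragment to a backend that
already holds a replica of the data, so that no data has to move.

```python
from collections import defaultdict


def assign_fragments(tablet_replicas, backends):
    """Assign each tablet's scan fragment to a backend (hypothetical helper).

    tablet_replicas: dict mapping tablet id -> list of backends holding a replica
    backends: list of live backend names
    Returns a dict mapping backend -> list of tablet ids it should scan.
    """
    load = {b: 0 for b in backends}
    plan = defaultdict(list)
    for tablet in sorted(tablet_replicas):
        # Prefer backends that hold a local replica (scan locality).
        local = [b for b in tablet_replicas[tablet] if b in load]
        # Fall back to any live backend if no replica holder is alive;
        # that case implies a remote read.
        candidates = local or backends
        # Among candidates, pick the least-loaded one (ties broken by name).
        chosen = min(candidates, key=lambda b: (load[b], b))
        plan[chosen].append(tablet)
        load[chosen] += 1
    return dict(plan)


# Illustrative data only: three tablets, each replicated on two backends.
replicas = {"t1": ["be1", "be2"], "t2": ["be2", "be3"], "t3": ["be1", "be3"]}
plan = assign_fragments(replicas, ["be1", "be2", "be3"])
```

In this toy input every tablet ends up on a backend that stores one of its
replicas, and the work is spread evenly across the three backends.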

##Background

At Baidu, prior to Palo, different tools were deployed to meet diverse
requirements. When a use case required the simultaneous availability of
capabilities that could not all be provided by a single tool, users were
forced to build hybrid architectures that stitched multiple tools
together. We believe that they shouldn’t need to accept such inherent
complexity. A storage system built to provide great performance across a
broad range of workloads provides a more elegant solution to the problems
that hybrid architectures aim to solve. Palo is that solution.

Palo is designed to be a simple, single, tightly coupled system that does
not depend on other systems. Palo provides highly concurrent, low-latency
point queries as well as high-throughput ad-hoc analytical queries. Palo
provides bulk batch data loading as well as near real-time mini-batch
data loading. Palo also provides high availability, reliability, fault
tolerance, and scalability.

##Rationale

Palo mainly integrates the technology of Google Mesa and Apache Impala.

Mesa is a highly scalable analytic data storage system that stores
critical measurement data related to Google's Internet advertising
business. Mesa is designed to satisfy a complex and challenging set of
user and system requirements, including near real-time data ingestion and
query ability, as well as high availability, reliability, fault
tolerance, and scalability for large data and query volumes.

Impala is a modern, open-source MPP SQL engine architected from the
ground up for the Hadoop data processing environment. At present, by
virtue of its superior performance and rich functionality, Impala is
comparable to many commercial MPP database query engines. Mesa can
satisfy many of our storage requirements; however, Mesa itself does not
provide a SQL query engine. Impala is a very good MPP SQL query engine,
but it lacks a perfect distributed storage engine. So in the end we chose
to combine these two technologies.

Learning from Mesa’s data model, we developed a distributed storage
engine. Unlike Mesa, this storage engine does not rely on any distributed
file system. We then deeply integrated this storage engine with the
Impala query engine. Query compiling, query execution coordination and
catalog management of the storage engine are integrated into the frontend
daemon; query execution and data storage are integrated into the backend
daemon. With this integration, we implemented a single, full-featured,
high-performance, state-of-the-art MPP database while maintaining
simplicity.

##Current Status

Palo has been an open source project on GitHub
(https://github.com/baidu/palo).

###Meritocracy

Palo has been deployed in production at Baidu and is used by more than
200 lines of business. It has demonstrated great performance benefits and
has proved to be a better way to do reporting and analysis on big data.
Still, we look forward to growing a rich user and developer community.

###Community

Palo seeks to develop developer and user communities during incubation.

###Core Developers

* Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
* Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
* Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
* De Li (https://github.com/lide-reed, mailtolide@sina.com)
* Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
* Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
* Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)

###Alignment

Palo is related to several other Apache projects:

* Palo can also read data stored in Apache Hadoop clusters powered by
the HDFS filesystem.
* Palo is closely integrated with Impala, which is also being proposed
to the Incubator.

Apache Impala has completed Incubation. Jim Apple is VP, Impala.

* Palo uses Apache Thrift as its RPC and serialization framework of
choice.

##Known Risks

###Orphaned Products

The core developers of the Palo team plan to work full time on this
project. There is very little risk of Palo getting orphaned since at
least one large company (Baidu) is using it extensively in production.
For example, there are currently more than 200 use cases using Palo in
production. Furthermore, since Palo was open sourced at the beginning of
October 2017, it has received more than 660 stars and been forked nearly
170 times. We plan to extend and diversify this community further through
Apache.

###Inexperience with Open Source

The core developers are all active users and followers of open source.
They are already committers and contributors to the Palo GitHub project.
All have been involved with source code that has been released under an
open source license, and several of them also have experience developing
code in an open source environment. Though the core developers do not
have Apache open source experience, there are plans to onboard
individuals with Apache open source experience onto the project.

###Homogenous Developers

Most of the core developers are from Baidu, but after Palo was open
sourced, Palo received a lot of bug fixes and enhancements from other
developers not working at Baidu.

###Reliance on Salaried Developers

Baidu invested in Palo as its OLAP solution, and some of its key
engineers are working full time on the project. In addition, since there
is a growing Big Data need for scalable OLAP solutions, we look forward
to other Apache developers and researchers contributing to the project.
Key to addressing the risk associated with relying on salaried developers
from a single entity is to increase the diversity of the contributors and
actively lobby for domain experts in the BI space to contribute. Apache
Palo intends to do this.

###An Excessive Fascination with the Apache Brand

Palo is proposing to enter incubation at Apache in order to help efforts
to diversify the committer base, not so much to capitalize on the Apache
brand. The Palo project is in production use already inside Baidu, but is
not expected to be a Baidu product for external customers. As such, the
Palo project is not seeking to use the Apache brand as a marketing tool.

##Documentation

Information about Palo can be found at https://github.com/baidu/palo.
The following links provide more information about Palo in open source:

* Palo wiki site: https://github.com/baidu/palo/wiki
* Codebase at Github: https://github.com/baidu/palo
* Issue Tracking: https://github.com/baidu/palo/issues
* Overview: https://github.com/baidu/palo/wiki/Palo-Overview
* FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ

##Initial Source

Palo has been under development since 2017 by a team of engineers at
Baidu Inc. It is currently hosted on Github.com under an Apache license
at https://github.com/baidu/palo.

##External Dependencies

Palo has the following external dependencies.

* Google gflags (BSD)
* Google glog (BSD)
* Apache Thrift (Apache Software License v2.0)
* Apache Commons (Apache Software License v2.0)
* Boost (Boost Software License)
* OpenLdap (OpenLDAP Software License)
* rapidjson (Tencent)
* Google RE2 (BSD-style)
* lz4 (BSD)
* snappy (BSD)
* cyrus-sasl (CMU License)
* Twitter Bootstrap (Apache Software License v2.0)
* d3 (BSD)
* LLVM (BSD-like)

Build and test dependencies:

* ant (Apache Software License v2.0)
* Apache Maven (Apache Software License v2.0)
* cmake (BSD)
* clang (BSD)
* Google gtest (Apache Software License v2.0)

##Required Resources

###Mailing List

There are currently no mailing lists. The usual mailing lists are
expected to be set up when entering incubation:

private@palo.incubator.apache.org
dev@palo.incubator.apache.org
commits@palo.incubator.apache.org

###Subversion Directory

Upon entering incubation: https://github.com/baidu/palo.
After incubation, we want to move the existing repo from
https://github.com/baidu/palo to Apache infrastructure.

###Issue Tracking

Palo currently uses GitHub to track issues. We would like to continue to
do so while we discuss migration possibilities with the ASF Infra
committee.

###Other Resources

The existing code already has unit tests, so we will make use of the
existing Apache continuous testing infrastructure. The resulting load
should not be very large.

##Initial Committers

* Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
* Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
* Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
* De Li (https://github.com/lide-reed, mailtolide@sina.com)
* Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
* Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
* Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)

##Affiliations

The initial committers are employees of Baidu Inc. The nominated
mentors are employees of TODO.

##Sponsors

###Champion

TODO

###Nominated Mentors

* Sijie Guo, guosijie@gmail.com
* Luke Han, lukehan@apache.org
* Zheng Shao, zshao@apache.org

Mentors must be members of the IPMC and almost always Members of the
ASF.

At this moment only Luke Han is qualified.

Regards,
Dave


###Sponsoring Entity

We are requesting the Incubator to sponsor this project.

