incubator-general mailing list archives

From Ryan Blue <rb...@netflix.com.INVALID>
Subject Re: Looking for Champion
Date Mon, 18 Jun 2018 19:07:48 GMT
I agree with Jim, at least mostly.

I don't mind code and toil duplication between projects in itself, but I
think that the current state of the project shows that there are two large
risks to the potential Palo podling (for lack of a better name):

1. The choice not to work with the Impala community initially shows a risk
of not working with others when it may be more difficult to do so than not.
I think this should be directly addressed in the proposal: how do we know
that this will be an open and inclusive community willing to work with
others with slightly different goals?
2. The license problems so far show that the project has not paid adequate
attention to licensing up to now, which is a big risk. I'd like to see what
kind of licensing scrub is proposed before the potential podling's first
release. I don't think that catching all the obvious ones is sufficient.

rb

On Mon, Jun 18, 2018 at 11:51 AM, Jim Apple <jbapple@cloudera.com.invalid>
wrote:

> I'm not a binding vote on incubator entry, but I think it would be
> great to have roadmaps as soon as feasible on addressing Tim's concern
> (which is deeply related to #2, "Licensing") and on addressing the
> code and toil duplication.
>
> On Mon, Jun 18, 2018 at 11:08 AM, Dave Fisher <dave2wave@comcast.net>
> wrote:
> > Hi Li,De -
> >
> > Since I agreed to champion this project I think that we need a summary
> > about what the Incubator PMC cares about in order to accept a podling
> > and what the prospective project needs to address. We also need to be
> > clear what should happen during Incubation and at what time. I think
> > that many of the questions that came up in this thread had to do with
> > assessing how much effort it will take to Incubate Palo (or whatever
> > the name will be).
> >
> > (1) The name Palo. Since there seems to be an issue with that name we
> > should have a new name. It is not unknown for a podling to change its
> > name, but that does generate extra work for Infrastructure to change
> > the name after podling start up. It would be our preference for Palo to
> > find a new name prior to VOTING on the proposal. Please do this
> > elsewhere and come back to me with the new name so that I can help with
> > the updated proposal.
> >
> > (2) Licensing of the software. Several bits came up as questionable.
> > Regardless of cleanup that has already occurred we have identified that
> > we will need to be very careful. It will be important to discuss and
> > carefully handle the Software Grant Agreement to make sure that the
> > source listed is correct. I think that the SGA must come early during
> > incubation.
> >
> > (3) Relationship with Impala. Palo has apparently forked portions of
> > Impala. This means that some are concerned that there is a missed
> > synergy with the Apache Impala project. Is there a clean interface that
> > can be built between the projects? It would help if the Palo developers
> > would explore this with Impala at dev@impala.apache.org.
> >
> > That said, part of the Incubation process is to learn the Apache Way.
> > IMHO it is ok for the relationship between Impala PMC and a podling PPMC
> > to be a work in progress.
> >
> > (4) Currently, Willem, Luke Han and Dave Fisher are qualified to
> > officially mentor. I suggest that Sijie Guo and Zheng Shao be included
> > as Initial Committers in order to help from within the PPMC.
> >
> > On Jun 14, 2018, at 11:03 AM, Jim Apple <jbapple@cloudera.com.INVALID>
> > wrote:
> >
> > I don't want to be a stickler, but I don't think "For issues mentioned by
> > Jim, Todd and Tim, I have replied on last Saturday."
> >
> > To my email about Palo being an ASF project as a storage system without a
> > query engine, you replied only, "We will seriously consider this
> > proposal."
> >
> > I see no response to Tim's concern that "The code isn't owned by any
> > individual, I contributed it to Apache and it's
> > free for anyone to do what they want to do with it, but pulling in
> > improvements from other projects without any attempt to attribute it or
> > contribute improvements back seems contrary to the Apache way.”
> >
> >
> > Jim - do you need answers to these concerns prior to agreeing to accept
> > this project into the Incubator?
> >
> > Regards,
> > Dave
> >
> >
> > On Thu, Jun 14, 2018 at 12:48 AM, Li,De(BDG) <lide@baidu.com> wrote:
> >
> > Hi all,
> >
> > About Palo, we have fixed the following issues.
> >
> > 1. Relationship with Impala
> > For issues mentioned by Jim, Todd and Tim, I have replied on last
> > Saturday.
> >
> > 2. License issues
> > For issues mentioned by Todd and Ted:
> > 1) be/aes/* comes from mysql-5.6, GPL v2.1 license
> > Fixed: removed the AES-related code.
> > https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7
> > https://github.com/baidu/palo/commit/3c9f2ae6695ffebe41e39b6bf6544077698f1ced
> >
> > 2) be/util/mysql_dtoa.cpp was copied from MySQL, GPL license
> > Fixed: removed the mysql_dtoa-related code.
> > https://github.com/baidu/palo/commit/bfe1bc7cf39e165a7c52b2c941550975b1f841a1
> >
> > 3) be/http/mongoose.h, Copyright (c) 2004-2012 Sergey Lyubka
> > Fixed: restored the original license; we are looking for another HTTP
> > server to replace it.
> > https://github.com/baidu/palo/commit/81baef34f48a2dbe7401712c5e0a50f59f04a831
> >
> > 4) be/rpc/*
> > Fixed: We have replaced it with brpc, and we will remove Hypertable in a
> > few weeks, once users have upgraded to brpc.
> > https://github.com/baidu/palo/tree/master/be/src/rpc
> >
> > 3. Dependency licenses
> > For the issue mentioned by Dave: it looks like Palo does not depend on
> > OpenLdap and cyrus-sasl directly, but some third-party libraries, such
> > as libcurl and gperftools, need them to compile.
> > For rapidjson, we are looking for an alternative.
> >
> > 4. About the name of Palo
> > For the issue mentioned by Julian.
> > We are working on finding a better one.
> >
> > Best Regards,
> > Reed
> >
> >
> >
> > On 2018/6/13 8:54 AM, "Li,De(BDG)" <lide@baidu.com> wrote:
> >
> > Hi Julian,
> >
> > Thank you.
> >
> > It looks like we have to find another name.
> > If anyone has a good name, please feel free to let me know.
> >
> > Best Regards,
> > Reed
> >
> > On 2018/6/13 4:20 AM, "Julian Hyde" <jhyde@apache.org> wrote:
> >
> > Note that there is an existing database product called Palo - an open
> > source OLAP engine by German company Jedox[1]. There is a high
> > likelihood that Palo would have to change its name during incubation, if
> > accepted.
> >
> > Julian
> >
> > [1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)
> >
> >
> >
> > On Jun 10, 2018, at 3:49 AM, Han Luke <luke.hq@gmail.com> wrote:
> >
> > Cool Dave, it’s great to have you as the Champion.
> >
> >
> > ________________________________
> > From: Tan,Zhongyi <tanzhongyi@baidu.com>
> > Sent: Saturday, June 9, 2018 8:16:28 AM
> > To: general@incubator.apache.org
> > Subject: Re: Looking for Champion
> >
> > Thanks, Willem.
> >
> > We really appreciate it.
> >
> > On June 8, 2018, at 23:03, Willem Jiang <willem.jiang@gmail.com> wrote:
> >
> > Hi,
> >
> > I'm willing to be the Mentor.
> > Please count me in.
> >
> >
> >
> > Willem Jiang
> >
> > Twitter: willemjiang
> > Weibo: 姜宁willem
> >
> > On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <dave2wave@comcast.net>
> > wrote:
> >
> > Hi -
> >
> > I’m willing to Champion and Mentor. I have a couple of comments inline.
> > I’ll look at dependency licenses later today. It’s early for me.
> >
> >
> > On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <lide@baidu.com> wrote:
> >
> > Hi all,
> >
> > I am Reed, a developer working with the team on Palo (an MPP-based
> > interactive SQL data warehouse).
> >
> > https://github.com/baidu/palo/wiki/Palo-Overview
> >
> > We propose to contribute Palo as an Apache Incubator project, and
> > we are still looking for a possible Champion if anyone would like to
> > volunteer. Thanks a lot.
> >
> >
> > Best Regards,
> > Reed
> >
> > ===================
> > The draft of the proposal is below:
> >
> > #Apache Palo
> >
> > ##Abstract
> >
> > Palo is an MPP-based interactive SQL data warehouse for reporting and
> > analysis.
> >
> >
> > ##Proposal
> >
> > We propose to contribute the Palo codebase and associated artifacts
> > (e.g. documentation, web-site content etc.) to the Apache Software
> > Foundation with the intent of forming a productive, meritocratic and
> > open community around Palo’s continued development, according to the
> > ‘Apache Way’.
> >
> >
> > Baidu owns several trademarks regarding Palo, and proposes to transfer
> > ownership of those trademarks in full to the ASF.
> >
> >
> > ###Overview of Palo
> >
> > Palo’s implementation consists of two daemons: Frontend (FE) and
> > Backend (BE).
> >
> >
> > The **Frontend daemon** consists of a query coordinator and a catalog
> > manager. The query coordinator is responsible for receiving users’ SQL
> > queries, compiling them and managing query execution. The catalog
> > manager is responsible for managing metadata such as databases, tables,
> > partitions and replicas. Several frontend daemons can be deployed to
> > provide fault tolerance and load balancing.
> >
> >
> > The **Backend daemon** stores the data and executes the query fragments.
> > Many backend daemons can be deployed to provide scalability and
> > fault tolerance.
> >
> >
> > A typical Palo cluster generally consists of several frontend daemons
> > and dozens to hundreds of backend daemons.
> >
> >
> > Users can use MySQL client tools to connect to any frontend daemon and
> > submit SQL queries. The Frontend receives a query and compiles it into
> > query plans executable by the Backend. The Frontend then sends the query
> > plan fragments to the Backend, which builds a query execution DAG. Data
> > is fetched and pipelined into the DAG. The final result is sent to the
> > client via the Frontend. The distribution of query fragment execution
> > has minimizing data movement and maximizing scan locality as its main
> > goals.
> >
> >
> > ##Background
> >
> > At Baidu, prior to Palo, different tools were deployed to solve diverse
> > requirements in many ways. When a use case required the simultaneous
> > availability of capabilities that could not all be provided by a single
> > tool, users were forced to build hybrid architectures that stitch
> > multiple tools together, but we believe that they shouldn’t need to
> > accept such inherent complexity. A storage system built to provide great
> > performance across a broad range of workloads provides a more elegant
> > solution to the problems that hybrid architectures aim to solve. Palo is
> > the solution.
> >
> >
> > Palo is designed to be a simple, single, tightly coupled system that
> > does not depend on other systems. Palo provides high-concurrency,
> > low-latency point query performance as well as high-throughput ad-hoc
> > analytical queries. Palo provides bulk-batch data loading as well as
> > near real-time mini-batch data loading. Palo also provides high
> > availability, reliability, fault tolerance, and scalability.
> >
> >
> > ##Rationale
> >
> > Palo mainly integrates the technology of Google Mesa and Apache
> > Impala.
> >
> > Mesa is a highly scalable analytic data storage system that stores
> > critical measurement data related to Google's Internet advertising
> > business. Mesa is designed to satisfy a complex and challenging set of
> > users’ and systems’ requirements, including near real-time data
> > ingestion and query ability, as well as high availability, reliability,
> > fault tolerance, and scalability for large data and query volumes.
> >
> >
> > Impala is a modern, open-source MPP SQL engine architected from the
> > ground up for the Hadoop data processing environment. At present, by
> > virtue of its superior performance and rich functionality, Impala is
> > comparable to many commercial MPP database query engines. Mesa can
> > satisfy many of our storage requirements, however Mesa itself does not
> > provide a SQL query engine; Impala is a very good MPP SQL query engine,
> > but lacks a perfect distributed storage engine. So in the end we chose
> > the combination of these two technologies.
> >
> >
> > Learning from Mesa’s data model, we developed a distributed storage
> > engine. Unlike Mesa, this storage engine does not rely on any
> > distributed file system. We then deeply integrated this storage engine
> > with the Impala query engine. Query compiling, query execution
> > coordination and catalog management of the storage engine are
> > integrated into the frontend daemon; query execution and data storage
> > are integrated into the backend daemon. With this integration, we
> > implemented a single, full-featured, high-performance,
> > state-of-the-art MPP database while maintaining simplicity.
> >
> >
> > ##Current Status
> >
> > Palo has been an open source project on GitHub
> > (https://github.com/baidu/palo).
> >
> >
> > ###Meritocracy
> >
> > Palo has been deployed in production at Baidu and is used by more than
> > 200 lines of business. It has demonstrated great performance benefits
> > and has proved to be a better way to do reporting and analysis on big
> > data. Still, we look forward to growing a rich user and developer
> > community.
> >
> >
> > ###Community
> >
> > Palo seeks to develop developer and user communities during
> > incubation.
> >
> > ###Core Developers
> >
> > * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> > * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> > * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> > * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> > * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> > * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> > * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
> >
> >
> > ###Alignment
> >
> > Palo is related to several other Apache projects:
> >
> > * Palo can also read data stored in Apache Hadoop clusters powered by
> > the HDFS filesystem.
> >
> > * Palo is closely integrated with Impala, which is also being proposed
> > to the Incubator.
> >
> > Apache Impala has completed Incubation. Jim Apple is VP, Impala.
> >
> > * Palo uses Apache Thrift as its RPC and serialization framework of
> > choice.
> >
> >
> > ##Known Risks
> >
> > ###Orphaned Products
> >
> > The core developers of the Palo team plan to work full time on this
> > project. There is very little risk of Palo getting orphaned since at
> > least one large company (Baidu) is extensively using it in production.
> > For example, there are currently more than 200 use cases using Palo in
> > production. Furthermore, since Palo was open sourced at the beginning of
> > October 2017, it has received more than 660 stars and been forked
> > nearly 170 times. We plan to extend and diversify this community
> > further through Apache.
> >
> >
> > ###Inexperience with Open Source
> >
> > The core developers are all active users and followers of open source.
> > They are already committers and contributors to the Palo GitHub project.
> > All have been involved with source code that has been released under an
> > open source license, and several of them also have experience developing
> > code in an open source environment. Though the core set of developers do
> > not have Apache open source experience, there are plans to onboard
> > individuals with Apache open source experience on to the project.
> >
> >
> > ###Homogenous Developers
> >
> > Most of the core developers are from Baidu, but after Palo was open
> > sourced, it received a lot of bug fixes and enhancements from other
> > developers not working at Baidu.
> >
> >
> > ###Reliance on Salaried Developers
> >
> > Baidu invested in Palo as its OLAP solution and some of its key
> > engineers are working full time on the project. In addition, since there
> > is a growing Big Data need for scalable OLAP solutions, we look forward
> > to other Apache developers and researchers contributing to the project.
> > Also key to addressing the risk associated with relying on salaried
> > developers from a single entity is to increase the diversity of the
> > contributors and actively lobby for domain experts in the BI space to
> > contribute. Apache Palo intends to do this.
> >
> >
> > ###An Excessive Fascination with the Apache Brand
> >
> > Palo is proposing to enter incubation at Apache in order to help efforts
> > to diversify the committer base, not so much to capitalize on the Apache
> > brand. The Palo project is in production use already inside Baidu, but
> > is not expected to be a Baidu product for external customers. As such,
> > the Palo project is not seeking to use the Apache brand as a marketing
> > tool.
> >
> >
> > ##Documentation
> >
> > Information about Palo can be found at https://github.com/baidu/palo.
> >
> > The following links provide more information about Palo in open source:
> >
> >
> > * Palo wiki site: https://github.com/baidu/palo/wiki
> > * Codebase at Github: https://github.com/baidu/palo
> > * Issue Tracking: https://github.com/baidu/palo/issues
> > * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
> > * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
> >
> > ##Initial Source
> >
> > Palo has been under development since 2017 by a team of engineers at
> > Baidu Inc. It is currently hosted on Github.com under an Apache license
> > at https://github.com/baidu/palo.
> >
> >
> > ##External Dependencies
> >
> > Palo has the following external dependencies.
> >
> > * Google gflags (BSD)
> > * Google glog (BSD)
> > * Apache Thrift (Apache Software License v2.0)
> > * Apache Commons (Apache Software License v2.0)
> > * Boost (Boost Software License)
> > * OpenLdap (OpenLDAP Software License)
> > * rapidjson (Tencent)
> > * Google RE2 (BSD-style)
> > * lz4 (BSD)
> > * snappy (BSD)
> > * cyrus-sasl (CMU License)
> > * Twitter Bootstrap (Apache Software License v2.0)
> > * d3 (BSD)
> > * LLVM (BSD-like)
> >
> > Build and test dependencies:
> >
> > * ant (Apache Software License v2.0)
> > * Apache Maven (Apache Software License v2.0)
> > * cmake (BSD)
> > * clang (BSD)
> > * Google gtest (Apache Software License v2.0)
> >
> > ##Required Resources
> >
> > ###Mailing List
> >
> > There are currently no mailing lists. The usual mailing lists are
> > expected to be set up when entering incubation:
> >
> > private@palo.incubator.apache.org
> > dev@palo.incubator.apache.org
> > commits@palo.incubator.apache.org
> >
> >
> > ###Subversion Directory
> >
> > Upon entering incubation: https://github.com/baidu/palo.
> > After incubation, we want to move the existing repo from
> > https://github.com/baidu/palo to Apache infrastructure.
> >
> >
> > ###Issue Tracking
> >
> > Palo currently uses GitHub to track issues. We would like to continue to
> > do so while we discuss migration possibilities with the ASF Infra
> > committee.
> >
> >
> > ###Other Resources
> >
> > The existing code already has unit tests, so we will make use of the
> > existing Apache continuous testing infrastructure. The resulting load
> > should not be very large.
> >
> >
> > ##Initial Committers
> >
> > * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> > * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> > * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> > * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> > * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> > * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> > * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
> >
> >
> > ##Affiliations
> >
> > The initial committers are employees of Baidu Inc. The nominated
> > mentors are employees of TODO.
> >
> >
> > ##Sponsors
> >
> > ###Champion
> >
> > TODO
> >
> > ###Nominated Mentors
> >
> > * Sijie Guo, guosijie@gmail.com
> > * Luke Han, lukehan@apache.org
> > * Zheng Shao, zshao@apache.org
> >
> >
> > Mentors must be members of the IPMC and almost always Members of the
> > ASF.
> >
> > At this moment only Luke Han is qualified.
> >
> > Regards,
> > Dave
> >
> >
> > ###Sponsoring Entity
> >
> > We are requesting the Incubator to sponsor this project.
> >
> >
> >
> >
> >
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>


-- 
Ryan Blue
Software Engineer
Netflix
