From: Leif Hedstrom
Date: Thu, 1 Feb 2018 19:23:56 -0700
Subject: Re: [VOTE] Accept Coral into the Apache Incubator
To: general@incubator.apache.org

+1 (binding)

-- Leif

> On Feb 1, 2018, at 07:07, Byung-Gon Chun wrote:
>
> Hi all,
>
> I would like to start a VOTE to propose the Coral project as a podling into
> the Apache Incubator.
>
> The ASF voting rules are described at
> https://www.apache.org/foundation/voting.html
>
> A vote for accepting a new Apache Incubator podling is a majority vote for
> which only Incubator PMC member votes are binding.
>
> This vote will run for at least 72 hours. Please VOTE as follows:
> [] +1 Accept Coral into the Apache Incubator
> [] +0 Abstain
> [] -1 Do not accept Coral into the Apache Incubator because ...
>
> The proposal is listed below, but you can also access it on the wiki:
> https://wiki.apache.org/incubator/CoralProposal
>
> = CoralProposal =
>
> == Abstract ==
> Coral is a data processing system that can be flexibly employed in
> different execution scenarios for various deployment characteristics
> on clusters.
>
> == Proposal ==
> Today, there is a wide variety of data processing systems with
> different designs for better performance and datacenter efficiency.
> They include systems specialized for processing data in specific
> resource environments and for running jobs with specific attributes.
> Although each system successfully solves the problems it targets, most
> systems are designed so that runtime behaviors are built tightly into
> the system core to hide the complexity of distributed computing. This
> makes it hard for a single system to support different deployment
> characteristics with different runtime behaviors without substantial
> effort.
>
> Coral is a data processing system that aims to flexibly control the
> runtime behaviors of a job to adapt to varying deployment
> characteristics. Moreover, it provides a means of extending the
> system’s capabilities and incorporating the extensions into flexible
> job execution.
>
> To make runtime behaviors easy to modify for varying deployment
> characteristics, Coral exposes them so that they can be flexibly
> configured and modified at both compile time and runtime through a set
> of high-level graph pass interfaces.
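The graph pass idea described above can be illustrated with a minimal Java sketch. Everything in it is hypothetical and is not Coral's actual API: the `IRDag` class, the `Pass` interface, the `placement` annotation key, and the two example passes are invented for illustration. One pass only annotates vertices with an execution property; a second pass reshapes the DAG based on those annotations, mirroring the proposal's distinction between passes that annotate the DAG and passes that reshape it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of compile-time graph passes over an IR DAG (not Coral's real API). */
public final class PassSketch {

  /** Hypothetical IR DAG: named vertices, each with a map of execution-property annotations. */
  static final class IRDag {
    final Map<String, Map<String, String>> vertices = new LinkedHashMap<>();
    final List<String[]> edges = new ArrayList<>(); // each edge is {source, destination}

    IRDag vertex(String id) {
      vertices.put(id, new LinkedHashMap<>());
      return this;
    }

    IRDag edge(String src, String dst) {
      edges.add(new String[] {src, dst});
      return this;
    }
  }

  /** Hypothetical pass interface: takes an IR DAG and returns the (possibly modified) DAG. */
  @FunctionalInterface
  interface Pass {
    IRDag apply(IRDag dag);
  }

  /** Annotating pass: vertices without inputs go on "transient" resources, the rest on "reserved". */
  static final Pass PLACEMENT = dag -> {
    for (String v : dag.vertices.keySet()) {
      boolean hasInput = dag.edges.stream().anyMatch(e -> e[1].equals(v));
      dag.vertices.get(v).put("placement", hasInput ? "reserved" : "transient");
    }
    return dag;
  };

  /** Reshaping pass: inserts a relay vertex on every edge whose endpoints differ in placement. */
  static final Pass RELAY = dag -> {
    List<String[]> rewritten = new ArrayList<>();
    for (String[] e : dag.edges) {
      String from = dag.vertices.get(e[0]).get("placement");
      String to = dag.vertices.get(e[1]).get("placement");
      if (from != null && !from.equals(to)) {
        String relay = e[0] + "-relay-" + e[1];
        dag.vertex(relay);
        dag.vertices.get(relay).put("placement", "reserved");
        rewritten.add(new String[] {e[0], relay});
        rewritten.add(new String[] {relay, e[1]});
      } else {
        rewritten.add(e); // edge kept unchanged
      }
    }
    dag.edges.clear();
    dag.edges.addAll(rewritten);
    return dag;
  };

  /** Applies the passes in order, like a compiler pipeline. */
  static IRDag compile(IRDag dag, List<Pass> passes) {
    for (Pass pass : passes) {
      dag = pass.apply(dag);
    }
    return dag;
  }

  public static void main(String[] args) {
    IRDag dag = new IRDag()
        .vertex("read").vertex("map").vertex("reduce")
        .edge("read", "map").edge("map", "reduce");
    compile(dag, List.of(PLACEMENT, RELAY));
    System.out.println(dag.vertices);
    for (String[] e : dag.edges) {
      System.out.println(e[0] + " -> " + e[1]);
    }
  }
}
```

Running `main` prints the annotated vertex map followed by the edges `read -> read-relay-map`, `read-relay-map -> map`, and `map -> reduce`: the placement pass marks the source vertex as transient, and the relay pass rewrites only the one edge whose endpoints differ in placement. Keeping each concern in its own pass makes the pipeline order explicit and lets a new pass be added without touching the existing ones.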
>
> We hope to contribute to the big data processing community by enabling
> more flexibility and extensibility in job execution. Furthermore, we
> can all benefit more when we work together as a community to mature
> the system with more use cases and a better understanding of diverse
> deployment characteristics. The Apache Software Foundation is the
> perfect place to achieve these aspirations.
>
> == Background ==
> Many data processing systems have distinctive runtime behaviors
> optimized and configured for specific deployment characteristics, such
> as different resource environments, and for handling special job
> attributes.
>
> For example, much research has been conducted to overcome the
> challenge of running data processing jobs on cheap, unreliable
> transient resources. Likewise, techniques for disaggregating different
> types of resources, like memory, CPU and GPU, are being actively
> developed to use datacenter resources more efficiently. Many
> researchers are also working to run data processing jobs in even more
> diverse environments, such as across distant datacenters. Similarly,
> for special job attributes, many efforts take different approaches,
> such as runtime optimization, to solve problems like data skew, and to
> optimize systems for data processing jobs with small-scale input data.
>
> Although each of these systems performs well with the jobs and in the
> environments it targets, it performs poorly in cases it was not
> designed for, and its design does not consider supporting multiple
> deployment characteristics on a single system.
>
> For an application writer, optimizing an application to perform well
> on a system whose underlying behaviors are engraved in its core
> requires a deep understanding of the system itself, an overhead that
> often takes a lot of time and effort.
> Moreover, for a developer, modifying such system behaviors requires
> changes to the system core, which demands an even deeper understanding
> of the system itself.
>
> With this background, Coral is designed to represent all of its jobs
> as an Intermediate Representation (IR) DAG. In the Coral compiler,
> user applications from various programming models (e.g., Apache Beam)
> are submitted, transformed into an IR DAG, and optimized/customized for
> the deployment characteristics. In the IR DAG optimization phase, the
> DAG is modified through a series of compiler “passes” that reshape or
> annotate the DAG with an expression of the underlying runtime
> behaviors. The IR DAG is then submitted as an execution plan to the
> Coral runtime. The runtime keeps the unmodified parts of data
> processing in its backbone, which is transparently integrated with
> configurable components exposed for further extension.
>
> == Rationale ==
> Coral’s vision lies in providing a means of flexibly supporting a wide
> variety of job execution scenarios for users, while at the same time
> making it easy for system developers to extend the execution framework
> with various functionalities. The capabilities of the system can be
> extended as it grows to meet a wider variety of execution scenarios.
> We require input from users and developers in diverse domains to make
> it a thriving and useful project. The Apache Software Foundation
> provides the best tools and community to support this vision.
>
> == Initial Goals ==
> The initial goals are to move the existing codebase to Apache and to
> integrate with the Apache development process. We further plan to
> develop the system to meet the needs of more execution scenarios for a
> wider variety of deployment characteristics.
>
> == Current Status ==
> The Coral codebase is currently hosted in a repository at github.com.
> The current version has been developed by system developers at Seoul
> National University, Viva Republica, Samsung, and LG.
>
> == Meritocracy ==
> We plan to strongly support meritocracy. We will discuss the
> requirements in an open forum, and those who continuously contribute
> to Coral with the passion to strengthen the system will be invited as
> committers. Contributors who enrich Coral by providing various use
> cases and various implementations of the configurable components,
> including ideas for optimization techniques, will be especially
> welcome. Committers with a deep understanding of the system’s
> technical aspects as a whole and of its philosophy will be voted onto
> the PMC. We will monitor community participation so that privileges
> can be extended to those who contribute.
>
> == Community ==
> We hope to expand our contributor community by becoming an Apache
> Incubator project. Contributions will come from both users and system
> developers interested in the flexibility and extensibility of job
> execution that Coral can support. We expect users to contribute mainly
> by diversifying the use cases and deployment characteristics, and
> developers to contribute by implementing them.
>
> == Alignment ==
> Apache Spark is one of many popular data processing frameworks. It is
> designed around optimizing jobs using in-memory RDDs, with many other
> optimizations built tightly into the framework. In contrast to Spark,
> Coral aims to provide more flexibility in job execution in an
> easy-to-use manner.
>
> Apache Tez enables developers to build complex task DAGs with control
> over the control plane of job execution. In Coral, a program written
> with a high-level programming layer (e.g., Apache Beam) is
> automatically converted into a basic IR DAG, which can then be
> converted into any IR DAG through a series of easy, user-writable
> passes that can both reshape the DAG and modify its annotations (of
> execution properties).
> Moreover, Coral leaves more parts of job execution configurable, such
> as the scheduler and the data plane. Rather than providing only a set
> of properties for solid optimization, Coral lets its configurable parts
> be easily extended and explored by implementing pre-defined
> interfaces. For example, an arbitrary intermediate data store can be
> added.
>
> Coral currently supports Apache Beam programs, and we are working on
> supporting Apache Spark programs as well. Coral also utilizes Apache
> REEF for container management, which allows Coral to run in Apache
> Hadoop YARN and Apache Mesos clusters. If necessary, we plan to
> contribute to and collaborate with these other Apache projects for the
> benefit of all. We plan to extend such integrations to more Apache
> software. The Apache Software Foundation already hosts many major
> big-data systems, and we expect to help the big-data community grow
> further by having Coral within the Apache Software Foundation.
>
> == Known Risks ==
> === Orphaned Products ===
> The risk of the Coral project being orphaned is minimal. There is
> already plenty of work that strives to support different deployment
> characteristics, and we propose a general way to implement them with
> flexible and extensible configuration knobs. The domain of data
> processing is already of high interest, and this domain is expected to
> evolve continuously with various other purposes, such as resource
> disaggregation and using transient resources for better datacenter
> resource utilization.
>
> === Inexperience with Open Source ===
> The initial committers include PMC members and committers of other
> Apache projects. They have experience with open source projects, from
> incubation through to top-level status. They have been involved in the
> open source development process and are familiar with releasing code
> under an open source license.
>
> === Homogeneous Developers ===
> The initial set of committers is from a limited set of organizations,
> but we expect to attract new contributors from diverse organizations
> and thus to grow organically once approved for incubation. Our prior
> experience with other open source projects will help various
> contributors participate actively in our project.
>
> === Reliance on Salaried Developers ===
> Many of the developers are from Seoul National University, so this is
> not applicable.
>
> === Relationships with Other Apache Products ===
> Coral positions itself among multiple Apache products. It runs on
> Apache REEF for container management. It also utilizes many useful
> development tools, including Apache Maven, Apache Log4j, and multiple
> Apache Commons components. Coral supports the Apache Beam programming
> model for user applications. We are currently working on supporting
> the Apache Spark programming APIs as well.
>
> === An Excessive Fascination with the Apache Brand ===
> We hope to make Coral a powerful system for data processing, meeting
> various needs for different deployment characteristics in a wider
> variety of environments. We see the limitations of simply putting code
> on GitHub, and we believe the Apache community will help Coral grow
> into a positively impactful and innovative open source project. We
> believe Coral is a great fit for the Apache Software Foundation due to
> the collaboration with the big data processing community that it aims
> to achieve.
>
> == Documentation ==
> The current documentation for Coral is at https://snuspl.github.io/coral/.
>
> == Initial Source ==
> The Coral codebase is currently hosted at https://github.com/snuspl/coral.
>
> == External Dependencies ==
> To the best of our knowledge, all Coral dependencies are distributed
> under Apache-compatible licenses.
> Upon acceptance into the Incubator, we would begin a thorough analysis
> of all transitive dependencies to verify this, and would further
> introduce license checking into the build and release process.
>
> == Cryptography ==
> Not applicable.
>
> == Required Resources ==
> === Mailing Lists ===
> We will operate two mailing lists as follows:
> * Coral PMC discussions: private@coral.incubator.apache.org
> * Coral developers: dev@coral.incubator.apache.org
>
> === Git Repositories ===
> Upon incubation: https://github.com/apache/incubator-coral.
> After the incubation, we would like to move the existing repo
> https://github.com/snuspl/coral to the Apache infrastructure.
>
> === Issue Tracking ===
> Coral currently tracks its issues using the GitHub issue tracker:
> https://github.com/snuspl/coral/issues. We plan to migrate to Apache
> JIRA.
>
> == Initial Committers ==
> * Byung-Gon Chun
> * Jeongyoon Eo
> * Geon-Woo Kim
> * Joo Yeon Kim
> * Gyewon Lee
> * Jung-Gil Lee
> * Sanha Lee
> * Wooyeon Lee
> * Yunseong Lee
> * JangHo Seo
> * Won Wook Song
> * Taegeon Um
> * Youngseok Yang
>
> == Affiliations ==
> * SNU (Seoul National University)
>   * Byung-Gon Chun
>   * Jeongyoon Eo
>   * Geon-Woo Kim
>   * Gyewon Lee
>   * Sanha Lee
>   * Wooyeon Lee
>   * Yunseong Lee
>   * JangHo Seo
>   * Won Wook Song
>   * Taegeon Um
>   * Youngseok Yang
>
> * LG
>   * Jung-Gil Lee
>
> * Samsung
>   * Joo Yeon Kim
>
> * Viva Republica
>   * Geon-Woo Kim
>
> == Sponsors ==
> === Champions ===
> Byung-Gon Chun
>
> === Mentors ===
> * Hyunsik Choi
> * Byung-Gon Chun
> * Jean-Baptiste Onofré
> * Markus Weimer
> * Reynold Xin
>
> === Sponsoring Entity ===
> The Apache Incubator
>
>
> Thanks!
> Byung-Gon Chun