incubator-general mailing list archives

From Hyunsik Choi <hyun...@apache.org>
Subject Re: [VOTE] Accept Coral into the Apache Incubator
Date Fri, 02 Feb 2018 00:24:08 GMT
+1 (binding)

On Thu, Feb 1, 2018 at 2:37 PM Dave Fisher <dave2wave@comcast.net> wrote:

> +1 (binding)
>
> Sent from my iPhone
>
> > On Feb 1, 2018, at 6:07 AM, Byung-Gon Chun <bgchun@gmail.com> wrote:
> >
> > Hi all,
> >
> > I would like to start a VOTE to propose the Coral project as a podling
> > into the Apache Incubator.
> >
> > The ASF voting rules are described at
> > https://www.apache.org/foundation/voting.html
> >
> > A vote for accepting a new Apache Incubator podling is a majority vote
> > for which only Incubator PMC member votes are binding.
> >
> > This vote will run for at least 72 hours. Please VOTE as follows.
> > [] +1 Accept Coral into the Apache Incubator
> > [] +0 Abstain
> > [] -1 Do not accept Coral into the Apache Incubator because ...
> >
> > The proposal is listed below, but you can also access it on the wiki:
> > https://wiki.apache.org/incubator/CoralProposal
> >
> > = CoralProposal =
> >
> > == Abstract ==
> > Coral is a data processing system that can be flexibly employed in
> > different execution scenarios to suit the varying deployment
> > characteristics of clusters.
> >
> > == Proposal ==
> > Today, there is a wide variety of data processing systems with
> > different designs for better performance and datacenter efficiency.
> > They include processing data on specific resource environments and
> > running jobs with specific attributes. Although each system
> > successfully solves the problems it targets, most systems are designed
> > so that runtime behaviors are built tightly into the system core
> > to hide the complexity of distributed computing. This makes it
> > hard for a single system to support different deployment
> > characteristics with different runtime behaviors without substantial
> > effort.
> >
> > Coral is a data processing system that aims to flexibly control the
> > runtime behaviors of a job to adapt to varying deployment
> > characteristics. Moreover, it provides a means of extending the
> > system’s capabilities and incorporating the extensions into flexible
> > job execution.
> >
> > In order to be able to easily modify runtime behaviors to adapt to
> > varying deployment characteristics, Coral exposes runtime behaviors to
> > be flexibly configured and modified at both compile-time and runtime
> > through a set of high-level graph pass interfaces.
> >
> > We hope to contribute to the big data processing community by enabling
> > more flexibility and extensibility in job execution. Furthermore, we
> > can all benefit more when we work together as a community to mature
> > the system with more use cases and a better understanding
> > of diverse deployment characteristics. The Apache Software Foundation
> > is the perfect place to achieve these aspirations.
> >
> > == Background ==
> > Many data processing systems have distinctive runtime behaviors
> > optimized and configured for specific deployment characteristics like
> > different resource environments and for handling special job
> > attributes.
> >
> > For example, much research has been conducted to overcome the
> > challenge of running data processing jobs on cheap, unreliable
> > transient resources. Likewise, techniques for disaggregating different
> > types of resources, like memory, CPU and GPU, are being actively
> > developed to use datacenter resources more efficiently. Many
> > researchers are also working to run data processing jobs in even more
> > diverse environments, such as across distant datacenters. Similarly,
> > for special job attributes, many works take different approaches, such
> > as runtime optimization, to solve problems like data skew, and to
> > optimize systems for data processing jobs with small-scale input data.
> >
> > Although each of these systems performs well with the jobs and in the
> > environments it targets, it performs poorly in cases it was not designed
> > for, and none of their designs consider supporting multiple deployment
> > characteristics on a single system.
> >
> > For an application writer, optimizing an application to perform well
> > on a system with such built-in behaviors requires a deep
> > understanding of the system itself, which often takes a lot of time
> > and effort. Moreover, for a developer, modifying such system
> > behaviors means modifying the system core, which requires an even
> > deeper understanding of the system.
> >
> > With this background, Coral is designed to represent all of its jobs
> > as an Intermediate Representation (IR) DAG. In the Coral compiler,
> > user applications from various programming models (e.g., Apache Beam)
> > are submitted, transformed to an IR DAG, and optimized/customized for
> > the deployment characteristics. In the IR DAG optimization phase, the
> > DAG is modified through a series of compiler “passes” which reshape or
> > annotate the DAG with an expression of the underlying runtime
> > behaviors. The IR DAG is then submitted as an execution plan for the
> > Coral runtime. The runtime includes the unmodified parts of data
> > processing in the backbone which is transparently integrated with
> > configurable components exposed for further extension.
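
The compile-time flow described above (application, then IR DAG, then a series of passes, then an execution plan) can be illustrated with a minimal sketch. It is not Coral's actual API: `IRDag`, `annotate_placement`, `insert_relay`, `optimize`, and the `placement` property are all hypothetical names chosen for illustration. One pass annotates vertices, the other reshapes the DAG.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class IRDag:
    """Hypothetical IR DAG: named vertices with annotations, plus directed edges."""
    vertices: Dict[str, Dict[str, str]] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_edge(self, src: str, dst: str) -> None:
        self.vertices.setdefault(src, {})
        self.vertices.setdefault(dst, {})
        self.edges.append((src, dst))


# A pass is simply a function from DAG to DAG (an assumption of this sketch).
Pass = Callable[[IRDag], IRDag]


def annotate_placement(dag: IRDag) -> IRDag:
    """Annotating pass: source vertices go to transient resources, the rest to reserved."""
    has_incoming = {dst for _, dst in dag.edges}
    for name, props in dag.vertices.items():
        props["placement"] = "reserved" if name in has_incoming else "transient"
    return dag


def insert_relay(dag: IRDag) -> IRDag:
    """Reshaping pass: put a reserved relay vertex on every edge leaving a transient vertex."""
    new_edges: List[Tuple[str, str]] = []
    for src, dst in dag.edges:
        if dag.vertices[src].get("placement") == "transient":
            relay = f"relay({src}->{dst})"
            dag.vertices[relay] = {"placement": "reserved"}
            new_edges += [(src, relay), (relay, dst)]
        else:
            new_edges.append((src, dst))
    dag.edges = new_edges
    return dag


def optimize(dag: IRDag, passes: List[Pass]) -> IRDag:
    """Apply the passes in order; the result stands in for the execution plan."""
    for p in passes:
        dag = p(dag)
    return dag


dag = IRDag()
dag.add_edge("read", "map")
dag.add_edge("map", "reduce")
plan = optimize(dag, [annotate_placement, insert_relay])
for edge in plan.edges:
    print(edge)
```

Swapping or reordering the entries in the pass list changes the resulting plan without touching the code that would execute it, which is the kind of flexibility the proposal is aiming for.
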
> >
> > == Rationale ==
> > Coral’s vision is to provide a means of flexibly supporting a wide
> > variety of job execution scenarios for users while, at the same time,
> > making it easy for system developers to extend the execution framework
> > with various functionalities. The capabilities of the system can
> > be extended as it grows to meet a wider variety of execution scenarios.
> > We need input from users and developers from diverse domains in
> > order to make it a more thriving and useful project. The Apache
> > Software Foundation provides the best tools and community to support
> > this vision.
> >
> > == Initial Goals ==
> > Initial goals will be to move the existing codebase to Apache and
> > integrate with the Apache development process. We further plan to
> > develop our system to meet the needs of more execution scenarios and
> > a wider variety of deployment characteristics.
> >
> > == Current Status ==
> > The Coral codebase is currently hosted in a repository at github.com. The
> > current version has been developed by system developers at Seoul
> > National University, Viva Republica, Samsung, and LG.
> >
> > == Meritocracy ==
> > We plan to strongly support meritocracy. We will discuss the
> > requirements in an open forum, and those who continuously contribute
> > to Coral with the passion to strengthen the system will be invited as
> > committers. Contributors who enrich Coral by providing various use
> > cases and various implementations of the configurable components,
> > including ideas for optimization techniques, will be especially
> > welcome. Committers with a deep understanding of the system’s
> > technical aspects as a whole and its philosophy will definitely be
> > voted into the PMC. We will monitor community participation so that
> > privileges can be extended to those who contribute.
> >
> > == Community ==
> > We hope to expand our contribution community by becoming an Apache
> > incubator project. The contributions will come from both users and
> > system developers interested in flexibility and extensibility of job
> > executions that Coral can support. We expect users to contribute
> > mainly by diversifying the use cases and deployment characteristics,
> > and developers to contribute by implementing them.
> >
> > == Alignment ==
> > Apache Spark is one of many popular data processing frameworks. The
> > system is designed towards optimizing jobs using RDDs in memory and
> > many other optimizations built tightly within the framework. In
> > contrast to Spark, Coral aims to provide more flexibility for job
> > execution in an easy manner.
> >
> > Apache Tez enables developers to build complex task DAGs with control
> > over the control plane of job execution. In Coral, a high-level
> > programming layer (e.g., Apache Beam) is automatically converted to a
> > basic IR DAG and can be converted to any IR DAG through a series of
> > easy, user-writable passes that can both reshape the DAG and modify its
> > annotations (of execution properties). Moreover, Coral
> > leaves more parts of the job execution configurable, such as the
> > scheduler and the data plane. As opposed to providing a set of
> > properties for solid optimization, Coral’s configurable parts can be
> > easily extended and explored by implementing the pre-defined
> > interfaces. For example, an arbitrary intermediate data store can be
> > added.
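
As a rough illustration of extending a configurable component by implementing a pre-defined interface, the sketch below adds two interchangeable intermediate data stores. Everything here is hypothetical: `IntermediateDataStore`, `InMemoryStore`, `LocalFileStore`, the `STORES` registry, and the `data_store` edge property are invented for this example and are not taken from the Coral codebase.

```python
import os
import pickle
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, List


class IntermediateDataStore(ABC):
    """Hypothetical pre-defined interface that the runtime would call."""

    @abstractmethod
    def write(self, block_id: str, records: List[bytes]) -> None:
        ...

    @abstractmethod
    def read(self, block_id: str) -> List[bytes]:
        ...


class InMemoryStore(IntermediateDataStore):
    """Keeps intermediate blocks in a dictionary."""

    def __init__(self) -> None:
        self._blocks: Dict[str, List[bytes]] = {}

    def write(self, block_id: str, records: List[bytes]) -> None:
        self._blocks[block_id] = list(records)

    def read(self, block_id: str) -> List[bytes]:
        return list(self._blocks[block_id])


class LocalFileStore(IntermediateDataStore):
    """A newly added store: one file per block under a given directory."""

    def __init__(self, directory: str) -> None:
        self._dir = directory

    def _path(self, block_id: str) -> str:
        return os.path.join(self._dir, block_id + ".blk")

    def write(self, block_id: str, records: List[bytes]) -> None:
        with open(self._path(block_id), "wb") as f:
            pickle.dump(list(records), f)

    def read(self, block_id: str) -> List[bytes]:
        with open(self._path(block_id), "rb") as f:
            return pickle.load(f)


# Registry keyed by the value of a (hypothetical) "data_store" edge annotation.
STORES: Dict[str, IntermediateDataStore] = {
    "memory": InMemoryStore(),
    "local_file": LocalFileStore(tempfile.mkdtemp()),
}


def transfer(edge_props: Dict[str, str], block_id: str, records: List[bytes]) -> List[bytes]:
    """Write records through whichever store the edge annotation selects, then read them back."""
    store = STORES[edge_props["data_store"]]
    store.write(block_id, records)
    return store.read(block_id)


print(transfer({"data_store": "local_file"}, "block0", [b"a", b"b"]))
```

In this sketch, adding another store means writing one more subclass and registering it; the `transfer` function, standing in for the runtime, does not change.
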
> >
> > Coral currently supports Apache Beam programs and we are working on
> > supporting Apache Spark programs as well. Coral also utilizes Apache
> > REEF for container management, which allows Coral to run in Apache
> > YARN and Apache Mesos clusters. If necessary, we plan to contribute to
> > and collaborate with these other Apache projects for the benefit of
> > all. We plan to extend such integrations to more Apache software.
> > The Apache Software Foundation already hosts many major big-data systems,
> > and we expect to help further the growth of the big-data community by
> > having Coral within the Apache Software Foundation.
> >
> > == Known Risks ==
> > === Orphaned Products ===
> > The risk of the Coral project being orphaned is minimal. There is
> > already plenty of work that arduously supports different deployment
> > characteristics, and we propose a general way to implement them with
> > flexible and extensible configuration knobs. The domain of data
> > processing is already of high interest, and this domain is expected to
> > evolve continuously with various other purposes, such as resource
> > disaggregation and using transient resources for better datacenter
> > resource utilization.
> >
> > === Inexperience with Open Source ===
> > The initial committers include PMC members and committers of other
> > Apache projects. They have experience with open source projects,
> > starting from their incubation to the top-level. They have been
> > involved in the open source development process, and are familiar with
> > releasing code under an open source license.
> >
> > === Homogeneous Developers ===
> > The initial set of committers is from a limited set of organizations,
> > but we expect to attract new contributors from diverse organizations
> > and will thus grow organically once approved for incubation. Our prior
> > experience with other open source projects will help various
> > contributors to actively participate in our project.
> >
> > === Reliance on Salaried Developers ===
> > Many developers are from Seoul National University. This is not
> > applicable.
> >
> > === Relationships with Other Apache Products ===
> > Coral positions itself among multiple Apache products. It runs on
> > Apache REEF for container management. It also utilizes many useful
> > development tools including Apache Maven, Apache Log4J, and multiple
> > Apache Commons components. Coral supports the Apache Beam programming
> > model for user applications. We are currently working on supporting
> > the Apache Spark programming APIs as well.
> >
> > === An Excessive Fascination with the Apache Brand ===
> > We hope to make Coral a powerful system for data processing, meeting
> > various needs for different deployment characteristics, under a wider
> > variety of environments. We see the limitations of simply putting code
> > on GitHub, and we believe the Apache community will help Coral grow
> > into a positively impactful and innovative
> > open source project. We believe Coral is a great fit for the Apache
> > Software Foundation due to the collaboration it aims to achieve from
> > the big data processing community.
> >
> > == Documentation ==
> > The current documentation for Coral is at
> > https://snuspl.github.io/coral/.
> >
> > == Initial Source ==
> > The Coral codebase is currently hosted at
> > https://github.com/snuspl/coral.
> >
> > == External Dependencies ==
> > To the best of our knowledge, all Coral dependencies are distributed
> > under Apache compatible licenses. Upon acceptance to the incubator, we
> > would begin a thorough analysis of all transitive dependencies to
> > verify this fact and further introduce license checking into the build
> > and release process.
> >
> > == Cryptography ==
> > Not applicable.
> >
> > == Required Resources ==
> > === Mailing Lists ===
> > We will operate two mailing lists as follows:
> >   * Coral PMC discussions: private@coral.incubator.apache.org
> >   * Coral developers: dev@coral.incubator.apache.org
> >
> > === Git Repositories ===
> > Upon incubation: https://github.com/apache/incubator-coral.
> > After the incubation, we would like to move the existing repo
> > https://github.com/snuspl/coral to the Apache infrastructure.
> >
> > === Issue Tracking ===
> > Coral currently tracks its issues using the GitHub issue tracker:
> > https://github.com/snuspl/coral/issues. We plan to migrate to Apache
> > JIRA.
> >
> > == Initial Committers ==
> >  * Byung-Gon Chun
> >  * Jeongyoon Eo
> >  * Geon-Woo Kim
> >  * Joo Yeon Kim
> >  * Gyewon Lee
> >  * Jung-Gil Lee
> >  * Sanha Lee
> >  * Wooyeon Lee
> >  * Yunseong Lee
> >  * JangHo Seo
> >  * Won Wook Song
> >  * Taegeon Um
> >  * Youngseok Yang
> >
> > == Affiliations ==
> >  * SNU (Seoul National University)
> >    * Byung-Gon Chun
> >    * Jeongyoon Eo
> >    * Geon-Woo Kim
> >    * Gyewon Lee
> >    * Sanha Lee
> >    * Wooyeon Lee
> >    * Yunseong Lee
> >    * JangHo Seo
> >    * Won Wook Song
> >    * Taegeon Um
> >    * Youngseok Yang
> >
> >  * LG
> >    * Jung-Gil Lee
> >
> >  * Samsung
> >    * Joo Yeon Kim
> >
> >  * Viva Republica
> >    * Geon-Woo Kim
> >
> > == Sponsors ==
> > === Champions ===
> > Byung-Gon Chun
> >
> > === Mentors ===
> >  * Hyunsik Choi
> >  * Byung-Gon Chun
> >  * Jean-Baptiste Onofré
> >  * Markus Weimer
> >  * Reynold Xin
> >
> > === Sponsoring Entity ===
> > The Apache Incubator
> >
> >
> > Thanks!
> > Byung-Gon Chun
