incubator-general mailing list archives

From Byung-Gon Chun <bgc...@gmail.com>
Subject Re: [VOTE] Accept Coral into the Apache Incubator
Date Thu, 01 Feb 2018 21:06:49 GMT
Thanks, Davor!
I will add you to the mentor list of Coral.

On Fri, Feb 2, 2018 at 4:23 AM, Davor Bonaci <davor@apache.org> wrote:

> +1 (binding)
>
> Also, happy to help, mentor, or be a connection with the Beam PMC, as
> appropriate.
>
> On Thu, Feb 1, 2018 at 9:54 AM, Kevin A. McGrail <kmcgrail@apache.org>
> wrote:
>
>> +1 Binding
>>
>>
>> On 2/1/2018 9:07 AM, Byung-Gon Chun wrote:
>>
>>> Hi all,
>>>
>>> I would like to start a VOTE to propose the Coral project as a podling
>>> into the Apache Incubator.
>>>
>>> The ASF voting rules are described at
>>> https://www.apache.org/foundation/voting.html
>>>
>>> A vote for accepting a new Apache Incubator podling is a majority vote
>>> for which only Incubator PMC member votes are binding.
>>>
>>> This vote will run for at least 72 hours. Please VOTE as follows.
>>> [] +1 Accept Coral into the Apache Incubator
>>> [] +0 Abstain
>>> [] -1 Do not accept Coral into the Apache Incubator because ...
>>>
>>> The proposal is listed below, but you can also access it on the wiki:
>>> https://wiki.apache.org/incubator/CoralProposal
>>>
>>> = CoralProposal =
>>>
>>> == Abstract ==
>>> Coral is a data processing system for flexible employment with
>>> different execution scenarios for various deployment characteristics
>>> on clusters.
>>>
>>> == Proposal ==
>>> Today, there is a wide variety of data processing systems with
>>> different designs for better performance and datacenter efficiency.
>>> They include processing data on specific resource environments and
>>> running jobs with specific attributes. Although each system
>>> successfully solves the problems it targets, most systems are designed
>>> in such a way that runtime behaviors are built tightly into the system
>>> core to hide the complexity of distributed computing. This makes it
>>> hard for a single system to support different deployment
>>> characteristics with different runtime behaviors without substantial
>>> effort.
>>>
>>> Coral is a data processing system that aims to flexibly control the
>>> runtime behaviors of a job to adapt to varying deployment
>>> characteristics. Moreover, it provides a means of extending the
>>> system’s capabilities and incorporating the extensions into the flexible
>>> job execution.
>>>
>>> In order to be able to easily modify runtime behaviors to adapt to
>>> varying deployment characteristics, Coral exposes runtime behaviors to
>>> be flexibly configured and modified at both compile-time and runtime
>>> through a set of high-level graph pass interfaces.
>>>
>>> We hope to contribute to the big data processing community by enabling
>>> more flexibility and extensibility in job executions. Furthermore, we
>>> can all benefit when we work together as a community to mature the
>>> system with more use cases and a deeper understanding of diverse
>>> deployment characteristics. The Apache Software Foundation
>>> is the perfect place to achieve these aspirations.
>>>
>>> == Background ==
>>> Many data processing systems have distinctive runtime behaviors
>>> optimized and configured for specific deployment characteristics like
>>> different resource environments and for handling special job
>>> attributes.
>>>
>>> For example, much research has been conducted to overcome the
>>> challenge of running data processing jobs on cheap, unreliable
>>> transient resources. Likewise, techniques for disaggregating different
>>> types of resources, like memory, CPU and GPU, are being actively
>>> developed to use datacenter resources more efficiently. Many
>>> researchers are also working to run data processing jobs in even more
>>> diverse environments, such as across distant datacenters. Similarly,
>>> for special job attributes, many works take different approaches, such
>>> as runtime optimization, to solve problems like data skew, and to
>>> optimize systems for data processing jobs with small-scale input data.
>>>
>>> Although each of these systems performs well with the jobs and in the
>>> environments it targets, such systems perform poorly in cases they
>>> were not designed for, and their designs do not consider supporting
>>> multiple deployment characteristics on a single system.
>>>
>>> For an application writer, optimizing an application to perform well
>>> on a system with such built-in behaviors requires a deep
>>> understanding of the system itself, an overhead that often costs a
>>> lot of time and effort. Moreover, for a developer, modifying such
>>> system behaviors requires changes to the system core, which demands
>>> an even deeper understanding of the system.
>>>
>>> With this background, Coral is designed to represent all of its jobs
>>> as an Intermediate Representation (IR) DAG. In the Coral compiler,
>>> user applications from various programming models (e.g., Apache Beam)
>>> are submitted, transformed to an IR DAG, and optimized/customized for
>>> the deployment characteristics. In the IR DAG optimization phase, the
>>> DAG is modified through a series of compiler “passes” which reshape or
>>> annotate the DAG with an expression of the underlying runtime
>>> behaviors. The IR DAG is then submitted as an execution plan for the
>>> Coral runtime. The runtime includes the unmodified parts of data
>>> processing in the backbone which is transparently integrated with
>>> configurable components exposed for further extension.
>>>
>>> == Rationale ==
>>> Coral’s vision lies in providing means for flexibly supporting a wide
>>> variety of job execution scenarios for users while making it easy for
>>> system developers to extend the execution framework with various
>>> functionalities at the same time. The capabilities of the system can
>>> be extended as it grows to meet a wider variety of execution scenarios.
>>> We need input from users and developers from diverse domains in
>>> order to make it a more thriving and useful project. The Apache
>>> Software Foundation provides the best tools and community to support
>>> this vision.
>>>
>>> == Initial Goals ==
>>> Initial goals will be to move the existing codebase to Apache and
>>> integrate with the Apache development process. We further plan to
>>> develop our system to meet the needs of more execution scenarios for
>>> a wider variety of deployment characteristics.
>>>
>>> == Current Status ==
>>> The Coral codebase is currently hosted in a repository at github.com. The
>>> current version has been developed by system developers at Seoul
>>> National University, Viva Republica, Samsung, and LG.
>>>
>>> == Meritocracy ==
>>> We plan to strongly support meritocracy. We will discuss the
>>> requirements in an open forum, and those who continuously contribute
>>> to Coral with the passion to strengthen the system will be invited as
>>> committers. Contributors who enrich Coral by providing various use
>>> cases and various implementations of the configurable components,
>>> including ideas for optimization techniques, will be especially
>>> welcome. Committers with a deep understanding of the system’s
>>> technical aspects as a whole and its philosophy will definitely be
>>> voted into the PMC. We will monitor community participation so that
>>> privileges can be extended to those who contribute.
>>>
>>> == Community ==
>>> We hope to expand our contribution community by becoming an Apache
>>> incubator project. The contributions will come from both users and
>>> system developers interested in flexibility and extensibility of job
>>> executions that Coral can support. We expect users to contribute
>>> mainly by diversifying the use cases and deployment characteristics,
>>> and developers to contribute by implementing them.
>>>
>>> == Alignment ==
>>> Apache Spark is one of many popular data processing frameworks. The
>>> system is designed towards optimizing jobs using RDDs in memory and
>>> many other optimizations built tightly within the framework. In
>>> contrast to Spark, Coral aims to provide more flexibility for job
>>> execution in an easy manner.
>>>
>>> Apache Tez enables developers to build complex task DAGs with control
>>> over the control plane of job execution. In Coral, a high-level
>>> programming layer (e.g., Apache Beam) is automatically converted to a
>>> basic IR DAG and can be converted to any IR DAG through a series of
>>> easy, user-writable passes that can both reshape and modify the
>>> annotation (of execution properties) of the DAG. Moreover, Coral
>>> leaves more parts of the job execution configurable, such as the
>>> scheduler and the data plane. As opposed to providing a set of
>>> properties for solid optimization, Coral’s configurable parts can be
>>> easily extended and explored by implementing the pre-defined
>>> interfaces. For example, an arbitrary intermediate data store can be
>>> added.
>>>
>>> Coral currently supports Apache Beam programs and we are working on
>>> supporting Apache Spark programs as well. Coral also utilizes Apache
>>> REEF for container management, which allows Coral to run in Apache
>>> YARN and Apache Mesos clusters. If necessary, we plan to contribute to
>>> and collaborate with these other Apache projects for the benefit of
>>> all. We plan to extend such integrations with more Apache software.
>>> The Apache Software Foundation already hosts many major big-data
>>> systems, and we expect to help further growth of the big-data
>>> community by having Coral within the Apache Software Foundation.
>>>
>>> == Known Risks ==
>>> === Orphaned Products ===
>>> The risk of the Coral project being orphaned is minimal. There is
>>> already plenty of work that arduously supports different deployment
>>> characteristics, and we propose a general way to implement them with
>>> flexible and extensible configuration knobs. The domain of data
>>> processing is already of high interest, and this domain is expected to
>>> evolve continuously with various other purposes, such as resource
>>> disaggregation and using transient resources for better datacenter
>>> resource utilization.
>>>
>>> === Inexperience with Open Source ===
>>> The initial committers include PMC members and committers of other
>>> Apache projects. They have experience with open source projects,
>>> from their incubation through to top-level status. They have been
>>> involved in the open source development process, and are familiar with
>>> releasing code under an open source license.
>>>
>>> === Homogeneous Developers ===
>>> The initial set of committers is from a limited set of organizations,
>>> but we expect to attract new contributors from diverse organizations
>>> and will thus grow organically once approved for incubation. Our prior
>>> experience with other open source projects will help various
>>> contributors to actively participate in our project.
>>>
>>> === Reliance on Salaried Developers ===
>>> Many developers are from Seoul National University. This is not
>>> applicable.
>>>
>>> === Relationships with Other Apache Products ===
>>> Coral positions itself among multiple Apache products. It runs on
>>> Apache REEF for container management. It also utilizes many useful
>>> development tools including Apache Maven, Apache Log4J, and multiple
>>> Apache Commons components. Coral supports the Apache Beam programming
>>> model for user applications. We are currently working on supporting
>>> the Apache Spark programming APIs as well.
>>>
>>> === An Excessive Fascination with the Apache Brand ===
>>> We hope to make Coral a powerful system for data processing, meeting
>>> various needs for different deployment characteristics, under a wider
>>> variety of environments. We see the limitations of simply putting code
>>> on GitHub, and we believe the Apache community will help Coral grow
>>> into a positively impactful and innovative open source project.
>>> We believe Coral is a great fit for the Apache
>>> Software Foundation due to the collaboration it aims to achieve from
>>> the big data processing community.
>>>
>>> == Documentation ==
>>> The current documentation for Coral is at
>>> https://snuspl.github.io/coral/.
>>>
>>> == Initial Source ==
>>> The Coral codebase is currently hosted at
>>> https://github.com/snuspl/coral.
>>>
>>> == External Dependencies ==
>>> To the best of our knowledge, all Coral dependencies are distributed
>>> under Apache compatible licenses. Upon acceptance to the incubator, we
>>> would begin a thorough analysis of all transitive dependencies to
>>> verify this fact and further introduce license checking into the build
>>> and release process.
>>>
>>> == Cryptography ==
>>> Not applicable.
>>>
>>> == Required Resources ==
>>> === Mailing Lists ===
>>> We will operate two mailing lists as follows:
>>>     * Coral PMC discussions: private@coral.incubator.apache.org
>>>     * Coral developers: dev@coral.incubator.apache.org
>>>
>>> === Git Repositories ===
>>> Upon incubation: https://github.com/apache/incubator-coral.
>>> After the incubation, we would like to move the existing repo
>>> https://github.com/snuspl/coral to the Apache infrastructure.
>>>
>>> === Issue Tracking ===
>>> Coral currently tracks its issues using the GitHub issue tracker:
>>> https://github.com/snuspl/coral/issues. We plan to migrate to Apache
>>> JIRA.
>>>
>>> == Initial Committers ==
>>>    * Byung-Gon Chun
>>>    * Jeongyoon Eo
>>>    * Geon-Woo Kim
>>>    * Joo Yeon Kim
>>>    * Gyewon Lee
>>>    * Jung-Gil Lee
>>>    * Sanha Lee
>>>    * Wooyeon Lee
>>>    * Yunseong Lee
>>>    * JangHo Seo
>>>    * Won Wook Song
>>>    * Taegeon Um
>>>    * Youngseok Yang
>>>
>>> == Affiliations ==
>>>    * SNU (Seoul National University)
>>>      * Byung-Gon Chun
>>>      * Jeongyoon Eo
>>>      * Geon-Woo Kim
>>>      * Gyewon Lee
>>>      * Sanha Lee
>>>      * Wooyeon Lee
>>>      * Yunseong Lee
>>>      * JangHo Seo
>>>      * Won Wook Song
>>>      * Taegeon Um
>>>      * Youngseok Yang
>>>
>>>    * LG
>>>      * Jung-Gil Lee
>>>
>>>    * Samsung
>>>      * Joo Yeon Kim
>>>
>>>    * Viva Republica
>>>      * Geon-Woo Kim
>>>
>>> == Sponsors ==
>>> === Champions ===
>>> Byung-Gon Chun
>>>
>>> === Mentors ===
>>>    * Hyunsik Choi
>>>    * Byung-Gon Chun
>>>    * Jean-Baptiste Onofré
>>>    * Markus Weimer
>>>    * Reynold Xin
>>>
>>> === Sponsoring Entity ===
>>> The Apache Incubator
>>>
>>>
>>> Thanks!
>>> Byung-Gon Chun
>>>
>>>
>> --
>> Kevin A. McGrail
>> Asst. Treasurer & VP Fundraising, Apache Software Foundation
>> Chair Emeritus Apache SpamAssassin Project
>>
>>
>>
>>
>


-- 
Byung-Gon Chun
