incubator-general mailing list archives

From Byung-Gon Chun <bgc...@gmail.com>
Subject Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Date Wed, 31 Jan 2018 08:50:07 GMT
On Wed, Jan 31, 2018 at 4:04 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
wrote:

> Hi,
>
> Coral is a good name !
>

Thanks!


>
> Does the code belong to Seoul National University? In that case, in
> addition to your ICLA, we would need an SGA (it's not a blocker for the
> project bootstrapping or code donation, but we, at least, will need it
> later for graduation). On the other hand, if the committers are all part
> of the university, you can also sign a CCLA.
>

I will figure this out.


>
> Happy to be mentor on the project if you want me ! ;)
>
>
Thanks! I will add you to the mentor list.

-Gon


> Thanks,
> Regards
> JB
>
> On 01/30/2018 10:17 AM, Byung-Gon Chun wrote:
> > Thanks for the comments, JB!
> > My replies are inlined below.
> >
> > On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
> > wrote:
> >
> >> Hi,
> >>
> >> sorry to be a little bit late on this.
> >>
> >> It's a very interesting proposal. It sounds pretty close to the
> >> portability layer we want to add in Apache Beam. I would love to see
> >> interaction between the two communities.
> >>
> >> I have two minor questions:
> >>
> >> 1. about the name: Onyx sounds very generic and the name is used in
> >> other technologies. Maybe another unique name would be more accurate.
> >>
> >
> > We proposed Coral instead. How does this sound?
> >
> >
> >> 2. the Onyx code is on github right now, under the Apache 2.0 license.
> >> Does this code have any affiliation with companies? Meaning that we
> >> would need an SGA for the code donation.
> >>
> >
> > It does not. The developers are affiliated with Seoul National
> > University. In this case, do we still need an SGA?
> >
> >
> >> If you need any help for the incubation, I would be more than happy to
> >> help !
> >>
> >>
> > Thanks for the offer. Would you be interested in being a mentor of the
> > project?
> >
> > Thanks.
> > -Gon
> >
> >
> >
> >> Regards
> >> JB
> >>
> >> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
> >>> Dear Apache Incubator Community,
> >>>
> >>> Please accept the following proposal for presentation and discussion:
> >>> https://wiki.apache.org/incubator/OnyxProposal
> >>>
> >>> Onyx is a data processing system that aims to flexibly control the
> >>> runtime behaviors of a job to adapt to varying deployment
> >>> characteristics (e.g., harnessing transient resources in datacenters,
> >>> cross-datacenter deployment, changing runtime based on job
> >>> characteristics, etc.). Onyx provides ways to extend the system’s
> >>> capabilities and incorporate the extensions to the flexible job
> >>> execution.
> >>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into
> >>> an Intermediate Representation (IR) DAG, which Onyx optimizes and
> >>> deploys based on a deployment policy.
> >>>
> >>> I've attached the proposal below.
> >>>
> >>> Best regards,
> >>> Byung-Gon Chun
> >>>
> >>> = OnyxProposal =
> >>>
> >>> == Abstract ==
> >>> Onyx is a data processing system for flexible employment with
> >>> different execution scenarios for various deployment characteristics
> >>> on clusters.
> >>>
> >>> == Proposal ==
> >>> Today, there is a wide variety of data processing systems with
> >>> different designs for better performance and datacenter efficiency.
> >>> They include processing data on specific resource environments and
> >>> running jobs with specific attributes. Although each system
> >>> successfully solves the problems it targets, most systems are designed
> >>> in the way that runtime behaviors are built tightly inside the system
> >>> core to hide the complexity of distributed computing. This makes it
> >>> hard for a single system to support different deployment
> >>> characteristics with different runtime behaviors without substantial
> >>> effort.
> >>>
> >>> Onyx is a data processing system that aims to flexibly control the
> >>> runtime behaviors of a job to adapt to varying deployment
> >>> characteristics. Moreover, it provides a means of extending the
> >>> system’s capabilities and incorporating the extensions to the flexible
> >>> job execution.
> >>>
> >>> In order to be able to easily modify runtime behaviors to adapt to
> >>> varying deployment characteristics, Onyx exposes runtime behaviors to
> >>> be flexibly configured and modified at both compile-time and runtime
> >>> through a set of high-level graph pass interfaces.
> >>>
> >>> We hope to contribute to the big data processing community by enabling
> >>> more flexibility and extensibility in job executions. Furthermore, the
> >>> community as a whole benefits more when we work together to mature the
> >>> system with more use cases and understanding of diverse deployment
> >>> characteristics. The Apache Software Foundation
> >>> is the perfect place to achieve these aspirations.
> >>>
> >>> == Background ==
> >>> Many data processing systems have distinctive runtime behaviors
> >>> optimized and configured for specific deployment characteristics like
> >>> different resource environments and for handling special job
> >>> attributes.
> >>>
> >>> For example, much research has been conducted to overcome the
> >>> challenge of running data processing jobs on cheap, unreliable
> >>> transient resources. Likewise, techniques for disaggregating different
> >>> types of resources, like memory, CPU and GPU, are being actively
> >>> developed to use datacenter resources more efficiently. Many
> >>> researchers are also working to run data processing jobs in even more
> >>> diverse environments, such as across distant datacenters. Similarly,
> >>> for special job attributes, many works take different approaches, such
> >>> as runtime optimization, to solve problems like data skew, and to
> >>> optimize systems for data processing jobs with small-scale input data.
> >>>
> >>> Although each of the systems performs well with the jobs and in the
> >>> environments they target, they perform poorly with unconsidered cases,
> >>> and do not consider supporting multiple deployment characteristics on
> >>> a single system in their designs.
> >>>
> >>> For an application writer to optimize an application to perform well
> >>> on a certain system engraved with its underlying behaviors, it
> >>> requires a deep understanding of the system itself, which is an
> >>> overhead that often requires a lot of time and effort. Moreover, for a
> >>> developer to modify such system behaviors, it requires modifications
> >>> of the system core, which requires an even deeper understanding of the
> >>> system itself.
> >>>
> >>> With this background, Onyx is designed to represent all of its jobs as
> >>> an Intermediate Representation (IR) DAG. In the Onyx compiler, user
> >>> applications from various programming models (e.g., Apache Beam) are
> >>> submitted, transformed to an IR DAG, and optimized/customized for the
> >>> deployment characteristics. In the IR DAG optimization phase, the DAG
> >>> is modified through a series of compiler “passes” which reshape or
> >>> annotate the DAG with an expression of the underlying runtime
> >>> behaviors. The IR DAG is then submitted as an execution plan for the
> >>> Onyx runtime. The runtime includes the unmodified parts of data
> >>> processing in the backbone which is transparently integrated with
> >>> configurable components exposed for further extension.
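
The pass-based optimization described above can be illustrated with a minimal sketch. It is not Onyx's actual API: `IRDag`, `Vertex`, the `placement` property, and both passes are hypothetical names chosen only to show one pass that annotates the DAG and one that reshapes it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Vertex:
    name: str
    # Execution properties attached by annotating passes (hypothetical keys).
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class IRDag:
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_vertex(self, name: str) -> None:
        self.vertices[name] = Vertex(name)

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.append((src, dst))


# A pass takes a DAG and returns the (possibly modified) DAG.
Pass = Callable[[IRDag], IRDag]


def annotate_placement(dag: IRDag) -> IRDag:
    """Annotating pass: source vertices go to 'reserved' resources,
    every other vertex to 'transient' resources (an assumed policy)."""
    has_incoming = {dst for _, dst in dag.edges}
    for vertex in dag.vertices.values():
        vertex.properties["placement"] = (
            "transient" if vertex.name in has_incoming else "reserved"
        )
    return dag


def insert_relays(dag: IRDag) -> IRDag:
    """Reshaping pass: add a relay vertex on every edge whose endpoints
    have different placements."""
    new_edges: List[Tuple[str, str]] = []
    for src, dst in dag.edges:
        src_place = dag.vertices[src].properties.get("placement")
        dst_place = dag.vertices[dst].properties.get("placement")
        if src_place != dst_place:
            relay = f"relay_{src}_{dst}"
            dag.vertices[relay] = Vertex(relay, {"placement": "reserved"})
            new_edges += [(src, relay), (relay, dst)]
        else:
            new_edges.append((src, dst))
    dag.edges = new_edges
    return dag


def optimize(dag: IRDag, passes: List[Pass]) -> IRDag:
    """Apply the passes in order to produce the execution plan."""
    for compiler_pass in passes:
        dag = compiler_pass(dag)
    return dag


def build_example() -> IRDag:
    """A three-stage job: read -> map -> write."""
    dag = IRDag()
    for name in ("read", "map", "write"):
        dag.add_vertex(name)
    dag.add_edge("read", "map")
    dag.add_edge("map", "write")
    return dag


if __name__ == "__main__":
    plan = optimize(build_example(), [annotate_placement, insert_relays])
    for src, dst in plan.edges:
        print(src, "->", dst)
```

Changing the deployment policy in this sketch means swapping the list of passes handed to `optimize`, while the runtime that consumes the plan stays the same.
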
> >>>
> >>> == Rationale ==
> >>> Onyx’s vision lies in providing means for flexibly supporting a wide
> >>> variety of job execution scenarios for users while facilitating system
> >>> developers to extend the execution framework with various
> >>> functionalities at the same time. The capabilities of the system can
> >>> be extended as it grows to meet a wider variety of execution scenarios.
> >>> We require inputs from users and developers from diverse domains in
> >>> order to make it a more thriving and useful project. The Apache
> >>> Software Foundation provides the best tools and community to support
> >>> this vision.
> >>>
> >>> == Initial Goals ==
> >>> Initial goals will be to move the existing codebase to Apache and
> >>> integrate with the Apache development process. We further plan to
> >>> develop our system to meet the needs of more execution scenarios
> >>> across a wider variety of deployment characteristics.
> >>>
> >>> == Current Status ==
> >>> Onyx codebase is currently hosted in a repository at github.com. The
> >>> current version has been developed by system developers at Seoul
> >>> National University, Viva Republica, Samsung, and LG.
> >>>
> >>> == Meritocracy ==
> >>> We plan to strongly support meritocracy. We will discuss the
> >>> requirements in an open forum, and those that continuously contribute
> >>> to Onyx with the passion to strengthen the system will be invited as
> >>> committers. Contributors that enrich Onyx by providing various use
> >>> cases, various implementations of the configurable components
> >>> including ideas for optimization techniques will be especially
> >>> welcome. Committers with a deep understanding of the system’s
> >>> technical aspects as a whole and its philosophy will definitely be
> >>> voted into the PMC. We will monitor community participation so that
> >>> privileges can be extended to those that contribute.
> >>>
> >>> == Community ==
> >>> We hope to expand our contribution community by becoming an Apache
> >>> incubator project. The contributions will come from both users and
> >>> system developers interested in flexibility and extensibility of job
> >>> executions that Onyx can support. We expect users to mainly contribute
> >>> to diversify the use cases and deployment characteristics, and
> >>> developers to contribute to implement them.
> >>>
> >>> == Alignment ==
> >>> Apache Spark is one of many popular data processing frameworks. The
> >>> system is designed towards optimizing jobs using RDDs in memory and
> >>> many other optimizations built tightly within the framework. In
> >>> contrast to Spark, Onyx aims to provide more flexibility for job
> >>> execution in an easy manner.
> >>>
> >>> Apache Tez enables developers to build complex task DAGs with control
> >>> over the control plane of job execution. In Onyx, a high-level
> >>> programming layer (e.g., Apache Beam) is automatically converted to a
> >>> basic IR DAG and can be converted to any IR DAG through a series of
> >>> easy user writable passes, that can both reshape and modify the
> >>> annotation (of execution properties) of the DAG. Moreover, Onyx leaves
> >>> more parts of the job execution configurable, such as the scheduler
> >>> and the data plane. As opposed to providing a set of properties for
> >>> solid optimization, Onyx’s configurable parts can be easily extended
> >>> and explored by implementing the pre-defined interfaces. For example,
> >>> an arbitrary intermediate data store can be added.
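
As a rough illustration of extending a configurable part through a pre-defined interface, the sketch below adds a new intermediate data store. Everything here is assumed for the example: `IntermediateDataStore`, `MemoryStore`, `LocalFileStore`, and the registry functions are hypothetical names, not Onyx's actual interfaces.

```python
import os
import pickle
import tempfile
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


class IntermediateDataStore(ABC):
    """Hypothetical pre-defined interface for intermediate data."""

    @abstractmethod
    def write(self, block_id: str, records: List[Any]) -> None:
        ...

    @abstractmethod
    def read(self, block_id: str) -> List[Any]:
        ...


class MemoryStore(IntermediateDataStore):
    """Built-in store that keeps blocks in process memory."""

    def __init__(self) -> None:
        self._blocks: Dict[str, List[Any]] = {}

    def write(self, block_id: str, records: List[Any]) -> None:
        self._blocks[block_id] = list(records)

    def read(self, block_id: str) -> List[Any]:
        return list(self._blocks[block_id])


class LocalFileStore(IntermediateDataStore):
    """Extension added by a developer: one pickle file per block."""

    def __init__(self, directory: str) -> None:
        self._directory = directory

    def _path(self, block_id: str) -> str:
        return os.path.join(self._directory, block_id + ".blk")

    def write(self, block_id: str, records: List[Any]) -> None:
        with open(self._path(block_id), "wb") as handle:
            pickle.dump(list(records), handle)

    def read(self, block_id: str) -> List[Any]:
        with open(self._path(block_id), "rb") as handle:
            return pickle.load(handle)


# Registry the runtime consults when an edge names a store.
STORES: Dict[str, Callable[..., IntermediateDataStore]] = {"memory": MemoryStore}


def register_store(name: str, factory: Callable[..., IntermediateDataStore]) -> None:
    STORES[name] = factory


def create_store(name: str, *args: Any) -> IntermediateDataStore:
    return STORES[name](*args)


if __name__ == "__main__":
    register_store("local_file", LocalFileStore)
    store = create_store("local_file", tempfile.mkdtemp())
    store.write("block-0", [1, 2, 3])
    print(store.read("block-0"))
```

The runtime in this sketch depends only on the abstract interface, so a new store can be registered without touching the code that reads and writes blocks.
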
> >>>
> >>> Onyx currently supports Apache Beam programs and we are working on
> >>> supporting Apache Spark programs as well. Onyx also utilizes Apache
> >>> REEF for container management, which allows Onyx to run in Apache YARN
> >>> and Apache Mesos clusters. If necessary, we plan to contribute to and
> >>> collaborate with these other Apache projects for the benefit of all.
> >>> We plan to extend such integrations with more Apache software. The
> >>> Apache Software Foundation already hosts many major big-data systems,
> >>> and we expect to help further growth of the big-data community by
> >>> having Onyx within the Apache Software Foundation.
> >>>
> >>> == Known Risks ==
> >>> === Orphaned Products ===
> >>> The risk of the Onyx project being orphaned is minimal. There is
> >>> already plenty of work that arduously supports different deployment
> >>> characteristics, and we propose a general way to implement them with
> >>> flexible and extensible configuration knobs. The domain of data
> >>> processing is already of high interest, and this domain is expected to
> >>> evolve continuously with various other purposes, such as resource
> >>> disaggregation and using transient resources for better datacenter
> >>> resource utilization.
> >>>
> >>> === Inexperience with Open Source ===
> >>> The initial committers include PMC members and committers of other
> >>> Apache projects. They have experience with open source projects,
> >>> starting from their incubation to the top-level. They have been
> >>> involved in the open source development process, and are familiar with
> >>> releasing code under an open source license.
> >>>
> >>> === Homogeneous Developers ===
> >>> The initial set of committers is from a limited set of organizations,
> >>> but we expect to attract new contributors from diverse organizations
> >>> and will thus grow organically once approved for incubation. Our prior
> >>> experience with other open source projects will help various
> >>> contributors to actively participate in our project.
> >>>
> >>> === Reliance on Salaried Developers ===
> >>> Many developers are from Seoul National University. This is not
> >>> applicable.
> >>>
> >>> === Relationships with Other Apache Products ===
> >>> Onyx positions itself among multiple Apache products. It runs on
> >>> Apache REEF for container management. It also utilizes many useful
> >>> development tools including Apache Maven, Apache Log4J, and multiple
> >>> Apache Commons components. Onyx supports the Apache Beam programming
> >>> model for user applications. We are currently working on supporting
> >>> the Apache Spark programming APIs as well.
> >>>
> >>> === An Excessive Fascination with the Apache Brand ===
> >>> We hope to make Onyx a powerful system for data processing, meeting
> >>> various needs for different deployment characteristics, under a wider
> >>> variety of environments. We see the limitations of simply putting code
> >>> on GitHub, and we believe the Apache community will help the growth of
> >>> Onyx for the project to become a positively impactful and innovative
> >>> open source project. We believe Onyx is a great fit for the Apache
> >>> Software Foundation due to the collaboration it aims to achieve from
> >>> the big data processing community.
> >>>
> >>> == Documentation ==
> >>> The current documentation for Onyx is at
> >>> https://snuspl.github.io/onyx/.
> >>>
> >>> == Initial Source ==
> >>> The Onyx codebase is currently hosted at
> >>> https://github.com/snuspl/onyx.
> >>>
> >>> == External Dependencies ==
> >>> To the best of our knowledge, all Onyx dependencies are distributed
> >>> under Apache compatible licenses. Upon acceptance to the incubator, we
> >>> would begin a thorough analysis of all transitive dependencies to
> >>> verify this fact and further introduce license checking into the build
> >>> and release process.
> >>>
> >>> == Cryptography ==
> >>> Not applicable.
> >>>
> >>> == Required Resources ==
> >>> === Mailing Lists ===
> >>> We will operate two mailing lists as follows:
> >>>    * Onyx PMC discussions: private@onyx.incubator.apache.org
> >>>    * Onyx developers: dev@onyx.incubator.apache.org
> >>>
> >>> === Git Repositories ===
> >>> Upon incubation: https://github.com/apache/incubator-onyx.
> >>> After the incubation, we would like to move the existing repo
> >>> https://github.com/snuspl/onyx to the Apache infrastructure.
> >>>
> >>> === Issue Tracking ===
> >>> Onyx currently tracks its issues using the GitHub issue tracker:
> >>> https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
> >>> JIRA.
> >>>
> >>> == Initial Committers ==
> >>>   * Byung-Gon Chun
> >>>   * Jeongyoon Eo
> >>>   * Geon-Woo Kim
> >>>   * Joo Yeon Kim
> >>>   * Gyewon Lee
> >>>   * Jung-Gil Lee
> >>>   * Sanha Lee
> >>>   * Wooyeon Lee
> >>>   * Yunseong Lee
> >>>   * JangHo Seo
> >>>   * Won Wook Song
> >>>   * Taegeon Um
> >>>   * Youngseok Yang
> >>>
> >>> == Affiliations ==
> >>>   * SNU (Seoul National University)
> >>>     * Byung-Gon Chun
> >>>     * Jeongyoon Eo
> >>>     * Geon-Woo Kim
> >>>     * Gyewon Lee
> >>>     * Sanha Lee
> >>>     * Wooyeon Lee
> >>>     * Yunseong Lee
> >>>     * JangHo Seo
> >>>     * Won Wook Song
> >>>     * Taegeon Um
> >>>     * Youngseok Yang
> >>>
> >>>   * LG
> >>>     * Jung-Gil Lee
> >>>
> >>>   * Samsung
> >>>     * Joo Yeon Kim
> >>>
> >>>   * Viva Republica
> >>>     * Geon-Woo Kim
> >>>
> >>> == Sponsors ==
> >>> === Champions ===
> >>> Byung-Gon Chun
> >>>
> >>> === Mentors ===
> >>>   * Hyunsik Choi
> >>>   * Byung-Gon Chun
> >>>   * Markus Weimer
> >>>   * Reynold Xin
> >>>
> >>> === Sponsoring Entity ===
> >>> The Apache Incubator
> >>>
> >>>
> >>>
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> jbonofre@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >> For additional commands, e-mail: general-help@incubator.apache.org
> >>
> >>
> >
> >
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>
>


-- 
Byung-Gon Chun
