incubator-general mailing list archives

From Jean-Baptiste Onofré ...@nanthrax.net>
Subject Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Date Wed, 31 Jan 2018 14:54:03 GMT
Thanks, much appreciated !

Regards
JB

On 01/31/2018 09:50 AM, Byung-Gon Chun wrote:
> On Wed, Jan 31, 2018 at 4:04 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
> wrote:
> 
>> Hi,
>>
>> Coral is a good name !
>>
> 
> Thanks!
> 
> 
>>
>> Does the code belong to Seoul National University? In that case, in
>> addition to your ICLA, we would need an SGA (it's not a blocker for the
>> project bootstrapping or code donation, but we will at least need it
>> later for graduation). On the other hand, if the committers are all part
>> of the university, you can also sign a CCLA.
>>
> 
> I will figure this out.
> 
> 
>>
>> Happy to be a mentor on the project if you want me! ;)
>>
>>
> Thanks! I will add you to the mentor list.
> 
> -Gon
> 
> 
>> Thanks,
>> Regards
>> JB
>>
>> On 01/30/2018 10:17 AM, Byung-Gon Chun wrote:
>>> Thanks for the comments, JB!
>>> My replies are inlined below.
>>>
>>> On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> sorry to be a little bit late on this.
>>>>
>>>> It's a very interesting proposal. It sounds pretty close to the
>>>> portability layer we want to add in Apache Beam. I would love to see
>>>> interaction between the two communities.
>>>>
>>>> I have two minor questions:
>>>>
>>>> 1. about the name: Onyx sounds very generic and the name is used in
>>>> other technologies. Maybe another unique name would be more accurate.
>>>>
>>>
>>> We proposed Coral instead. How does this sound?
>>>
>>>
>>>> 2. the Onyx code is on GitHub right now, under the Apache 2.0 license.
>>>> Does this code have any affiliation with companies? If so, we would
>>>> need an SGA for the code donation.
>>>>
>>>
>>> It does not. The developers are affiliated with Seoul National
>>> University. In this case, do we still need an SGA?
>>>
>>>
>>>> If you need any help for the incubation, I would be more than happy to
>>>> help !
>>>>
>>>>
>>> Thanks for the offer. Would you be interested in being a mentor of the
>>> project?
>>>
>>> Thanks.
>>> -Gon
>>>
>>>
>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
>>>>> Dear Apache Incubator Community,
>>>>>
>>>>> Please accept the following proposal for presentation and discussion:
>>>>> https://wiki.apache.org/incubator/OnyxProposal
>>>>>
>>>>> Onyx is a data processing system that aims to flexibly control the
>>>>> runtime behaviors of a job to adapt to varying deployment
>>>>> characteristics (e.g., harnessing transient resources in datacenters,
>>>>> cross-datacenter deployment, changing runtime based on job
>>>>> characteristics, etc.). Onyx provides ways to extend the system’s
>>>>> capabilities and incorporate the extensions to the flexible job
>>>>> execution.
>>>>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into
>>>>> an Intermediate Representation (IR) DAG, which Onyx optimizes and
>>>>> deploys based on a deployment policy.
>>>>>
>>>>> I've attached the proposal below.
>>>>>
>>>>> Best regards,
>>>>> Byung-Gon Chun
>>>>>
>>>>> = OnyxProposal =
>>>>>
>>>>> == Abstract ==
>>>>> Onyx is a data processing system that can be flexibly employed in
>>>>> different execution scenarios for various deployment characteristics
>>>>> on clusters.
>>>>>
>>>>> == Proposal ==
>>>>> Today, there is a wide variety of data processing systems with
>>>>> different designs for better performance and datacenter efficiency.
>>>>> They include processing data on specific resource environments and
>>>>> running jobs with specific attributes. Although each system
>>>>> successfully solves the problems it targets, most systems are designed
>>>>> so that runtime behaviors are built tightly into the system
>>>>> core to hide the complexity of distributed computing. This makes it
>>>>> hard for a single system to support different deployment
>>>>> characteristics with different runtime behaviors without substantial
>>>>> effort.
>>>>>
>>>>> Onyx is a data processing system that aims to flexibly control the
>>>>> runtime behaviors of a job to adapt to varying deployment
>>>>> characteristics. Moreover, it provides a means of extending the
>>>>> system’s capabilities and incorporating the extensions to the flexible
>>>>> job execution.
>>>>>
>>>>> In order to be able to easily modify runtime behaviors to adapt to
>>>>> varying deployment characteristics, Onyx exposes runtime behaviors to
>>>>> be flexibly configured and modified at both compile-time and runtime
>>>>> through a set of high-level graph pass interfaces.
>>>>>
>>>>> We hope to contribute to the big data processing community by enabling
>>>>> more flexibility and extensibility in job executions. Furthermore, we
>>>>> can all benefit more when we work together as a
>>>>> community to mature the system with more use cases and understanding
>>>>> of diverse deployment characteristics. The Apache Software Foundation
>>>>> is the perfect place to achieve these aspirations.
>>>>>
>>>>> == Background ==
>>>>> Many data processing systems have distinctive runtime behaviors
>>>>> optimized and configured for specific deployment characteristics like
>>>>> different resource environments and for handling special job
>>>>> attributes.
>>>>>
>>>>> For example, much research has been conducted to overcome the
>>>>> challenge of running data processing jobs on cheap, unreliable
>>>>> transient resources. Likewise, techniques for disaggregating different
>>>>> types of resources, like memory, CPU and GPU, are being actively
>>>>> developed to use datacenter resources more efficiently. Many
>>>>> researchers are also working to run data processing jobs in even more
>>>>> diverse environments, such as across distant datacenters. Similarly,
>>>>> for special job attributes, many works take different approaches, such
>>>>> as runtime optimization, to solve problems like data skew, and to
>>>>> optimize systems for data processing jobs with small-scale input data.
>>>>>
>>>>> Although each of the systems performs well with the jobs and in the
>>>>> environments they target, they perform poorly with unconsidered cases,
>>>>> and do not consider supporting multiple deployment characteristics on
>>>>> a single system in their designs.
>>>>>
>>>>> For an application writer, optimizing an application to perform well
>>>>> on a system with such built-in behaviors requires a deep
>>>>> understanding of the system itself, which is an overhead that often
>>>>> requires a lot of time and effort. Moreover, for a developer,
>>>>> modifying such system behaviors requires modifications of the system
>>>>> core, which requires an even deeper understanding of the system
>>>>> itself.
>>>>>
>>>>> With this background, Onyx is designed to represent all of its jobs as
>>>>> an Intermediate Representation (IR) DAG. In the Onyx compiler, user
>>>>> applications from various programming models (e.g., Apache Beam) are
>>>>> submitted, transformed to an IR DAG, and optimized/customized for the
>>>>> deployment characteristics. In the IR DAG optimization phase, the DAG
>>>>> is modified through a series of compiler “passes” which reshape or
>>>>> annotate the DAG with an expression of the underlying runtime
>>>>> behaviors. The IR DAG is then submitted as an execution plan for the
>>>>> Onyx runtime. The runtime includes the unmodified parts of data
>>>>> processing in the backbone which is transparently integrated with
>>>>> configurable components exposed for further extension.
>>>>>
>>>>> == Rationale ==
>>>>> Onyx’s vision lies in providing means for flexibly supporting a wide
>>>>> variety of job execution scenarios for users while facilitating system
>>>>> developers to extend the execution framework with various
>>>>> functionalities at the same time. The capabilities of the system can
>>>>> be extended as it grows to meet a wider variety of execution scenarios.
>>>>> We require input from users and developers from diverse domains in
>>>>> order to make it a more thriving and useful project. The Apache
>>>>> Software Foundation provides the best tools and community to support
>>>>> this vision.
>>>>>
>>>>> == Initial Goals ==
>>>>> Initial goals will be to move the existing codebase to Apache and
>>>>> integrate with the Apache development process. We further plan to
>>>>> develop our system to meet the needs for more execution scenarios for
>>>>> a wider variety of deployment characteristics.
>>>>>
>>>>> == Current Status ==
>>>>> The Onyx codebase is currently hosted in a repository at github.com. The
>>>>> current version has been developed by system developers at Seoul
>>>>> National University, Viva Republica, Samsung, and LG.
>>>>>
>>>>> == Meritocracy ==
>>>>> We plan to strongly support meritocracy. We will discuss the
>>>>> requirements in an open forum, and those that continuously contribute
>>>>> to Onyx with the passion to strengthen the system will be invited as
>>>>> committers. Contributors that enrich Onyx by providing various use
>>>>> cases, various implementations of the configurable components
>>>>> including ideas for optimization techniques will be especially
>>>>> welcome. Committers with a deep understanding of the system’s
>>>>> technical aspects as a whole and its philosophy will definitely be
>>>>> voted into the PMC. We will monitor community participation so that
>>>>> privileges can be extended to those that contribute.
>>>>>
>>>>> == Community ==
>>>>> We hope to expand our contribution community by becoming an Apache
>>>>> incubator project. The contributions will come from both users and
>>>>> system developers interested in flexibility and extensibility of job
>>>>> executions that Onyx can support. We expect users to mainly contribute
>>>>> to diversify the use cases and deployment characteristics, and
>>>>> developers to contribute by implementing them.
>>>>>
>>>>> == Alignment ==
>>>>> Apache Spark is one of many popular data processing frameworks. The
>>>>> system is designed towards optimizing jobs using RDDs in memory and
>>>>> many other optimizations built tightly within the framework. In
>>>>> contrast to Spark, Onyx aims to provide more flexibility for job
>>>>> execution in an easy manner.
>>>>>
>>>>> Apache Tez enables developers to build complex task DAGs with control
>>>>> over the control plane of job execution. In Onyx, a high-level
>>>>> programming layer (e.g., Apache Beam) is automatically converted to a
>>>>> basic IR DAG and can be converted to any IR DAG through a series of
>>>>> easy, user-writable passes that can both reshape and modify the
>>>>> annotation (of execution properties) of the DAG. Moreover, Onyx leaves
>>>>> more parts of the job execution configurable, such as the scheduler
>>>>> and the data plane. As opposed to providing a set of properties for
>>>>> solid optimization, Onyx’s configurable parts can be easily extended
>>>>> and explored by implementing the pre-defined interfaces. For example,
>>>>> an arbitrary intermediate data store can be added.
>>>>>
>>>>> Onyx currently supports Apache Beam programs and we are working on
>>>>> supporting Apache Spark programs as well. Onyx also utilizes Apache
>>>>> REEF for container management, which allows Onyx to run in Apache YARN
>>>>> and Apache Mesos clusters. If necessary, we plan to contribute to and
>>>>> collaborate with these other Apache projects for the benefit of all.
>>>>> We plan to extend such integrations with more Apache software. The
>>>>> Apache Software Foundation already hosts many major big-data systems, and we
>>>>> expect to help further growth of the big-data community by having Onyx
>>>>> within the Apache foundation.
>>>>>
>>>>> == Known Risks ==
>>>>> === Orphaned Products ===
>>>>> The risk of the Onyx project being orphaned is minimal. There is
>>>>> already plenty of work that arduously supports different deployment
>>>>> characteristics, and we propose a general way to implement them with
>>>>> flexible and extensible configuration knobs. The domain of data
>>>>> processing is already of high interest, and this domain is expected to
>>>>> evolve continuously with various other purposes, such as resource
>>>>> disaggregation and using transient resources for better datacenter
>>>>> resource utilization.
>>>>>
>>>>> === Inexperience with Open Source ===
>>>>> The initial committers include PMC members and committers of other
>>>>> Apache projects. They have experience with open source projects,
>>>>> starting from their incubation to the top-level. They have been
>>>>> involved in the open source development process, and are familiar with
>>>>> releasing code under an open source license.
>>>>>
>>>>> === Homogeneous Developers ===
>>>>> The initial set of committers is from a limited set of organizations,
>>>>> but we expect to attract new contributors from diverse organizations
>>>>> and will thus grow organically once approved for incubation. Our prior
>>>>> experience with other open source projects will help various
>>>>> contributors to actively participate in our project.
>>>>>
>>>>> === Reliance on Salaried Developers ===
>>>>> Many developers are from Seoul National University. This is not
>>>>> applicable.
>>>>>
>>>>> === Relationships with Other Apache Products ===
>>>>> Onyx positions itself among multiple Apache products. It runs on
>>>>> Apache REEF for container management. It also utilizes many useful
>>>>> development tools including Apache Maven, Apache Log4J, and multiple
>>>>> Apache Commons components. Onyx supports the Apache Beam programming
>>>>> model for user applications. We are currently working on supporting
>>>>> the Apache Spark programming APIs as well.
>>>>>
>>>>> === An Excessive Fascination with the Apache Brand ===
>>>>> We hope to make Onyx a powerful system for data processing, meeting
>>>>> various needs for different deployment characteristics, under a wider
>>>>> variety of environments. We see the limitations of simply putting code
>>>>> on GitHub, and we believe the Apache community will help the growth of
>>>>> Onyx for the project to become a positively impactful and innovative
>>>>> open source software. We believe Onyx is a great fit for the Apache
>>>>> Software Foundation due to the collaboration it aims to achieve from
>>>>> the big data processing community.
>>>>>
>>>>> == Documentation ==
>>>>> The current documentation for Onyx is at https://snuspl.github.io/onyx/.
>>>>>
>>>>> == Initial Source ==
>>>>> The Onyx codebase is currently hosted at https://github.com/snuspl/onyx.
>>>>>
>>>>> == External Dependencies ==
>>>>> To the best of our knowledge, all Onyx dependencies are distributed
>>>>> under Apache compatible licenses. Upon acceptance to the incubator, we
>>>>> would begin a thorough analysis of all transitive dependencies to
>>>>> verify this fact and further introduce license checking into the build
>>>>> and release process.
>>>>>
>>>>> == Cryptography ==
>>>>> Not applicable.
>>>>>
>>>>> == Required Resources ==
>>>>> === Mailing Lists ===
>>>>> We will operate two mailing lists as follows:
>>>>>    * Onyx PMC discussions: private@onyx.incubator.apache.org
>>>>>    * Onyx developers: dev@onyx.incubator.apache.org
>>>>>
>>>>> === Git Repositories ===
>>>>> Upon incubation: https://github.com/apache/incubator-onyx.
>>>>> After the incubation, we would like to move the existing repo
>>>>> https://github.com/snuspl/onyx to the Apache infrastructure
>>>>>
>>>>> === Issue Tracking ===
>>>>> Onyx currently tracks its issues using the GitHub issue tracker:
>>>>> https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
>>>>> JIRA.
>>>>>
>>>>> == Initial Committers ==
>>>>>   * Byung-Gon Chun
>>>>>   * Jeongyoon Eo
>>>>>   * Geon-Woo Kim
>>>>>   * Joo Yeon Kim
>>>>>   * Gyewon Lee
>>>>>   * Jung-Gil Lee
>>>>>   * Sanha Lee
>>>>>   * Wooyeon Lee
>>>>>   * Yunseong Lee
>>>>>   * JangHo Seo
>>>>>   * Won Wook Song
>>>>>   * Taegeon Um
>>>>>   * Youngseok Yang
>>>>>
>>>>> == Affiliations ==
>>>>>   * SNU (Seoul National University)
>>>>>     * Byung-Gon Chun
>>>>>     * Jeongyoon Eo
>>>>>     * Geon-Woo Kim
>>>>>     * Gyewon Lee
>>>>>     * Sanha Lee
>>>>>     * Wooyeon Lee
>>>>>     * Yunseong Lee
>>>>>     * JangHo Seo
>>>>>     * Won Wook Song
>>>>>     * Taegeon Um
>>>>>     * Youngseok Yang
>>>>>
>>>>>   * LG
>>>>>     * Jung-Gil Lee
>>>>>
>>>>>   * Samsung
>>>>>     * Joo Yeon Kim
>>>>>
>>>>>   * Viva Republica
>>>>>     * Geon-Woo Kim
>>>>>
>>>>> == Sponsors ==
>>>>> === Champions ===
>>>>> Byung-Gon Chun
>>>>>
>>>>> === Mentors ===
>>>>>   * Hyunsik Choi
>>>>>   * Byung-Gon Chun
>>>>>   * Markus Weimer
>>>>>   * Reynold Xin
>>>>>
>>>>> === Sponsoring Entity ===
>>>>> The Apache Incubator
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbonofre@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>>> For additional commands, e-mail: general-help@incubator.apache.org
>>>>
>>>>
>>>
>>>
>>
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>>
>>
> 
> 

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


