From: Byung-Gon Chun
Date: Sat, 27 Jan 2018 17:03:36 +0900
Subject: Re: [PROPOSAL] Onyx - proposal for Apache Incubation
To: general
Reply-To: general@incubator.apache.org

Since we cannot use the name Onyx, we would like to change the project name
to Surf. I hope that this name works.

-Gon
---
Byung-Gon Chun

On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun wrote:

> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci wrote:
>
>> Great work -- I think this technology has a lot of promise, and I'd love to
>> see its evolution inside the Foundation.
>
> Thanks, Davor!
>
>> Parts of it, like the Onyx Intermediate Representation [1], overlap with
>> the work-in-progress inside the Apache Beam project ("portability"). We'd
>> love to work together on this -- would you be open to such collaboration?
>> If so, it may not be necessary to start from scratch, and leverage the work
>> already done.
>
> Sure. We're open to collaboration.
>
>> Regarding the name, Onyx would likely have to be renamed, due to a conflict
>> with a related technology [2].
>
> Thanks for pointing it out. It's difficult to come up with a good short
> name. :)
> Do you have any suggestion?
>
> Thanks!
> -Gon
>
> ---
> Byung-Gon Chun
>
>> Davor
>>
>> [1] https://snuspl.github.io/onyx/docs/ir/
>> [2] http://www.onyxplatform.org/
>>
>> On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun wrote:
>>
>> > Dear Apache Incubator Community,
>> >
>> > Please accept the following proposal for presentation and discussion:
>> > https://wiki.apache.org/incubator/OnyxProposal
>> >
>> > Onyx is a data processing system that aims to flexibly control the runtime
>> > behaviors of a job to adapt to varying deployment characteristics (e.g.,
>> > harnessing transient resources in datacenters, cross-datacenter deployment,
>> > changing runtime based on job characteristics, etc.). Onyx provides ways to
>> > extend the system’s capabilities and incorporate the extensions into
>> > flexible job execution.
>> > Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an
>> > Intermediate Representation (IR) DAG, which Onyx optimizes and deploys
>> > based on a deployment policy.
>> >
>> > I've attached the proposal below.
>> >
>> > Best regards,
>> > Byung-Gon Chun
>> >
>> > = OnyxProposal =
>> >
>> > == Abstract ==
>> > Onyx is a data processing system for flexible employment with
>> > different execution scenarios for various deployment characteristics
>> > on clusters.
>> >
>> > == Proposal ==
>> > Today, there is a wide variety of data processing systems with
>> > different designs for better performance and datacenter efficiency.
>> > They include processing data on specific resource environments and
>> > running jobs with specific attributes. Although each system
>> > successfully solves the problems it targets, most systems are designed
>> > in such a way that runtime behaviors are built tightly inside the system
>> > core to hide the complexity of distributed computing.
>> > This makes it hard for a single system to support different deployment
>> > characteristics with different runtime behaviors without substantial
>> > effort.
>> >
>> > Onyx is a data processing system that aims to flexibly control the
>> > runtime behaviors of a job to adapt to varying deployment
>> > characteristics. Moreover, it provides a means of extending the
>> > system’s capabilities and incorporating the extensions into flexible
>> > job execution.
>> >
>> > To make it easy to modify runtime behaviors to adapt to varying
>> > deployment characteristics, Onyx exposes runtime behaviors to
>> > be flexibly configured and modified at both compile time and runtime
>> > through a set of high-level graph pass interfaces.
>> >
>> > We hope to contribute to the big data processing community by enabling
>> > more flexibility and extensibility in job executions. Furthermore, we
>> > can all benefit more when we work together as a community to mature the
>> > system with more use cases and a better understanding of diverse
>> > deployment characteristics. The Apache Software Foundation
>> > is the perfect place to achieve these aspirations.
>> >
>> > == Background ==
>> > Many data processing systems have distinctive runtime behaviors
>> > optimized and configured for specific deployment characteristics like
>> > different resource environments and for handling special job
>> > attributes.
>> >
>> > For example, much research has been conducted to overcome the
>> > challenge of running data processing jobs on cheap, unreliable
>> > transient resources. Likewise, techniques for disaggregating different
>> > types of resources, like memory, CPU and GPU, are being actively
>> > developed to use datacenter resources more efficiently. Many
>> > researchers are also working to run data processing jobs in even more
>> > diverse environments, such as across distant datacenters.
>> > Similarly, for special job attributes, many works take different
>> > approaches, such as runtime optimization, to solve problems like data
>> > skew, and to optimize systems for data processing jobs with small-scale
>> > input data.
>> >
>> > Although each of these systems performs well with the jobs and in the
>> > environments it targets, it performs poorly in unconsidered cases,
>> > and its design does not consider supporting multiple deployment
>> > characteristics on a single system.
>> >
>> > For an application writer, optimizing an application to perform well
>> > on a certain system engraved with its underlying behaviors
>> > requires a deep understanding of the system itself, which is an
>> > overhead that often requires a lot of time and effort. Moreover, for a
>> > developer, modifying such system behaviors requires modifications
>> > of the system core, which requires an even deeper understanding of the
>> > system itself.
>> >
>> > With this background, Onyx is designed to represent all of its jobs as
>> > an Intermediate Representation (IR) DAG. In the Onyx compiler, user
>> > applications from various programming models (e.g., Apache Beam) are
>> > submitted, transformed to an IR DAG, and optimized/customized for the
>> > deployment characteristics. In the IR DAG optimization phase, the DAG
>> > is modified through a series of compiler “passes” which reshape or
>> > annotate the DAG with an expression of the underlying runtime
>> > behaviors. The IR DAG is then submitted as an execution plan for the
>> > Onyx runtime. The runtime includes the unmodified parts of data
>> > processing in the backbone which is transparently integrated with
>> > configurable components exposed for further extension.
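
To make the pass mechanism described in the Background section more
concrete, here is a minimal sketch in Python. It is not Onyx's actual API:
the IRDag class, the pass functions, and the "placement" and "data_store"
property names are all hypothetical, chosen only to illustrate the idea of
annotating an IR DAG with execution properties through a series of passes.
The placement policy is likewise an assumption, and the sketch shows
annotating passes only, not reshaping ones.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class IRDag:
    """Toy IR DAG: named vertices, directed edges, and execution properties."""
    vertices: List[str] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)
    vertex_props: Dict[str, Dict[str, str]] = field(default_factory=dict)
    edge_props: Dict[Tuple[str, str], Dict[str, str]] = field(default_factory=dict)


# A pass takes a DAG and returns a (possibly modified) DAG.
Pass = Callable[[IRDag], IRDag]


def annotate_placement(dag: IRDag) -> IRDag:
    """Hypothetical policy: source vertices run on transient resources,
    every vertex with an incoming edge runs on reserved resources."""
    has_input = {dst for _, dst in dag.edges}
    for v in dag.vertices:
        placement = "reserved" if v in has_input else "transient"
        dag.vertex_props.setdefault(v, {})["placement"] = placement
    return dag


def annotate_data_store(dag: IRDag) -> IRDag:
    """Hypothetical policy: output of a transient vertex goes to a remote
    file store; all other intermediate data stays in memory."""
    for src, dst in dag.edges:
        src_placement = dag.vertex_props.get(src, {}).get("placement")
        store = "remote_file" if src_placement == "transient" else "memory"
        dag.edge_props.setdefault((src, dst), {})["data_store"] = store
    return dag


def optimize(dag: IRDag, passes: List[Pass]) -> IRDag:
    """Apply the passes in order; the result plays the role of the execution plan."""
    for p in passes:
        dag = p(dag)
    return dag


dag = IRDag(vertices=["read", "map", "reduce"],
            edges=[("read", "map"), ("map", "reduce")])
plan = optimize(dag, [annotate_placement, annotate_data_store])
```

Because each pass only reads and writes properties on the DAG, a different
deployment policy in this sketch is just a different list of passes handed
to optimize(), which is the kind of flexibility the proposal argues for.
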
>> >
>> > == Rationale ==
>> > Onyx’s vision lies in providing means for flexibly supporting a wide
>> > variety of job execution scenarios for users while enabling system
>> > developers to extend the execution framework with various
>> > functionalities at the same time. The capabilities of the system can
>> > be extended as it grows to meet a wider variety of execution scenarios.
>> > We require inputs from users and developers from diverse domains in
>> > order to make it a more thriving and useful project. The Apache
>> > Software Foundation provides the best tools and community to support
>> > this vision.
>> >
>> > == Initial Goals ==
>> > Initial goals will be to move the existing codebase to Apache and
>> > integrate with the Apache development process. We further plan to
>> > develop our system to meet the needs of more execution scenarios for
>> > a wider variety of deployment characteristics.
>> >
>> > == Current Status ==
>> > The Onyx codebase is currently hosted in a repository at github.com. The
>> > current version has been developed by system developers at Seoul
>> > National University, Viva Republica, Samsung, and LG.
>> >
>> > == Meritocracy ==
>> > We plan to strongly support meritocracy. We will discuss the
>> > requirements in an open forum, and those who continuously contribute
>> > to Onyx with the passion to strengthen the system will be invited as
>> > committers. Contributors who enrich Onyx by providing various use
>> > cases and various implementations of the configurable components,
>> > including ideas for optimization techniques, will be especially
>> > welcome. Committers with a deep understanding of the system’s
>> > technical aspects as a whole and its philosophy will definitely be
>> > voted into the PMC. We will monitor community participation so that
>> > privileges can be extended to those who contribute.
>> >
>> > == Community ==
>> > We hope to expand our contribution community by becoming an Apache
>> > incubator project. The contributions will come from both users and
>> > system developers interested in the flexibility and extensibility of job
>> > executions that Onyx can support. We expect users to mainly contribute
>> > by diversifying the use cases and deployment characteristics, and
>> > developers to contribute by implementing them.
>> >
>> > == Alignment ==
>> > Apache Spark is one of many popular data processing frameworks. The
>> > system is designed towards optimizing jobs using RDDs in memory and
>> > many other optimizations built tightly within the framework. In
>> > contrast to Spark, Onyx aims to provide more flexibility for job
>> > execution in an easy manner.
>> >
>> > Apache Tez enables developers to build complex task DAGs with control
>> > over the control plane of job execution. In Onyx, a high-level
>> > programming layer (e.g., Apache Beam) is automatically converted to a
>> > basic IR DAG and can be converted to any IR DAG through a series of
>> > easy, user-writable passes that can both reshape and modify the
>> > annotation (of execution properties) of the DAG. Moreover, Onyx leaves
>> > more parts of the job execution configurable, such as the scheduler
>> > and the data plane. As opposed to providing a set of properties for
>> > solid optimization, Onyx’s configurable parts can be easily extended
>> > and explored by implementing the pre-defined interfaces. For example,
>> > an arbitrary intermediate data store can be added.
>> >
>> > Onyx currently supports Apache Beam programs and we are working on
>> > supporting Apache Spark programs as well. Onyx also utilizes Apache
>> > REEF for container management, which allows Onyx to run in Apache YARN
>> > and Apache Mesos clusters. If necessary, we plan to contribute to and
>> > collaborate with these other Apache projects for the benefit of all.
>> > We plan to extend such integrations with more Apache software. The Apache
>> > Software Foundation already hosts many major big-data systems, and we
>> > expect to help further growth of the big-data community by having Onyx
>> > within the Apache Software Foundation.
>> >
>> > == Known Risks ==
>> > === Orphaned Products ===
>> > The risk of the Onyx project being orphaned is minimal. There is
>> > already plenty of work that arduously supports different deployment
>> > characteristics, and we propose a general way to implement them with
>> > flexible and extensible configuration knobs. The domain of data
>> > processing is already of high interest, and this domain is expected to
>> > evolve continuously with various other purposes, such as resource
>> > disaggregation and using transient resources for better datacenter
>> > resource utilization.
>> >
>> > === Inexperience with Open Source ===
>> > The initial committers include PMC members and committers of other
>> > Apache projects. They have experience with open source projects,
>> > starting from their incubation to the top-level. They have been
>> > involved in the open source development process, and are familiar with
>> > releasing code under an open source license.
>> >
>> > === Homogeneous Developers ===
>> > The initial set of committers is from a limited set of organizations,
>> > but we expect to attract new contributors from diverse organizations
>> > and will thus grow organically once approved for incubation. Our prior
>> > experience with other open source projects will help various
>> > contributors to actively participate in our project.
>> >
>> > === Reliance on Salaried Developers ===
>> > Many developers are from Seoul National University. This is not
>> > applicable.
>> >
>> > === Relationships with Other Apache Products ===
>> > Onyx positions itself among multiple Apache products.
>> > It runs on Apache REEF for container management. It also utilizes many
>> > useful development tools including Apache Maven, Apache Log4J, and
>> > multiple Apache Commons components. Onyx supports the Apache Beam
>> > programming model for user applications. We are currently working on
>> > supporting the Apache Spark programming APIs as well.
>> >
>> > === An Excessive Fascination with the Apache Brand ===
>> > We hope to make Onyx a powerful system for data processing, meeting
>> > various needs for different deployment characteristics, in a wider
>> > variety of environments. We see the limitations of simply putting code
>> > on GitHub, and we believe the Apache community will help the growth of
>> > Onyx so that the project becomes positively impactful and innovative
>> > open source software. We believe Onyx is a great fit for the Apache
>> > Software Foundation due to the collaboration it aims to achieve from
>> > the big data processing community.
>> >
>> > == Documentation ==
>> > The current documentation for Onyx is at https://snuspl.github.io/onyx/.
>> >
>> > == Initial Source ==
>> > The Onyx codebase is currently hosted at https://github.com/snuspl/onyx.
>> >
>> > == External Dependencies ==
>> > To the best of our knowledge, all Onyx dependencies are distributed
>> > under Apache compatible licenses. Upon acceptance to the incubator, we
>> > would begin a thorough analysis of all transitive dependencies to
>> > verify this fact and further introduce license checking into the build
>> > and release process.
>> >
>> > == Cryptography ==
>> > Not applicable.
>> >
>> > == Required Resources ==
>> > === Mailing Lists ===
>> > We will operate two mailing lists as follows:
>> > * Onyx PMC discussions: private@onyx.incubator.apache.org
>> > * Onyx developers: dev@onyx.incubator.apache.org
>> >
>> > === Git Repositories ===
>> > Upon incubation: https://github.com/apache/incubator-onyx.
>> > After the incubation, we would like to move the existing repo
>> > https://github.com/snuspl/onyx to the Apache infrastructure.
>> >
>> > === Issue Tracking ===
>> > Onyx currently tracks its issues using the GitHub issue tracker:
>> > https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
>> > JIRA.
>> >
>> > == Initial Committers ==
>> > * Byung-Gon Chun
>> > * Jeongyoon Eo
>> > * Geon-Woo Kim
>> > * Joo Yeon Kim
>> > * Gyewon Lee
>> > * Jung-Gil Lee
>> > * Sanha Lee
>> > * Wooyeon Lee
>> > * Yunseong Lee
>> > * JangHo Seo
>> > * Won Wook Song
>> > * Taegeon Um
>> > * Youngseok Yang
>> >
>> > == Affiliations ==
>> > * SNU (Seoul National University)
>> >   * Byung-Gon Chun
>> >   * Jeongyoon Eo
>> >   * Geon-Woo Kim
>> >   * Gyewon Lee
>> >   * Sanha Lee
>> >   * Wooyeon Lee
>> >   * Yunseong Lee
>> >   * JangHo Seo
>> >   * Won Wook Song
>> >   * Taegeon Um
>> >   * Youngseok Yang
>> >
>> > * LG
>> >   * Jung-Gil Lee
>> >
>> > * Samsung
>> >   * Joo Yeon Kim
>> >
>> > * Viva Republica
>> >   * Geon-Woo Kim
>> >
>> > == Sponsors ==
>> > === Champions ===
>> > Byung-Gon Chun
>> >
>> > === Mentors ===
>> > * Hyunsik Choi
>> > * Byung-Gon Chun
>> > * Markus Weimer
>> > * Reynold Xin
>> >
>> > === Sponsoring Entity ===
>> > The Apache Incubator
>> >
>> > --
>> > Byung-Gon Chun
>
> --
> Byung-Gon Chun

--
Byung-Gon Chun