From: Marcelo Vanzin
Date: Wed, 31 May 2017 09:46:31 -0700
Subject: Re: [VOTE] Livy to enter Apache Incubator
To: general@incubator.apache.org

+1 (non-binding)

On Wed, May 31, 2017 at 6:03 AM, Sean Busbey wrote:
> Hi folks!
>
> I'm calling a vote to accept "Livy" into the Apache Incubator.
>
> The full proposal is available below, and is also available in the wiki:
>
> https://wiki.apache.org/incubator/LivyProposal
>
> For additional context, please see the discussion thread:
>
> https://s.apache.org/incubator-livy-proposal-thread
>
> Please cast your vote:
>
> [ ] +1, bring Livy into Incubator
> [ ] -1, do not bring Livy into Incubator, because...
>
> The vote will be open for at least 72 hours and only votes from the Incubator
> PMC are binding.
>
> I start with my vote:
> +1
>
> ----
>
> = Abstract =
>
> Livy is a web service that exposes a REST interface for managing long running
> Apache Spark contexts in your cluster. With Livy, new applications can be
> built on top of Apache Spark that require fine grained interaction with many
> Spark contexts.
>
> = Proposal =
>
> Livy is an open-source REST service for Apache Spark. Livy enables
> applications to submit Spark applications and retrieve results without a
> co-location requirement on the Spark cluster.
>
> We propose to contribute the Livy codebase and associated artifacts (e.g.
> documentation, web-site content, etc.) to the Apache Software Foundation.
>
> = Background =
>
> Apache Spark is a fast and general purpose distributed compute engine, with
> a versatile API. It enables processing of large quantities of static data
> distributed over a cluster of machines, as well as processing of continuous
> streams of data. It is the preferred distributed data processing engine for
> data engineering, stream processing and data science workloads.
> Each Spark application uses a construct called the SparkContext, which is the
> application’s connection or entry point to the Spark engine. Each Spark
> application will have its own SparkContext.
>
> Livy enables clients to interact with one or more Spark sessions through the
> Livy Server, which acts as a proxy layer. Livy Clients have fine grained
> control over the lifecycle of the Spark sessions, as well as the ability to
> submit jobs and retrieve results, all over HTTP. Clients have two modes of
> interaction:
>
> * RPC Client API, available in Java and Python, which allows results to be
>   retrieved as Java or Python objects. The serialization and deserialization
>   of the results is handled by the Livy framework.
> * HTTP based API that allows submission of code snippets, and retrieval of
>   the results in different formats.
>
> Multi-tenant resource allocation and security: Livy enables multiple
> independent Spark sessions to be managed simultaneously. Multiple clients
> can also interact simultaneously with the same Spark session and share the
> resources of that Spark session. Livy can also enforce secure, authenticated
> communication between the clients and their respective Spark sessions.
>
> More information on Livy can be found at the existing open source website:
> http://livy.io/
>
> = Rationale =
>
> Users want to use Spark’s powerful processing engine and API as the data
> processing backend for interactive applications. However, the job submission
> and application interaction mechanisms built into Apache Spark are
> insufficient and cumbersome for multi-user interactive applications.
>
> The primary mechanism for applications to submit Spark jobs is via
> spark-submit
> (http://spark.apache.org/docs/latest/submitting-applications.html), which is
> available as a command line tool as well as a programmatic API.
> However, spark-submit has the following limitations that make it difficult to
> build interactive applications:
>
> * It is slow: each invocation of spark-submit involves a setup phase where
>   cluster resources are acquired, new processes are forked, etc. This setup
>   phase runs for many seconds, or even minutes, and hence is too slow for
>   interactive applications.
> * It is cumbersome and lacks flexibility: application code and dependencies
>   have to be pre-compiled and submitted as jars, and cannot be submitted
>   interactively.
>
> Apache Spark comes with an ODBC/JDBC server, which can be used to submit SQL
> queries to Spark. However, this solution is limited to SQL and does not
> allow the client to leverage the rest of the Spark API, such as RDDs, MLlib
> and Streaming.
>
> A third way of using Spark is via its command-line shell, which allows the
> interactive submission of snippets of Spark code. However, the shell entails
> running Spark code on the client machine and hence is not a viable mechanism
> for remote clients to submit Spark jobs.
>
> Livy addresses the limitations of the above three mechanisms, and provides
> the full Spark API as a multi-tenant service to remote clients.
>
> Since the open source release of Livy in late 2015, we have seen tremendous
> interest among a diverse set of application developers and ISVs that want to
> build applications with Apache Spark. To make Livy a robust and flexible
> solution that will enable a broad and growing set of applications, it is
> important to grow a large and varied community of contributors.
>
> = Initial Goals =
>
> * Move existing codebase, website, documentation and mailing lists to
>   Apache-hosted infrastructure
> * Work with the infrastructure team to implement and approve our code
>   review, build, and testing workflows in the context of the ASF
> * Incremental development and releases per Apache guidelines
>
> = Current Status =
>
> The Livy project began at Cloudera, as a part of the Hue project. Cloudera
> soon realized the broad applicability of Livy, and separated it out into an
> independent project in Nov 2015.
>
> == Releases ==
>
> Livy has had two public releases, tagged here:
>
> * https://github.com/cloudera/livy/releases/tag/v0.2.0
> * https://github.com/cloudera/livy/releases/tag/v0.3.0
>
> Tarballs and zip files were created for each release and hosted on GitHub.
> Upon joining the incubator, we will adopt a more typical ASF release
> process.
>
> == Source ==
>
> Livy’s source is currently hosted on GitHub at:
> https://github.com/cloudera/livy
>
> This repository will be transitioned to Apache’s git hosting during
> incubation.
>
> == Code review ==
>
> Livy’s code reviews are currently public and hosted on GitHub as pull
> request reviews at: https://github.com/cloudera/livy/pulls
> The Livy developer community so far is happy with GitHub pull request
> reviews and hopes to continue this after being admitted to the ASF.
>
> == Issue Tracking ==
>
> Livy’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/LIVY/summary
> This JIRA instance contains bugs and development discussion dating back 1
> year and will provide an initial seed for the ASF JIRA.
>
> == Community Discussion ==
>
> Livy has several public discussion forums:
>
> * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-dev
> * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-user
>
> == Development Practices ==
>
> The Livy project follows a review-before-commit philosophy. Every commit
> automatically runs through the unit tests and generates coverage reports
> presented as a pull request comment. Our experience with this process leads
> us to believe that it helps ease new contributors into the project. They get
> feedback quickly on common mistakes, lowering the burden on reviewers. Those
> same reviewers get to lead by example, showing the new contributors that we
> value feedback within our community even when changes are done by more
> experienced folks.
>
> == Meritocracy ==
>
> We believe strongly in meritocracy when electing committers and PMC members.
> In the past few months, the project has added two new committers from two
> different organisations, in recognition of their significant contributions
> to the project. We will encourage contributions and participation of all
> types, and ensure that contributors are appropriately recognized.
>
> == Community ==
>
> Though Livy is relatively new as a standalone open source project, it has
> already seen promising growth in its community across several organizations:
>
> * Cloudera is the original development sponsor for Livy
> * Microsoft pushed the development of the interpreter, fixing high
>   availability issues and adding features
> * Hortonworks has contributed the security features to Livy, allowing
>   Kerberos and impersonation to work with Spark
> * IBM is starting to make contributions to the Livy project
> * A number of other patches have been contributed by community members
>
> Livy currently relies on Google Groups for mailing lists. These lists have
> been active since the end of 2015/start of 2016. Currently, Livy’s user
> mailing list has 173 subscribers and has hosted a total of 227 topic
> threads. Livy’s developer list has 49 subscribers and has hosted 79 topic
> threads.
>
> == Core Developers ==
>
> The early contributions to Livy were made by Cloudera engineers. In 2016,
> engineers from Microsoft and Hortonworks joined the core developer
> community.
>
> == Alignment ==
>
> Livy is built upon Apache Spark, and other Apache projects like Apache
> Hadoop YARN. It’s used as a building block by Apache Zeppelin. These
> community connections combined with our focus on development practices that
> emphasize community engagement with a path to meritocratic recognition
> naturally align us with the ASF.
>
> = Known Risks =
>
> == Orphaned Products ==
>
> The risk of Livy being abandoned is low because it is supported by three
> major big-data software vendors. Moreover, Livy is already used to power
> multiple releases of services and products used in production.
>
> == Inexperience with Open Source ==
>
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Spark,
> YARN).
>
> == Homogenous Developers ==
>
> The project already has a diverse developer base. It has contributions from
> 3 major organisations (Cloudera, Microsoft and Hortonworks), and is used in
> diverse applications, in diverse settings (On-Prem and Cloud).
>
> == Reliance on salaried Developers ==
>
> The contributions to the Livy project to date have been made by salaried
> engineers from Cloudera, Microsoft and Hortonworks. One of the individuals
> on the initial committer list has since left Microsoft and is currently
> unaffiliated. The remaining contributors are from Cloudera and Hortonworks.
> Since there are at least two major organizations involved, the risk of
> reliance on a single group of salaried developers is mitigated. The Livy
> user base is diverse, with users from across the globe, including users from
> academic settings. We aim to further diversify the Livy user and contributor
> base.
>
> == Relationships with other Apache projects ==
>
> Livy is closely tied to the Apache Spark project and currently addresses the
> scenarios for a REST based batch and interactive gateway for Spark jobs on
> YARN. Given the growing number of integrations with Livy, keeping it outside
> of Apache Spark aligns with the desire of the Apache Spark community to
> reduce the number of external dependencies in the Spark project.
> Specifically, the Apache Spark community has previously expressed a desire
> to keep job servers independent from the project (for example, the
> discussion of the Ooyala Spark Job Server in SPARK-818).
> Furthermore, while Livy’s common usage is closely tied to Spark deployments
> right now, its core building blocks can be reused elsewhere. Livy’s Remote
> REPL could be used as a library for interactive scenarios in non-Spark
> projects. In the future, integrations with cluster managers like Apache
> Mesos and others could also be added.
>
> The features provided by Livy have already been integrated with existing
> projects like Jupyter and Apache Zeppelin for their interactive Spark use
> cases.
> This validates the need for a project like Livy and provides an
> active downstream user base that the Livy community can interact with to
> seed future interest in the project.
>
> Livy serves a similar purpose to Apache Toree (incubating) but differs in
> making session management, security and impersonation a focal design point.
>
> == An Excessive Fascination with the Apache Brand ==
>
> The primary motivation for submitting Livy to the ASF is to grow a diverse
> and strong community. We wish to encourage diverse organisations, including
> ISVs, to adopt Livy and contribute to Livy without any concerns about
> ownership or licensing.
>
> = Documentation =
>
> Documentation can be found on the Livy website http://livy.io/
>
> The Livy web site is version controlled on the ‘gh-pages’ branch of the
> above repository.
> Additional documentation is provided on the GitHub wiki:
> https://github.com/cloudera/livy/wiki
> APIs are documented within the source code as JavaDoc style documentation
> comments.
>
> = Initial Source =
>
> The initial source code for Livy is hosted at
> https://github.com/cloudera/livy
>
> = Source and Intellectual Property submission plan =
>
> The Livy codebase and web site are currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Livy is already
> licensed under the Apache 2.0 license. Cloudera has collected ICLAs and
> CCLAs from all committers. There are, however, some recent contributions
> from authors who have not signed the CCLA and ICLA. If necessary for a
> successful SGA, we’ll seek the necessary documentation or replace the
> contributions.
>
> The “Livy” name is not a registered trademark. We will need to do a
> trademark search and make sure it is available for the Apache Foundation
> prior to graduation.
>
> Cloudera currently owns the domain name: http://livy.io/.
> Once all the documentation has moved over to ASF infrastructure, the main
> landing page will become livy.incubator.apache.org and the old domain will
> just act as a redirect.
>
> = External Dependencies =
>
> The list below covers the non-Apache dependencies of the project and their
> licenses.
>
> * Jetty: Apache 2.0
> * Dropwizard Metrics: Apache 2.0
> * FasterXML Jackson: Apache 2.0
> * Netty: Apache 2.0
> * Scala: BSD
> * Py4J: BSD
> * Scalatra: BSD
>
> Build/test-only dependencies:
>
> * Mockito: MIT
> * JUnit: Eclipse
>
> = Required Resources =
>
> == Mailing Lists ==
>
> * private@livy.incubator.apache.org (PPMC)
> * dev@livy.incubator.apache.org (dev mailing list)
> * user@livy.incubator.apache.org (User questions)
> * commits@livy.incubator.apache.org (subscribers shouldn’t be able to post)
> * issues@livy.incubator.apache.org (subscribers shouldn’t be able to post)
>
> == Git Repository ==
>
> git://git.apache.org/incubator-livy
>
> == Issue Tracking ==
>
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
>
> = Initial Committers =
>
> * Marcelo Vanzin (vanzin@cloudera.com)
> * Alex Man (alex@alexman.space)
> * Jeff Zhang (zjffdu@gmail.com)
> * Saisai Shao (sshao@hortonworks.com)
> * Kostas Sakellis (kostas@cloudera.com)
>
> = Affiliations =
>
> The initial set of committers includes people employed by Cloudera and
> Hortonworks as well as one currently independent contributor.
>
> = Additional Interested Contributors =
>
> Those interested in getting involved with the project as we enter incubation
> are encouraged to list themselves here.
>
> * Ismaël Mejía (iemejia@apache.org)
>
> = Sponsors =
>
> == Champion ==
>
> Sean Busbey (busbey@apache.org)
>
> == Nominated Mentors ==
>
> * Bikas Saha (bikas@apache.org)
> * Brock Noland (brock@phdata.io)
> * Luciano Resende (lresende@apache.org)
>
> == Sponsoring Entity ==
>
> We ask that the Incubator PMC sponsor this proposal.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>

--
Marcelo