incubator-general mailing list archives

From Brock Noland <br...@apache.org>
Subject Re: [VOTE] Livy to enter Apache Incubator
Date Wed, 31 May 2017 19:13:33 GMT
+1 (binding)

On Wed, May 31, 2017 at 1:59 PM, Kostas Sakellis <kostas@cloudera.com>
wrote:

> +1 (non-binding)
>
> On Wed, May 31, 2017 at 11:46 AM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
>
> > +1 (binding)
> >
> > > On May 31, 2017, at 6:03 AM, Sean Busbey <busbey@apache.org> wrote:
> > >
> > > Hi folks!
> > >
> > > I'm calling a vote to accept "Livy" into the Apache Incubator.
> > >
> > > The full proposal is available below, and is also available in the
> wiki:
> > >
> > > https://wiki.apache.org/incubator/LivyProposal
> > >
> > > For additional context, please see the discussion thread:
> > >
> > > https://s.apache.org/incubator-livy-proposal-thread
> > >
> > > Please cast your vote:
> > >
> > > [ ] +1, bring Livy into Incubator
> > > [ ] -1, do not bring Livy into Incubator, because...
> > >
> > > The vote will be open for at least 72 hours, and only votes from the
> > > Incubator PMC are binding.
> > >
> > > I start with my vote:
> > > +1
> > >
> > > ----
> > >
> > > = Abstract =
> > >
> > > Livy is a web service that exposes a REST interface for managing long-running
> > > Apache Spark contexts in your cluster. With Livy, new applications can be
> > > built on top of Apache Spark that require fine-grained interaction with many
> > > Spark contexts.
> > >
> > > = Proposal =
> > >
> > > Livy is an open-source REST service for Apache Spark. Livy enables
> > > applications to submit Spark applications and retrieve results without
> a
> > > co-location requirement on the Spark cluster.
> > >
> > > We propose to contribute the Livy codebase and associated artifacts (e.g.
> > > documentation, website content, etc.) to the Apache Software Foundation.
> > >
> > > = Background =
> > >
> > > Apache Spark is a fast and general purpose distributed compute engine,
> > with
> > > a versatile API. It enables processing of large quantities of static
> data
> > > distributed over a cluster of machines, as well as processing of
> > continuous
> > > streams of data. It is the preferred distributed data processing engine
> > for
> > > data engineering, stream processing and data science workloads. Each
> > Spark
> > > application uses a construct called the SparkContext, which is the
> > > application’s connection or entry point to the Spark engine. Each Spark
> > > application will have its own SparkContext.
> > >
> > > Livy enables clients to interact with one or more Spark sessions
> through
> > the
> > > Livy Server, which acts as a proxy layer. Livy Clients have fine
> grained
> > > control over the lifecycle of the Spark sessions, as well as the
> ability
> > to
> > > submit jobs and retrieve results, all over HTTP. Clients have two modes of
> > > interaction:
> > >
> > >  * RPC Client API, available in Java and Python, which allows results to be
> > >    retrieved as Java or Python objects. The serialization and
> > >    deserialization of the results is handled by the Livy framework.
> > >  * HTTP-based API that allows submission of code snippets, and retrieval of
> > >    the results in different formats.
> > >
> > > Multi-tenant resource allocation and security: Livy enables multiple
> > > independent Spark sessions to be managed simultaneously. Multiple
> clients
> > > can also interact simultaneously with the same Spark session and share
> > the
> > > resources of that Spark session. Livy can also enforce secure,
> > authenticated
> > > communication between the clients and their respective Spark sessions.
> > >
> > > More information on Livy can be found at the existing open source
> > website:
> > > http://livy.io/
> > >
> > > = Rationale =
> > >
> > > Users want to use Spark’s powerful processing engine and API as the
> data
> > > processing backend for interactive applications. However, the job
> > submission
> > > and application interaction mechanisms built into Apache Spark are
> > > insufficient and cumbersome for multi-user interactive applications.
> > >
> > > The primary mechanism for applications to submit Spark jobs is via
> > > spark-submit
> > > (http://spark.apache.org/docs/latest/submitting-applications.html),
> > which is
> > > available as a command line tool as well as a programmatic API.
> > > However, spark-submit has the following limitations that make it difficult
> > > to build interactive applications:
> > >
> > >  * It is slow: each invocation of spark-submit involves a setup phase where
> > >    cluster resources are acquired, new processes are forked, etc. This setup
> > >    phase runs for many seconds, or even minutes, and hence is too slow for
> > >    interactive applications.
> > >  * It is cumbersome and lacks flexibility: application code and dependencies
> > >    have to be pre-compiled and submitted as jars, and cannot be submitted
> > >    interactively.
> > >
> > > Apache Spark comes with an ODBC/JDBC server, which can be used to
> submit
> > SQL
> > > queries to Spark. However, this solution is limited to SQL and does not
> > > allow the client to leverage the rest of the Spark API, such as RDDs,
> > MLlib
> > > and Streaming.
> > >
> > > A third way of using Spark is via its command-line shell, which allows
> > the
> > > interactive submission of snippets of Spark code. However, the shell
> > entails
> > > running Spark code on the client machine and hence is not a viable
> > mechanism
> > > for remote clients to submit Spark jobs.
> > >
> > > Livy solves the limitations of the above three mechanisms, and provides
> > the
> > > full Spark API as a multi-tenant service to remote clients.
> > >
> > > Since the open source release of Livy in late 2015, we have seen
> > tremendous
> > > interest among a diverse set of application developers and ISVs that
> > want to
> > > build applications with Apache Spark. To make Livy a robust and
> flexible
> > > solution that will enable a broad and growing set of applications, it
> is
> > > important to grow a large and varied community of contributors.
> > >
> > > = Initial Goals =
> > >
> > >  * Move existing codebase, website, documentation and mailing lists to
> > >    Apache-hosted infrastructure
> > >  * Work with the infrastructure team to implement and approve our code
> > >    review, build, and testing workflows in the context of the ASF
> > >  * Incremental development and releases per Apache guidelines
> > >
> > > = Current Status =
> > >
> > > The Livy project began at Cloudera, as a part of the Hue project.
> > Cloudera
> > > soon realized the broad applicability of Livy, and separated it out
> into
> > an
> > > independent project in Nov 2015.
> > >
> > > == Releases ==
> > >
> > > Livy has undergone two public releases, tagged here:
> > >
> > > * https://github.com/cloudera/livy/releases/tag/v0.2.0
> > > * https://github.com/cloudera/livy/releases/tag/v0.3.0
> > >
> > > Tarballs and zip files were created for each release and hosted on GitHub.
> > > Upon joining the incubator, we will adopt a more typical ASF release
> > > process.
> > >
> > > == Source ==
> > >
> > > Livy’s source is currently hosted on GitHub at:
> > > https://github.com/cloudera/livy
> > >
> > > This repository will be transitioned to Apache’s git hosting during
> > > incubation.
> > >
> > > == Code review ==
> > >
> > > Livy’s code reviews are currently public and hosted on GitHub as pull
> > > request reviews at: https://github.com/cloudera/livy/pulls
> > > The Livy developer community has so far been happy with GitHub pull request
> > > reviews and hopes to continue this after being admitted to the ASF.
> > >
> > > == Issue Tracking ==
> > >
> > > Livy’s bug and feature tracking is hosted on JIRA at:
> > > https://issues.cloudera.org/projects/LIVY/summary
> > > This JIRA instance contains bugs and development discussion dating back one
> > > year and will provide an initial seed for the ASF JIRA.
> > >
> > > == Community Discussion ==
> > >
> > > Livy has several public discussion forums:
> > >
> > > * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-dev
> > > * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-user
> > >
> > > == Development Practices ==
> > >
> > > The Livy project follows a review before commit philosophy. Every
> commit
> > > automatically runs through the unit tests and generates coverage
> reports
> > > presented as a pull request comment. Our experience with this process
> > leads
> > > us to believe that it helps ease new contributors into the project.
> They
> > get
> > > feedback quickly on common mistakes, lowering the burden on reviewers.
> > Those
> > > same reviewers get to lead by example, showing the new contributors
> that
> > we
> > > value feedback within our community even when changes are done by more
> > > experienced folks.
> > >
> > > == Meritocracy ==
> > >
> > > We believe strongly in meritocracy when electing committers and PMC
> > members.
> > > In the past few months, the project has added two new committers from
> two
> > > different organisations, in recognition of their significant
> > contributions
> > > to the project. We will encourage contributions and participation of
> all
> > > types, and ensure that contributors are appropriately recognized.
> > >
> > > == Community ==
> > >
> > > Though Livy is relatively new as a standalone open source project, it
> has
> > > already seen promising growth in its community across several organizations:
> > >
> > >  * Cloudera is the original development sponsor for Livy.
> > >  * Microsoft pushed the development of the interpreter, fixing high
> > >    availability issues and adding additional features.
> > >  * Hortonworks has contributed the security features to Livy, allowing
> > >    Kerberos and impersonation to work with Spark.
> > >  * IBM is starting to make contributions to the Livy project.
> > >  * A number of other patches have been contributed by community members.
> > >
> > > Livy currently relies on Google Groups for mailing lists. These lists
> > have
> > > been active since the end of 2015/start of 2016. Currently, Livy’s user
> > > mailing list has 173 subscribers and has hosted a total of 227 topic
> > > threads. Livy’s developer list has 49 subscribers and has hosted 79
> topic
> > > threads.
> > >
> > > == Core Developers ==
> > >
> > > The early contributions to Livy were made by Cloudera engineers. In
> 2016,
> > > engineers from Microsoft and Hortonworks joined the core developer
> > > community.
> > >
> > > == Alignment ==
> > >
> > > Livy is built upon Apache Spark, and other Apache projects like Apache
> > > Hadoop YARN. It’s used as a building block by Apache Zeppelin. These
> > > community connections combined with our focus on development practices
> > that
> > > emphasize community engagement with a path to meritocratic recognition
> > > naturally align us with the ASF.
> > >
> > > = Known Risks =
> > >
> > > == Orphaned Products ==
> > >
> > > The risk of Livy being abandoned is low because it is supported by
> three
> > > major big-data software vendors. Moreover, Livy is already used to
> power
> > > multiple releases of services and products used in production.
> > >
> > > == Inexperience with Open Source ==
> > >
> > > Several of the initial committers are experienced open source
> developers,
> > > several being committers and/or PMC members on other ASF projects
> (Spark,
> > > YARN).
> > >
> > > == Homogeneous Developers ==
> > >
> > > The project already has a diverse developer base. It has contributions
> > from
> > > 3 major organisations (Cloudera, Microsoft and Hortonworks), and is
> used
> > in
> > > diverse applications, in diverse settings (On-Prem and Cloud).
> > >
> > > == Reliance on Salaried Developers ==
> > >
> > > The contributions to the Livy project to date have been made by
> salaried
> > > engineers from Cloudera, Microsoft and Hortonworks. One of the
> > individuals
> > > on the initial committer list has since left Microsoft and is currently
> > > unaffiliated. The remaining contributors are from Cloudera and
> > Hortonworks.
> > > Since there are at least two major organizations involved, the risk of
> > > reliance on a single group of salaried developers is mitigated. The
> Livy
> > > user base is diverse, with users from across the globe, including users
> > from
> > > academic settings. We aim to further diversify the Livy user and
> > contributor
> > > base.
> > >
> > > == Relationships with other Apache projects ==
> > >
> > > Livy is closely tied to the Apache Spark project and currently
> addresses
> > the
> > > scenarios for a REST-based batch and interactive gateway for Spark jobs
> > on
> > > YARN. Given the growing number of integrations with Livy, keeping it
> > outside
> > > of Apache Spark aligns with the desire of the Apache Spark community to
> > > reduce the number of external dependencies in the Spark project.
> > > Specifically, the Apache Spark community has previously expressed a desire
> > > to keep job servers independent from the project (see, for example, the
> > > discussion of the Ooyala Spark Job Server in SPARK-818).
> > > Furthermore, while Livy’s common usage is closely tied to Spark
> deployments
> > > right now, its core building blocks can be reused elsewhere.  Livy’s
> > Remote
> > > REPL could be used as a library for interactive scenarios in non-Spark
> > > projects. In the future, integrations with cluster managers like Apache
> > > Mesos and others could also be added.
> > >
> > > The features provided by Livy have already been integrated with
> existing
> > > projects like Jupyter and Apache Zeppelin for their interactive Spark
> use
> > > cases. This validates the need for a project like Livy and provides an
> > > active downstream user base that the Livy community can interact with
> to
> > > seed future interest in the project.
> > >
> > > Livy serves a similar purpose to Apache Toree (incubating) but differs
> in
> > > making session management, security and impersonation a focal design
> > point.
> > >
> > > == An Excessive Fascination with the Apache Brand ==
> > >
> > > The primary motivation for submitting Livy to the ASF is to grow a
> > diverse
> > > and strong community. We wish to encourage diverse organisations,
> > including
> > > ISVs, to adopt Livy and contribute to Livy without any concerns about
> > > ownership or licensing.
> > >
> > > = Documentation =
> > >
> > > Documentation can be found on the Livy website http://livy.io/
> > >
> > > The Livy web site is version controlled on the ‘gh-pages’ branch of the
> > > above repository.
> > > Additional documentation is provided on the GitHub wiki:
> > > https://github.com/cloudera/livy/wiki
> > > APIs are documented within the source code as Javadoc-style documentation
> > > comments.
> > >
> > > = Initial Source =
> > >
> > > The initial source code for Livy is hosted at
> > > https://github.com/cloudera/livy
> > >
> > > = Source and Intellectual Property submission plan =
> > >
> > > The Livy codebase and web site are currently hosted on GitHub and will
> be
> > > transitioned to the ASF repositories during incubation. Livy is already
> > > licensed under the Apache 2.0 license. Cloudera has collected ICLAs and
> > > CCLAs from all committers. There are, however, some contributions
> > recently
> > > from authors that have not signed the CCLA and ICLA. If necessary for a
> > > successful SGA, we’ll seek the necessary documentation or replace the
> > > contributions.
> > >
> > > The “Livy” name is not a registered trademark. We will need to do a
> > > trademark search and make sure it is available for the Apache
> Foundation
> > > prior to graduation.
> > >
> > > Cloudera currently owns the domain name: http://livy.io/. Once all the
> > > documentation has moved over to ASF infrastructure, the main landing
> page
> > > will become livy.incubator.apache.org and the old domain will just act
> > as a
> > > redirect.
> > >
> > > = External Dependencies =
> > >
> > > The list below covers the non-Apache dependencies of the project and
> > their
> > > licenses.
> > >
> > > * Jetty: Apache 2.0
> > > * Dropwizard Metrics: Apache 2.0
> > > * FasterXML Jackson: Apache 2.0
> > > * Netty: Apache 2.0
> > > * Scala: BSD
> > > * Py4J: BSD
> > > * Scalatra: BSD
> > >
> > > Build/test-only dependencies:
> > >
> > > * Mockito: MIT
> > > * JUnit: Eclipse
> > >
> > > = Required Resources =
> > >
> > > == Mailing Lists ==
> > >
> > > * private@livy.incubator.apache.org (PPMC)
> > > * dev@livy.incubator.apache.org (dev mailing list)
> > > * user@livy.incubator.apache.org (User questions)
> > > * commits@livy.incubator.apache.org (subscribers shouldn’t be able to
> > post)
> > > * issues@livy.incubator.apache.org (subscribers shouldn’t be able to
> > post)
> > >
> > > == Git Repository ==
> > >
> > > git://git.apache.org/incubator-livy
> > >
> > > == Issue Tracking ==
> > >
> > > We would like to import our current JIRA project into the ASF JIRA,
> such
> > > that our historical commit messages and code comments continue to
> > reference
> > > the appropriate bug numbers.
> > >
> > > = Initial Committers =
> > >
> > > * Marcelo Vanzin (vanzin@cloudera.com)
> > > * Alex Man (alex@alexman.space)
> > > * Jeff Zhang (zjffdu@gmail.com)
> > > * Saisai Shao (sshao@hortonworks.com)
> > > * Kostas Sakellis (kostas@cloudera.com)
> > >
> > > = Affiliations =
> > >
> > > The initial set of committers includes people employed by Cloudera and
> > > Hortonworks as well as one currently independent contributor.
> > >
> > > = Additional Interested Contributors =
> > >
> > > Those interested in getting involved with the project as we enter
> > incubation
> > > are encouraged to list themselves here.
> > >
> > >  * Ismaël Mejía (iemejia@apache.org)
> > >
> > > = Sponsors =
> > >
> > > == Champion ==
> > >
> > > Sean Busbey (busbey@apache.org)
> > >
> > > == Nominated Mentors ==
> > >
> > > * Bikas Saha (bikas@apache.org)
> > > * Brock Noland (brock@phdata.io)
> > > * Luciano Resende (lresende@apache.org)
> > >
> > > == Sponsoring Entity ==
> > >
> > > We ask that the Incubator PMC sponsor this proposal.
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > > For additional commands, e-mail: general-help@incubator.apache.org
> > >
> >
> >
> >
>
