incubator-general mailing list archives

From Brock Noland <br...@apache.org>
Subject Re: [PROPOSAL] Livy Proposal for Apache Incubator
Date Sun, 21 May 2017 14:22:56 GMT
Great to see!

+1

On Fri, May 19, 2017 at 7:24 PM, William GUO <guoiii@outlook.com> wrote:

> +1
>
> Griffin needs Livy to access Spark context.
>
>
> Thanks,
> William
>
> On 5/20/17, 7:45 AM, "Sean Busbey" <busbey@apache.org> wrote:
>
>     Dear Apache Incubator Community,
>
>     I'm excited to present for discussion a proposal to move Livy into
>     incubation. Livy is a web service that exposes a REST interface for
>     managing long-running Apache Spark contexts in your cluster. With Livy,
>     new applications can be built on top of Apache Spark that require
>     fine-grained interaction with many Spark contexts.
>
>     The proposal is on the wiki at the following page as well as copied in
> the
>     email below:
>
>     https://wiki.apache.org/incubator/LivyProposal
>
>     In addition to welcoming feedback on the proposal, we are actively
> seeking
>     one or more additional mentors. We also have included a section for
>     interested folks to ensure they get added to the mailing lists,
> presuming
>     Livy gets accepted for incubation.
>
>     ---- LivyProposal
>
>     = Abstract =
>
>     Livy is a web service that exposes a REST interface for managing
>     long-running Apache Spark contexts in your cluster. With Livy, new
>     applications can be built on top of Apache Spark that require
>     fine-grained interaction with many Spark contexts.
>
>     = Proposal =
>
>     Livy is an open-source REST service for Apache Spark. Livy
>     enables applications to submit Spark applications and retrieve results
>     without a co-location requirement on the Spark cluster.
>
>     We propose to contribute the Livy codebase and associated artifacts
>     (e.g. documentation, website content, etc.) to the Apache Software
>     Foundation.
>
>     = Background =
>
>     Apache Spark is a fast and general purpose distributed
>     compute engine, with a versatile API. It enables processing of large
>     quantities of static data distributed over a cluster of machines, as
> well as
>     processing of continuous streams of data. It is the preferred
> distributed
>     data processing engine for data engineering, stream processing and data
>     science workloads. Each Spark application uses a construct called the
>     SparkContext, which is the application’s connection or entry point
> to the
>     Spark engine. Each Spark application will have its own SparkContext.
>
>     Livy enables clients to interact with one or more Spark sessions through
>     the Livy Server, which acts as a proxy layer. Livy clients have
>     fine-grained control over the lifecycle of the Spark sessions, as well as
>     the ability to submit jobs and retrieve results, all over HTTP. Clients
>     have two modes of interaction:
>
>      * RPC Client API, available in Java and Python, which allows results to
>        be retrieved as Java or Python objects. The serialization and
>        deserialization of the results is handled by the Livy framework.
>      * HTTP-based API that allows submission of code snippets, and retrieval
>        of the results in different formats.
>
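A minimal sketch of the HTTP-based mode of interaction described above, assuming Livy's REST endpoints for interactive sessions (`/sessions` and `/sessions/{id}/statements`) and a server at `localhost:8998`. The helper names (`build_request`, `create_session`, `submit_statement`, `get_statement`, `send`) are hypothetical and not part of Livy. The sketch only builds the requests; sending them needs a running Livy server.

```python
import json
import urllib.request

# Assumption: a Livy server is reachable at this address.
LIVY_URL = "http://localhost:8998"


def build_request(method, path, payload=None, base_url=LIVY_URL):
    """Build (but do not send) a JSON HTTP request for the Livy server."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def create_session(kind="spark"):
    """Request a new long-running interactive session."""
    return build_request("POST", "/sessions", {"kind": kind})


def submit_statement(session_id, code):
    """Request execution of a code snippet inside an existing session."""
    return build_request(
        "POST", f"/sessions/{session_id}/statements", {"code": code}
    )


def get_statement(session_id, statement_id):
    """Request the state and output of a previously submitted snippet."""
    return build_request(
        "GET", f"/sessions/{session_id}/statements/{statement_id}"
    )


def send(request):
    """Send a request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a live server, a client would typically call `send(create_session())`, wait for the session to become ready, then pass the returned session `id` to `submit_statement` and poll `get_statement` for the result. Because the session stays alive between snippets, the client does not pay a fresh startup cost for each one.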
>     Multi-tenant resource allocation and security: Livy enables multiple
>     independent Spark sessions to be managed simultaneously. Multiple
> clients
>     can also interact simultaneously with the same Spark session and share
> the
>     resources of that Spark session. Livy can also enforce secure,
> authenticated
>     communication between the clients and their respective Spark sessions.
>
>     More information on Livy can be found at the existing open source
> website:
>     http://livy.io/
>
>     = Rationale =
>
>     Users want to use Spark’s powerful processing engine and API
>     as the data processing backend for interactive applications. However,
> the
>     job submission and application interaction mechanisms built into Apache
>     Spark are insufficient and cumbersome for multi-user interactive
>     applications.
>
>     The primary mechanism for applications to submit Spark jobs is via
>     spark-submit
>     (http://spark.apache.org/docs/latest/submitting-applications.html),
>     which is available as a command line tool as well as a programmatic API.
>     However, spark-submit has the following limitations that make it
>     difficult to build interactive applications:
>
>      * It is slow: each invocation of spark-submit involves a setup phase
>        where cluster resources are acquired, new processes are forked, etc.
>        This setup phase runs for many seconds, or even minutes, and hence is
>        too slow for interactive applications.
>      * It is cumbersome and lacks flexibility: application code and
>        dependencies have to be pre-compiled and submitted as jars, and
>        cannot be submitted interactively.
>
>     Apache Spark comes with an ODBC/JDBC server, which can be used to
> submit SQL
>     queries to Spark. However, this solution is limited to SQL and does not
>     allow the client to leverage the rest of the Spark API, such as RDDs,
> MLlib
>     and Streaming.
>
>     A third way of using Spark is via its command-line shell, which allows
> the
>     interactive submission of snippets of Spark code. However, the shell
> entails
>     running Spark code on the client machine and hence is not a viable
> mechanism
>     for remote clients to submit Spark jobs.
>
>     Livy solves the limitations of the above three mechanisms, and
> provides the
>     full Spark API as a multi-tenant service to remote clients.
>
>     Since the open source release of Livy in late 2015, we have seen
> tremendous
>     interest among a diverse set of application developers and ISVs that
> want to
>     build applications with Apache Spark. To make Livy a robust and
> flexible
>     solution that will enable a broad and growing set of applications, it
> is
>     important to grow a large and varied community of contributors.
>
>     = Initial Goals =
>
>      * Move existing codebase, website, documentation and mailing lists to
>        Apache-hosted infrastructure
>      * Work with the infrastructure team to implement and approve our code
>        review, build, and testing workflows in the context of the ASF
>      * Incremental development and releases per Apache guidelines
>
>     = Current Status =
>
>     The Livy project began at Cloudera, as a part of the Hue
>     project. Cloudera soon realized the broad applicability of Livy, and
>     separated it out into an independent project in Nov 2015.
>
>     == Releases ==
>
>     Livy has undergone two public releases, tagged here:
>
>     * https://github.com/cloudera/livy/releases/tag/v0.2.0
>     * https://github.com/cloudera/livy/releases/tag/v0.3.0
>
>     Tarballs and zip files were created for each release and hosted on
>     GitHub. Upon joining the incubator, we will adopt a more typical ASF
>     release process.
>
>     == Source ==
>
>     Livy’s source is currently hosted on GitHub at:
>
>     https://github.com/cloudera/livy
>
>     This repository will be transitioned to Apache’s git hosting during
>     incubation.
>
>     == Code review ==
>
>     Livy’s code reviews are currently public and hosted on GitHub as pull
>     request reviews at: https://github.com/cloudera/livy/pulls
>     The Livy developer community so far is happy with GitHub pull request
>     reviews and hopes to continue this after being admitted to the ASF.
>
>     == Issue Tracking ==
>
>     Livy’s bug and feature tracking is hosted on JIRA at:
>     https://issues.cloudera.org/projects/LIVY/summary
>     This JIRA instance contains bugs and development discussion dating back
>     one year and will provide an initial seed for the ASF JIRA.
>
>     == Community Discussion ==
>
>     Livy has several public discussion forums:
>
>     * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-dev
>     * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-user
>
>     == Development Practices ==
>
>     The Livy project follows a review-before-commit philosophy. Every commit
>     automatically runs through the unit tests and generates coverage reports
>     presented as a pull request comment. Our experience with this process
>     leads us to believe that it helps ease new contributors into the project.
>     They get feedback quickly on common mistakes, lowering the burden on
>     reviewers. Those same reviewers get to lead by example, showing the new
>     contributors that we value feedback within our community even when
>     changes are done by more experienced folks.
>
>     == Meritocracy ==
>
>     We believe strongly in meritocracy when electing committers and PMC
> members.
>     In the past few months, the project has added two new committers from
> two
>     different organisations, in recognition of their significant
> contributions
>     to the project. We will encourage contributions and participation of
> all
>     types, and ensure that contributors are appropriately recognized.
>
>     == Community ==
>
>     Though Livy is relatively new as a standalone open source project, it
>     has already seen promising growth in its community across several
>     organizations:
>
>      * Cloudera is the original development sponsor for Livy.
>      * Microsoft pushed the development of the interpreter, fixing high
>        availability issues and adding additional features.
>      * Hortonworks has contributed the security features to Livy, allowing
>        Kerberos and impersonation to work with Spark.
>      * IBM is starting to make contributions to the Livy project.
>      * A number of other patches have been contributed by community members.
>
>     Livy currently relies on Google Groups for mailing lists. These lists
> have
>     been active since the end of 2015/start of 2016. Currently, Livy’s
> user
>     mailing list has 173 subscribers and has hosted a total of 227 topic
>     threads. Livy’s developer list has 49 subscribers and has hosted 79
> topic
>     threads.
>
>     == Core Developers ==
>
>     The early contributions to Livy were made by Cloudera engineers. In
> 2016,
>     engineers from Microsoft and Hortonworks joined the core developer
>     community.
>
>     == Alignment ==
>
>     Livy is built upon Apache Spark, and other Apache projects like Apache
>     Hadoop YARN. It’s used as a building block by Apache Zeppelin.  These
>     community connections combined with our focus on development practices
> that
>     emphasize community engagement with a path to meritocratic recognition
>     naturally align us with the ASF.
>
>     = Known Risks =
>     == Orphaned Products ==
>
>     The risk of Livy being abandoned is low because it is supported by
> three
>     major big-data software vendors.  Moreover, Livy is already used to
> power
>     multiple releases of services and products used in production.
>
>     == Inexperience with Open Source ==
>
>     Several of the initial committers are experienced open source
> developers,
>     several being committers and/or PMC members on other ASF projects
> (Spark,
>     YARN).
>
>     == Homogenous Developers ==
>
>     The project already has a diverse developer base. It has contributions
> from
>     3 major organisations (Cloudera, Microsoft and Hortonworks), and is
> used in
>     diverse applications, in diverse settings (On-Prem and Cloud).
>
>     == Reliance on salaried Developers ==
>
>     The existing contributions to the Livy project have been made by
>     salaried engineers from Cloudera, Microsoft and Hortonworks. Since there
>     are three major organisations involved, the risk of reliance on a single
>     group of salaried developers is mitigated. The Livy user base is diverse,
>     with users from across the globe, including users from academic settings.
>     We aim to further diversify the Livy user and contributor base.
>
>     == Relationships with other Apache projects ==
>
>     Livy is closely tied to the Apache Spark project and currently addresses
>     the scenarios for a REST-based batch and interactive gateway for Spark
>     jobs on YARN. Given the growing number of integrations with Livy, keeping
>     it outside of Apache Spark aligns with the desire of the Apache Spark
>     community to reduce the number of external dependencies in the Spark
>     project. Specifically, the Apache Spark community has previously
>     expressed a desire to keep job servers independent from the project (see,
>     for example, the discussion of the Ooyala Spark Job Server in SPARK-818).
>     Furthermore, while Livy’s common usage is closely tied to Spark
>     deployments right now, its core building blocks can be reused elsewhere.
>     Livy’s Remote REPL could be used as a library for interactive scenarios
>     in non-Spark projects. In the future, integrations with cluster managers
>     like Apache Mesos and others could also be added.
>
>     The features provided by Livy have already been integrated with
> existing
>     projects like Jupyter and Apache Zeppelin for their interactive Spark
> use
>     cases. This validates the need for a project like Livy and provides an
>     active downstream user base that the Livy community can interact with
> to
>     seed future interest in the project.
>
>     Livy serves a similar purpose to Apache Toree (incubating) but differs
> in
>     making session management, security and impersonation a focal design
> point.
>
>     == An Excessive Fascination with the Apache Brand ==
>
>     The primary motivation for submitting Livy to the ASF is to grow a
> diverse
>     and strong community. We wish to encourage diverse organisations,
> including
>     ISVs, to adopt Livy and contribute to Livy without any concerns about
>     ownership or licensing.
>
>     = Documentation =
>
>      * Documentation can be found on the Livy website: http://livy.io/
>      * The Livy website is version controlled on the ‘gh-pages’ branch of the
>        above repository.
>      * Additional documentation is provided on the GitHub wiki:
>        https://github.com/cloudera/livy/wiki
>      * APIs are documented within the source code as JavaDoc-style
>        documentation comments.
>
>     = Initial Source =
>
>     The initial source code for Livy is hosted at
>
>     https://github.com/cloudera/livy
>
>     = Source and Intellectual Property submission plan =
>
>     The Livy codebase and website are currently hosted on GitHub and will be
>     transitioned to the ASF repositories during incubation. Livy is already
>     licensed under the Apache 2.0 license. Cloudera has collected ICLAs and
>     CCLAs from all committers. There are, however, some recent contributions
>     from authors who have not signed the CCLA and ICLA. If necessary for a
>     successful SGA, we’ll seek the necessary documentation or replace the
>     contributions.
>
>     The “Livy” name is not a registered trademark. We will need to do a
>     trademark search and make sure it is available for the Apache Foundation
>     prior to graduation.
>
>     Cloudera currently owns the domain name: http://livy.io/ which will be
>     transferred to the ASF and redirected to the official page during
>     incubation.
>
>     = External Dependencies =
>
>     The list below covers the non-Apache dependencies of the project and
> their
>     licenses.
>
>      * Jetty: Apache 2.0
>      * Dropwizard Metrics: Apache 2.0
>      * FasterXML Jackson: Apache 2.0
>      * Netty: Apache 2.0
>      * Scala: BSD
>      * Py4J: BSD
>      * Scalatra: BSD
>
>     Build/test-only dependencies:
>
>      * Mockito: MIT
>      * JUnit: Eclipse
>
>     = Required Resources =
>     == Mailing Lists ==
>
>      * private@livy.incubator.apache.org (PPMC)
>      * dev@livy.incubator.apache.org (dev mailing list)
>      * user@livy.incubator.apache.org (User questions)
>      * commits@livy.incubator.apache.org (subscribers shouldn’t be able to
>        post)
>      * issues@livy.incubator.apache.org (subscribers shouldn’t be able to
>        post)
>
>     == Git Repository ==
>
>     git://git.apache.org/livy
>
>     == Issue Tracking ==
>
>     We would like to import our current JIRA project into the ASF JIRA, such
>     that our historical commit messages and code comments continue to
>     reference the appropriate bug numbers.
>
>     = Initial Committers =
>
>      * Marcelo Vanzin (vanzin@cloudera.com)
>      * Alex Man (alex@alexman.space)
>      * Jeff Zhang (zjffdu@gmail.com)
>      * Saisai Shao (sshao@hortonworks.com)
>      * Kostas Sakellis (kostas@cloudera.com)
>
>     = Affiliations =
>
>     The initial set of committers includes people employed by Cloudera and
>     Hortonworks as well as one person currently unaffiliated with an
> employer.
>
>     = Additional Interested Contributors =
>
>     Those interested in getting involved with the project as we enter
>     incubation are encouraged to list themselves here.
>
>      * < add here >
>
>     = Sponsors =
>     == Champion ==
>
>      * Sean Busbey (busbey@apache.org)
>
>     == Nominated Mentors ==
>
>      * Bikas Saha (bikas@apache.org)
>      * Brock Noland (brock@phdata.io)
>
>     == Sponsoring Entity ==
>
>     We ask that the Incubator PMC sponsor this proposal.
>
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>     For additional commands, e-mail: general-help@incubator.apache.org
