incubator-general mailing list archives

From Jean-Baptiste Onofré ...@nanthrax.net>
Subject Re: [VOTE] Livy to enter Apache Incubator
Date Wed, 31 May 2017 15:54:44 GMT
+1 (binding)

If you need an additional mentor, please let me know; I'm interested in the
project!

Regards
JB

On 05/31/2017 03:03 PM, Sean Busbey wrote:
> Hi folks!
> 
> I'm calling a vote to accept "Livy" into the Apache Incubator.
> 
> The full proposal is available below, and is also available in the wiki:
> 
> https://wiki.apache.org/incubator/LivyProposal
> 
> For additional context, please see the discussion thread:
> 
> https://s.apache.org/incubator-livy-proposal-thread
> 
> Please cast your vote:
> 
> [ ] +1, bring Livy into Incubator
> [ ] -1, do not bring Livy into Incubator, because...
> 
> The vote will be open for at least 72 hours, and only votes from the
> Incubator PMC are binding.
> 
> I start with my vote:
> +1
> 
> ----
> 
> = Abstract =
> 
> Livy is a web service that exposes a REST interface for managing
> long-running Apache Spark contexts in your cluster. With Livy, new
> applications that require fine-grained interaction with many Spark contexts
> can be built on top of Apache Spark.
> 
> = Proposal =
> 
> Livy is an open-source REST service for Apache Spark. Livy enables
> applications to submit Spark applications and retrieve results without a
> co-location requirement on the Spark cluster.
> 
> We propose to contribute the Livy codebase and associated artifacts (e.g.
> documentation, website content, etc.) to the Apache Software Foundation.
> 
> = Background =
> 
> Apache Spark is a fast and general purpose distributed compute engine, with
> a versatile API. It enables processing of large quantities of static data
> distributed over a cluster of machines, as well as processing of continuous
> streams of data. It is the preferred distributed data processing engine for
> data engineering, stream processing and data science workloads. Each Spark
> application uses a construct called the SparkContext, which is the
> application’s connection or entry point to the Spark engine. Each Spark
> application will have its own SparkContext.
> 
> Livy enables clients to interact with one or more Spark sessions through the
> Livy Server, which acts as a proxy layer. Livy clients have fine-grained
> control over the lifecycle of the Spark sessions, as well as the ability to
> submit jobs and retrieve results, all over HTTP. Clients have two modes of
> interaction:
> 
>   * An RPC client API, available in Java and Python, which allows results to
>     be retrieved as Java or Python objects. The serialization and
>     deserialization of the results are handled by the Livy framework.
>   * An HTTP-based API that allows submission of code snippets and retrieval
>     of the results in different formats.
> 
> Multi-tenant resource allocation and security: Livy enables multiple
> independent Spark sessions to be managed simultaneously. Multiple clients
> can also interact simultaneously with the same Spark session and share the
> resources of that Spark session. Livy can also enforce secure, authenticated
> communication between the clients and their respective Spark sessions.
> 
> More information on Livy can be found at the existing open source website:
> http://livy.io/
> 
> = Rationale =
> 
> Users want to use Spark’s powerful processing engine and API as the data
> processing backend for interactive applications. However, the job submission
> and application interaction mechanisms built into Apache Spark are
> insufficient and cumbersome for multi-user interactive applications.
> 
> The primary mechanism for applications to submit Spark jobs is via
> spark-submit
> (http://spark.apache.org/docs/latest/submitting-applications.html), which is
> available as a command line tool as well as a programmatic API. However,
> spark-submit has the following limitations that make it difficult to build
> interactive applications:
> 
>   * It is slow: each invocation of spark-submit involves a setup phase where
>     cluster resources are acquired, new processes are forked, etc. This setup
>     phase runs for many seconds, or even minutes, and hence is too slow for
>     interactive applications.
>   * It is cumbersome and lacks flexibility: application code and dependencies
>     have to be pre-compiled and submitted as jars, and cannot be submitted
>     interactively.
> 
> Apache Spark comes with an ODBC/JDBC server, which can be used to submit SQL
> queries to Spark. However, this solution is limited to SQL and does not
> allow the client to leverage the rest of the Spark API, such as RDDs, MLlib
> and Streaming.
> 
> A third way of using Spark is via its command-line shell, which allows the
> interactive submission of snippets of Spark code. However, the shell entails
> running Spark code on the client machine and hence is not a viable mechanism
> for remote clients to submit Spark jobs.
> 
> Livy solves the limitations of the above three mechanisms, and provides the
> full Spark API as a multi-tenant service to remote clients.
> 
> Since the open source release of Livy in late 2015, we have seen tremendous
> interest among a diverse set of application developers and ISVs that want to
> build applications with Apache Spark. To make Livy a robust and flexible
> solution that will enable a broad and growing set of applications, it is
> important to grow a large and varied community of contributors.
> 
> = Initial Goals =
> 
>    * Move existing codebase, website, documentation and mailing lists to
>      Apache-hosted infrastructure
>    * Work with the infrastructure team to implement and approve our code
>      review, build, and testing workflows in the context of the ASF
>    * Incremental development and releases per Apache guidelines
> 
> = Current Status =
> 
> The Livy project began at Cloudera, as a part of the Hue project. Cloudera
> soon realized the broad applicability of Livy, and separated it out into an
> independent project in Nov 2015.
> 
> == Releases ==
> 
> Livy has undergone two public releases, tagged here:
> 
>   * https://github.com/cloudera/livy/releases/tag/v0.2.0
>   * https://github.com/cloudera/livy/releases/tag/v0.3.0
> 
> Tarballs and zip files were created for each release and hosted on GitHub.
> Upon joining the incubator, we will adopt a more typical ASF release
> process.
> 
> == Source ==
> 
> Livy’s source is currently hosted on GitHub at:
> https://github.com/cloudera/livy
> 
> This repository will be transitioned to Apache’s git hosting during
> incubation.
> 
> == Code review ==
> 
> Livy’s code reviews are currently public and hosted on GitHub as pull
> request reviews at: https://github.com/cloudera/livy/pulls
> The Livy developer community so far is happy with GitHub pull request
> reviews and hopes to continue this after being admitted to the ASF.
> 
> == Issue Tracking ==
> 
> Livy’s bug and feature tracking is hosted on JIRA at:
> https://issues.cloudera.org/projects/LIVY/summary
> This JIRA instance contains bugs and development discussion dating back one
> year and will provide an initial seed for the ASF JIRA.
> 
> == Community Discussion ==
> 
> Livy has several public discussion forums:
> 
>   * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-dev
>   * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-user
> 
> == Development Practices ==
> 
> The Livy project follows a review before commit philosophy. Every commit
> automatically runs through the unit tests and generates coverage reports
> presented as a pull request comment. Our experience with this process leads
> us to believe that it helps ease new contributors into the project. They get
> feedback quickly on common mistakes, lowering the burden on reviewers. Those
> same reviewers get to lead by example, showing the new contributors that we
> value feedback within our community even when changes are done by more
> experienced folks.
> 
> == Meritocracy ==
> 
> We believe strongly in meritocracy when electing committers and PMC members.
> In the past few months, the project has added two new committers from two
> different organisations, in recognition of their significant contributions
> to the project. We will encourage contributions and participation of all
> types, and ensure that contributors are appropriately recognized.
> 
> == Community ==
> 
> Though Livy is relatively new as a standalone open source project, it has
> already seen promising growth in its community across several organizations:
> 
>   * Cloudera is the original development sponsor for Livy.
>   * Microsoft pushed the development of the interpreter, fixing high
>     availability issues and adding additional features.
>   * Hortonworks has contributed the security features to Livy, allowing
>     Kerberos and impersonation to work with Spark.
>   * IBM is starting to make contributions to the Livy project.
>   * A number of other patches have been contributed by community members.
> 
> Livy currently relies on Google Groups for mailing lists. These lists have
> been active since the end of 2015/start of 2016. Currently, Livy’s user
> mailing list has 173 subscribers and has hosted a total of 227 topic
> threads. Livy’s developer list has 49 subscribers and has hosted 79 topic
> threads.
> 
> == Core Developers ==
> 
> The early contributions to Livy were made by Cloudera engineers. In 2016,
> engineers from Microsoft and Hortonworks joined the core developer
> community.
> 
> == Alignment ==
> 
> Livy is built upon Apache Spark, and other Apache projects like Apache
> Hadoop YARN. It’s used as a building block by Apache Zeppelin. These
> community connections combined with our focus on development practices that
> emphasize community engagement with a path to meritocratic recognition
> naturally align us with the ASF.
> 
> = Known Risks =
> 
> == Orphaned Products ==
> 
> The risk of Livy being abandoned is low because it is supported by three
> major big-data software vendors. Moreover, Livy is already used to power
> multiple releases of services and products used in production.
> 
> == Inexperience with Open Source ==
> 
> Several of the initial committers are experienced open source developers,
> several being committers and/or PMC members on other ASF projects (Spark,
> YARN).
> 
> == Homogenous Developers ==
> 
> The project already has a diverse developer base. It has contributions from
> 3 major organisations (Cloudera, Microsoft and Hortonworks), and is used in
> diverse applications, in diverse settings (On-Prem and Cloud).
> 
> == Reliance on salaried Developers ==
> 
> The contributions to the Livy project to date have been made by salaried
> engineers from Cloudera, Microsoft and Hortonworks. One of the individuals
> on the initial committer list has since left Microsoft and is currently
> unaffiliated. The remaining contributors are from Cloudera and Hortonworks.
> Since there are at least two major organizations involved, the risk of
> reliance on a single group of salaried developers is mitigated. The Livy
> user base is diverse, with users from across the globe, including users from
> academic settings. We aim to further diversify the Livy user and contributor
> base.
> 
> == Relationships with other Apache projects ==
> 
> Livy is closely tied to the Apache Spark project and currently addresses the
> scenarios for a REST based batch and interactive gateway for Spark jobs on
> YARN. Given the growing number of integrations with Livy, keeping it outside
> of Apache Spark aligns with the desire of the Apache Spark community to
> reduce the number of external dependencies in the Spark project.
> Specifically, the Apache Spark community has previously expressed a desire
> to keep job servers independent from the project.<<FootNote(See, for
> example, discussion of the Ooyala Spark Job Server in SPARK-818)>>
> Furthermore, while Livy’s common usage is closely tied to Spark deployments
> right now, its core building blocks can be reused elsewhere. Livy’s Remote
> REPL could be used as a library for interactive scenarios in non-Spark
> projects. In the future, integrations with cluster managers like Apache
> Mesos and others could also be added.
> 
> The features provided by Livy have already been integrated with existing
> projects like Jupyter and Apache Zeppelin for their interactive Spark use
> cases. This validates the need for a project like Livy and provides an
> active downstream user base that the Livy community can interact with to
> seed future interest in the project.
> 
> Livy serves a similar purpose to Apache Toree (incubating) but differs in
> making session management, security and impersonation a focal design point.
> 
> == An Excessive Fascination with the Apache Brand ==
> 
> The primary motivation for submitting Livy to the ASF is to grow a diverse
> and strong community. We wish to encourage diverse organisations, including
> ISVs, to adopt Livy and contribute to Livy without any concerns about
> ownership or licensing.
> 
> = Documentation =
> 
> Documentation can be found on the Livy website http://livy.io/
> 
> The Livy web site is version controlled on the ‘gh-pages’ branch of the
> above repository.
> Additional documentation is provided on the GitHub wiki:
> https://github.com/cloudera/livy/wiki
> APIs are documented within the source code as Javadoc-style documentation
> comments.
> 
> = Initial Source =
> 
> The initial source code for Livy is hosted at
> https://github.com/cloudera/livy
> 
> = Source and Intellectual Property submission plan =
> 
> The Livy codebase and web site are currently hosted on GitHub and will be
> transitioned to the ASF repositories during incubation. Livy is already
> licensed under the Apache 2.0 license. Cloudera has collected ICLAs and
> CCLAs from all committers. There are, however, some contributions recently
> from authors that have not signed the CCLA and ICLA. If necessary for a
> successful SGA, we’ll seek the necessary documentation or replace the
> contributions.
> 
> The “Livy” name is not a registered trademark. We will need to do a
> trademark search and make sure it is available for the Apache Foundation
> prior to graduation.
> 
> Cloudera currently owns the domain name: http://livy.io/. Once all the
> documentation has moved over to ASF infrastructure, the main landing page
> will become livy.incubator.apache.org and the old domain will just act as a
> redirect.
> 
> = External Dependencies =
> 
> The list below covers the non-Apache dependencies of the project and their
> licenses.
> 
>   * Jetty: Apache 2.0
>   * Dropwizard Metrics: Apache 2.0
>   * FasterXML Jackson: Apache 2.0
>   * Netty: Apache 2.0
>   * Scala: BSD
>   * Py4J: BSD
>   * Scalatra: BSD
> 
> Build/test-only dependencies:
> 
>   * Mockito: MIT
>   * JUnit: Eclipse
> 
> = Required Resources =
> 
> == Mailing Lists ==
> 
>   * private@livy.incubator.apache.org (PPMC)
>   * dev@livy.incubator.apache.org (dev mailing list)
>   * user@livy.incubator.apache.org (User questions)
>   * commits@livy.incubator.apache.org (subscribers shouldn’t be able to post)
>   * issues@livy.incubator.apache.org (subscribers shouldn’t be able to post)
> 
> == Git Repository ==
> 
> git://git.apache.org/incubator-livy
> 
> == Issue Tracking ==
> 
> We would like to import our current JIRA project into the ASF JIRA, such
> that our historical commit messages and code comments continue to reference
> the appropriate bug numbers.
> 
> = Initial Committers =
> 
>   * Marcelo Vanzin (vanzin@cloudera.com)
>   * Alex Man (alex@alexman.space)
>   * Jeff Zhang (zjffdu@gmail.com)
>   * Saisai Shao (sshao@hortonworks.com)
>   * Kostas Sakellis (kostas@cloudera.com)
> 
> = Affiliations =
> 
> The initial set of committers includes people employed by Cloudera and
> Hortonworks as well as one currently independent contributor.
> 
> = Additional Interested Contributors =
> 
> Those interested in getting involved with the project as we enter incubation
> are encouraged to list themselves here.
> 
>    * Ismaël Mejía (iemejia@apache.org)
> 
> = Sponsors =
> 
> == Champion ==
> 
> Sean Busbey (busbey@apache.org)
> 
> == Nominated Mentors ==
> 
>   * Bikas Saha (bikas@apache.org)
>   * Brock Noland (brock@phdata.io)
>   * Luciano Resende (lresende@apache.org)
> 
> == Sponsoring Entity ==
> 
> We ask that the Incubator PMC sponsor this proposal.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
> 

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
