incubator-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeff Zhang <zjf...@gmail.com>
Subject Re: [VOTE] Livy to enter Apache Incubator
Date Wed, 31 May 2017 23:01:39 GMT
+1 (non-binding)


Brock Noland <brock@apache.org>于2017年6月1日周四 上午3:13写道:

> +1 (binding)
>
> On Wed, May 31, 2017 at 1:59 PM, Kostas Sakellis <kostas@cloudera.com>
> wrote:
>
> > +1 (non-binding)
> >
> > On Wed, May 31, 2017 at 11:46 AM, Andrew Purtell <
> andrew.purtell@gmail.com
> > >
> > wrote:
> >
> > > +1 (binding)
> > >
> > > > On May 31, 2017, at 6:03 AM, Sean Busbey <busbey@apache.org> wrote:
> > > >
> > > > Hi folks!
> > > >
> > > > I'm calling a vote to accept "Livy" into the Apache Incubator.
> > > >
> > > > The full proposal is available below, and is also available in the
> > wiki:
> > > >
> > > > https://wiki.apache.org/incubator/LivyProposal
> > > >
> > > > For additional context, please see the discussion thread:
> > > >
> > > > https://s.apache.org/incubator-livy-proposal-thread
> > > >
> > > > Please cast your vote:
> > > >
> > > > [ ] +1, bring Livy into Incubator
> > > > [ ] -1, do not bring Livy into Incubator, because...
> > > >
> > > > The vote will open at least for 72 hours and only votes from the
> > > Incubator
> > > > PMC are binding.
> > > >
> > > > I start with my vote:
> > > > +1
> > > >
> > > > ----
> > > >
> > > > = Abstract =
> > > >
> > > > Livy is web service that exposes a REST interface for managing long
> > > running
> > > > Apache Spark contexts in your cluster. With Livy, new applications
> can
> > be
> > > > built on top of Apache Spark that require fine grained interaction
> with
> > > many
> > > > Spark contexts.
> > > >
> > > > = Proposal =
> > > >
> > > > Livy is an open-source REST service for Apache Spark. Livy enables
> > > > applications to submit Spark applications and retrieve results
> without
> > a
> > > > co-location requirement on the Spark cluster.
> > > >
> > > > We propose to contribute the Livy codebase and associated artifacts
> > (e.g.
> > > > documentation, web-site context etc) to the Apache Software
> Foundation.
> > > >
> > > > = Background =
> > > >
> > > > Apache Spark is a fast and general purpose distributed compute
> engine,
> > > with
> > > > a versatile API. It enables processing of large quantities of static
> > data
> > > > distributed over a cluster of machines, as well as processing of
> > > continuous
> > > > streams of data. It is the preferred distributed data processing
> engine
> > > for
> > > > data engineering, stream processing and data science workloads. Each
> > > Spark
> > > > application uses a construct called the SparkContext, which is the
> > > > application’s connection or entry point to the Spark engine. Each
> Spark
> > > > application will have its own SparkContext.
> > > >
> > > > Livy enables clients to interact with one or more Spark sessions
> > through
> > > the
> > > > Livy Server, which acts as a proxy layer. Livy Clients have fine
> > grained
> > > > control over the lifecycle of the Spark sessions, as well as the
> > ability
> > > to
> > > > submit jobs and retrieve results, all over HTTP. Clients have two
> modes
> > > of
> > > > interaction: RPC Client API, available in Java and Python, which
> allows
> > > > results to be retrieved as Java or Python objects. The serialization
> > and
> > > > deserialization of the results is handled by the Livy framework. HTTP
> > > based
> > > > API that allows submission of code snippets, and retrieval of the
> > > results in
> > > > different formats.
> > > >
> > > > Multi-tenant resource allocation and security: Livy enables multiple
> > > > independent Spark sessions to be managed simultaneously. Multiple
> > clients
> > > > can also interact simultaneously with the same Spark session and
> share
> > > the
> > > > resources of that Spark session. Livy can also enforce secure,
> > > authenticated
> > > > communication between the clients and their respective Spark
> sessions.
> > > >
> > > > More information on Livy can be found at the existing open source
> > > website:
> > > > http://livy.io/
> > > >
> > > > = Rationale =
> > > >
> > > > Users want to use Spark’s powerful processing engine and API as the
> > data
> > > > processing backend for interactive applications. However, the job
> > > submission
> > > > and application interaction mechanisms built into Apache Spark are
> > > > insufficient and cumbersome for multi-user interactive applications.
> > > >
> > > > The primary mechanism for applications to submit Spark jobs is via
> > > > spark-submit
> > > > (http://spark.apache.org/docs/latest/submitting-applications.html),
> > > which is
> > > > available as a command line tool as well as a programmatic API.
> > However,
> > > > spark-submit has the following limitations that make it difficult to
> > > build
> > > > interactive applications: It is slow: each invocation of spark-submit
> > > > involves a setup phase where cluster resources are acquired, new
> > > processes
> > > > are forked, etc. This setup phase runs for many seconds, or even
> > minutes,
> > > > and hence is too slow for interactive applications. It is cumbersome
> > and
> > > > lacks flexibility: application code and dependencies have to be
> > > pre-compiled
> > > > and submitted as jars, and can not be submitted interactively.
> > > >
> > > > Apache Spark comes with an ODBC/JDBC server, which can be used to
> > submit
> > > SQL
> > > > queries to Spark. However, this solution is limited to SQL and does
> not
> > > > allow the client to leverage the rest of the Spark API, such as RDDs,
> > > MLlib
> > > > and Streaming.
> > > >
> > > > A third way of using Spark is via its command-line shell, which
> allows
> > > the
> > > > interactive submission of snippets of Spark code. However, the shell
> > > entails
> > > > running Spark code on the client machine and hence is not a viable
> > > mechanism
> > > > for remote clients to submit Spark jobs.
> > > >
> > > > Livy solves the limitations of the above three mechanisms, and
> provides
> > > the
> > > > full Spark API as a multi-tenant service to remote clients.
> > > >
> > > > Since the open source release of Livy in late 2015, we have seen
> > > tremendous
> > > > interest among a diverse set of application developers and ISVs that
> > > want to
> > > > build applications with Apache Spark. To make Livy a robust and
> > flexible
> > > > solution that will enable a broad and growing set of applications, it
> > is
> > > > important to grow a large and varied community of contributors.
> > > >
> > > > = Initial Goals =
> > > >
> > > >  * Move existing codebase, website, documentation and mailing lists
> to
> > > >    Apache-hosted infrastructure
> > > >  * Work with the infrastructure team to implement and approve our
> code
> > > >    review, build, and testing workflows in the context of the ASF
> > > >  * Incremental development and releases per Apache guidelines
> > > >
> > > > = Current Status =
> > > >
> > > > The Livy project began at Cloudera, as a part of the Hue project.
> > > Cloudera
> > > > soon realized the broad applicability of Livy, and separated it out
> > into
> > > an
> > > > independent project in Nov 2015.
> > > >
> > > > == Releases ==
> > > >
> > > > Livy has undergone two public releases, tagged here:
> > > >
> > > > * https://github.com/cloudera/livy/releases/tag/v0.2.0
> > > > * https://github.com/cloudera/livy/releases/tag/v0.3.0
> > > >
> > > > Tarballs and zip files were created for each release and hosted on
> > > github.
> > > > Upon joining the incubator, we will adopt a more typical ASF release
> > > > process.
> > > >
> > > > == Source ==
> > > >
> > > > Livy’s source is currently hosted on Github at:
> > > > https://github.com/cloudera/livy
> > > >
> > > > This repository will be transitioned to Apache’s git hosting during
> > > > incubation.
> > > >
> > > > == Code review ==
> > > >
> > > > Livy’s code reviews are currently public and hosted on github as pull
> > > > request reviews at: https://github.com/cloudera/livy/pulls
> > > > The Livy developer community so far is happy with github pull request
> > > > reviews and hopes to continue this after being admitted to the ASF.
> > > >
> > > > == Issue Tracking ==
> > > >
> > > > Livy’s bug and feature tracking is hosted on JIRA at:
> > > > https://issues.cloudera.org/projects/LIVY/summary
> > > > This JIRA instance contains bugs and development discussion dating
> > back 1
> > > > year and will provide an initial seed for the ASF JIRA
> > > >
> > > > == Community Discussion ==
> > > >
> > > > Livy has several public discussion forums:
> > > >
> > > > * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-dev
> > > > * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-user
> > > >
> > > > == Development Practices ==
> > > >
> > > > The Livy project follows a review before commit philosophy. Every
> > commit
> > > > automatically runs through the unit tests and generates coverage
> > reports
> > > > presented as a pull request comment. Our experience with this process
> > > leads
> > > > us to believe that it helps ease new contributors into the project.
> > They
> > > get
> > > > feedback quickly on common mistakes, lowering the burden on
> reviewers.
> > > Those
> > > > same reviewers get to lead by example, showing the new contributors
> > that
> > > we
> > > > value feedback within our community even when changes are done by
> more
> > > > experienced folks.
> > > >
> > > > == Meritocracy ==
> > > >
> > > > We believe strongly in meritocracy when electing committers and PMC
> > > members.
> > > > In the past few months, the project has added two new committers from
> > two
> > > > different organisations, in recognition of their significant
> > > contributions
> > > > to the project. We will encourage contributions and participation of
> > all
> > > > types, and ensure that contributors are appropriately recognized.
> > > >
> > > > == Community ==
> > > >
> > > > Though Livy is relatively new as a standalone open source project, it
> > has
> > > > already seen promising growth in its community across several
> > > organizations:
> > > > Cloudera is the original development sponsor for Livy
> > > > Microsoft pushed the development of the interpreter fixing high
> > > availability
> > > > issues and adding additional features.
> > > > Hortonworks has contributed the security features to Livy allowing
> > > kerberos
> > > > and impersonation to work with Spark
> > > > IBM is starting to make contributions to the Livy project
> > > > A number of other patches contributed by community members
> > > >
> > > > Livy currently relies on Google Groups for mailing lists. These lists
> > > have
> > > > been active since the end of 2015/start of 2016. Currently, Livy’s
> user
> > > > mailing list has 173 subscribers and has hosted a total of 227 topic
> > > > threads. Livy’s developer list has 49 subscribers and has hosted 79
> > topic
> > > > threads.
> > > >
> > > > == Core Developers ==
> > > >
> > > > The early contributions to Livy were made by Cloudera engineers. In
> > 2016,
> > > > engineers from Microsoft and Hortonworks joined the core developer
> > > > community.
> > > >
> > > > == Alignment ==
> > > >
> > > > Livy is built upon Apache Spark, and other Apache projects like
> Apache
> > > > Hadoop YARN. It’s used as a building block by Apache Zeppelin. These
> > > > community connections combined with our focus on development
> practices
> > > that
> > > > emphasize community engagement with a path to meritocratic
> recognition
> > > > naturally align us with the ASF.
> > > >
> > > > = Known Risks =
> > > >
> > > > == Orphaned Products ==
> > > >
> > > > The risk of Livy being abandoned is low because it is supported by
> > three
> > > > major big-data software vendors. Moreover, Livy is already used to
> > power
> > > > multiple releases of services and products used in production.
> > > >
> > > > == Inexperience with Open Source ==
> > > >
> > > > Several of the initial committers are experienced open source
> > developers,
> > > > several being committers and/or PMC members on other ASF projects
> > (Spark,
> > > > YARN).
> > > >
> > > > == Homogenous Developers ==
> > > >
> > > > The project already has a diverse developer base. It has
> contributions
> > > from
> > > > 3 major organisations (Cloudera, Microsoft and Hortonworks), and is
> > used
> > > in
> > > > diverse applications, in diverse settings (On-Prem and Cloud).
> > > >
> > > > == Reliance on salaried Developers ==
> > > >
> > > > The contributions to the Livy project to date have been made by
> > salaried
> > > > engineers from Cloudera, Microsoft and Hortonworks. One of the
> > > individuals
> > > > on the initial committer list has since left Microsoft and is
> currently
> > > > unaffiliated. The remaining contributors are from Cloudera and
> > > Hortonworks.
> > > > Since there are at least two major organizations involved, the risk
> of
> > > > reliance on a single group of salaried developers is mitigated. The
> > Livy
> > > > user base is diverse, with users from across the globe, including
> users
> > > from
> > > > academic settings. We aim to further diversify the Livy user and
> > > contributor
> > > > base.
> > > >
> > > > == Relationships with other Apache projects ==
> > > >
> > > > Livy is closely tied to the Apache Spark project and currently
> > addresses
> > > the
> > > > scenarios for a REST based batch and interactive gateway for Spark
> jobs
> > > on
> > > > YARN. Given the growing number of integrations with Livy, keeping it
> > > outside
> > > > of Apache Spark aligns with the desire of the Apache Spark community
> to
> > > > reduce the number of external dependencies in the Spark project.
> > > > Specifically, the Apache Spark community has previously expressed a
> > > desire
> > > > to keep job servers independent from the project.<<FootNote(See,
for
> > > > example, discussion of the Ooyala Spark Job Server in SPARK-818)>>
> > > > Furthermore, while Livy common usage is closely tied to Spark
> > deployments
> > > > right now, its core building blocks can be reused elsewhere.  Livy’s
> > > Remote
> > > > REPL could be used as a library for interactive scenarios in
> non-Spark
> > > > projects. In the future, integrations with cluster managers like
> Apache
> > > > Mesos and others could also be added.
> > > >
> > > > The features provided by Livy have already been integrated with
> > existing
> > > > projects like Jupyter and Apache Zeppelin for their interactive Spark
> > use
> > > > cases. This validates the need for a project like Livy and provides
> an
> > > > active downstream user base that the Livy community can interact with
> > to
> > > > seed future interest in the project.
> > > >
> > > > Livy serves a similar purpose to Apache Toree (incubating) but
> differs
> > in
> > > > making session management, security and impersonation a focal design
> > > point.
> > > >
> > > > == An Excessive Fascination with the Apache Brand ==
> > > >
> > > > The primary motivation for submitting Livy to the ASF is to grow a
> > > diverse
> > > > and strong community. We wish to encourage diverse organisations,
> > > including
> > > > ISVs, to adopt Livy and contribute to Livy without any concerns about
> > > > ownership or licensing.
> > > >
> > > > = Documentation =
> > > >
> > > > Documentation can be found on the Livy website http://livy.io/
> > > >
> > > > The Livy web site is version controlled on the ‘gh-pages’ branch of
> the
> > > > above repository.
> > > > Additional documentation is provided on the github wiki:
> > > > https://github.com/cloudera/livy/wiki
> > > > APis are documented within the source code as JavaDoc style
> > documentation
> > > > comments.
> > > >
> > > > = Initial Source =
> > > >
> > > > The initial source code for Livy is hosted at
> > > > https://github.com/cloudera/livy
> > > >
> > > > = Source and Intellectual Property submission plan =
> > > >
> > > > The Livy codebase and web site is currently hosted on GitHub and will
> > be
> > > > transitioned to the ASF repositories during incubation. Livy is
> already
> > > > licensed under the Apache 2.0 license. Cloudera has collected ICLAs
> and
> > > > CCLAs from all committers. There are, however, some contributions
> > > recently
> > > > from authors that have not signed the CCLA and ICLA. If necessary
> for a
> > > > successful SGA, we’ll seek the necessary documentation or replace the
> > > > contributions.
> > > >
> > > > The “Livy” name is not a registered trademark. We will need to do
a
> > > > trademark search and make sure it is available for the Apache
> > Foundation
> > > > prior to graduation.
> > > >
> > > > Cloudera currently owns the domain name: http://livy.io/. Once all
> the
> > > > documentation has moved over to ASF infrastructure, the main landing
> > page
> > > > will become livy.incubator.apache.org and the old domain will just
> act
> > > as a
> > > > redirect.
> > > >
> > > > = External Dependencies =
> > > >
> > > > The list below covers the non-Apache dependencies of the project and
> > > their
> > > > licenses.
> > > >
> > > > * Jetty: Apache 2.0
> > > > * Dropwizard Metrics: Apache 2.0
> > > > * FasterXML Jackson: Apache 2.0
> > > > * Netty: Apache 2.0
> > > > * Scala: BSD
> > > > * Py4J: BSD
> > > > * Scalatra: BSD
> > > >
> > > > Build/test-only dependencies:
> > > >
> > > > * Mockito: MIT
> > > > * JUnit: Eclipse
> > > >
> > > > = Required Resources =
> > > >
> > > > == Mailing Lists ==
> > > >
> > > > * private@livy.incubator.apache.org (PPMC)
> > > > * dev@livy.incubator.apache.org (dev mailing list)
> > > > * user@livy.incubator.apache.org (User questions)
> > > > * commits@livy.incubator.apache.org (subscribers shouldn’t be able
> to
> > > post)
> > > > * issues@livy.incubator.apache.org (subscribers shouldn’t be able to
> > > post)
> > > >
> > > > == Git Repository ==
> > > >
> > > > git://git.apache.org/incubator-livy
> > > >
> > > > == Issue Tracking ==
> > > >
> > > > We would like to import our current JIRA project into the ASF JIRA,
> > such
> > > > that our historical commit message and code comments continue to
> > > reference
> > > > the appropriate bug numbers.
> > > >
> > > > = Initial Committers =
> > > >
> > > > * Marcelo Vanzin (vanzin@cloudera.com)
> > > > * Alex Man (alex@alexman.space)
> > > > * Jeff Zhang (zjffdu@gmail.com)
> > > > * Saisai Shao (sshao@hortonworks.com)
> > > > * Kostas Sakellis (kostas@cloudera.com)
> > > >
> > > > = Affiliations =
> > > >
> > > > The initial set of committers includes people employed by Cloudera
> and
> > > > Hortonworks as well as one currently independent contributor.
> > > >
> > > > = Additional Interested Contributors =
> > > >
> > > > Those interested in getting involved with the project as we enter
> > > incubation
> > > > are encouraged to list themselves here.
> > > >
> > > >  * Ismaël Mejía (iemejia@apache.org)
> > > >
> > > > = Sponsors =
> > > >
> > > > == Champion ==
> > > >
> > > > Sean Busbey (busbey@apache.org)
> > > >
> > > > == Nominated Mentors ==
> > > >
> > > > * Bikas Saha (bikas@apache.org)
> > > > * Brock Noland (brock@phdata.io)
> > > > * Luciano Resende (lresende@apache.org)
> > > >
> > > > == Sponsoring Entity ==
> > > >
> > > > We ask that the Incubator PMC sponsor this proposal.
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > > > For additional commands, e-mail: general-help@incubator.apache.org
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > > For additional commands, e-mail: general-help@incubator.apache.org
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message