incubator-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sebastian Schelter <...@apache.org>
Subject Re: [PROPOSAL] Stratosphere
Date Mon, 07 Apr 2014 16:18:36 GMT
Hi Alan,

I already took the freedom to add Henry to the proposal.

Btw, As I've learned that it is possible to be committer and mentor in one
person, I'd also volunteer as a mentor.

Best,
Sebastian
Am 07.04.2014 18:15 schrieb "Alan Gates" <gates@hortonworks.com>:

> Henry, definitely glad to have you on board.  Ashutosh Chauhan (new to the
> IPMC) has also expressed his interest to me offline in being a mentor.
>  I’ll add both of you the proposal.
>
> Alan.
>
> On Apr 6, 2014, at 10:34 AM, Henry Saputra <henry.saputra@gmail.com>
> wrote:
>
> > Hi Guys,
> >
> > The proposal looks great and I would love to help to sign up as a
> > Mentor if you guys still have space for one.
> >
> >
> > - Henry
> >
> >
> > On Sun, Mar 30, 2014 at 12:14 AM, Alan Gates <gates@hortonworks.com>
> wrote:
> >> I would like to propose Stratosphere as an Apache Incubator project.  I
> have posted the proposal to
> https://wiki.apache.org/incubator/StratosphereProposal and posted the
> text of the proposal below.
> >>
> >> Alan.
> >>
> >> = Stratosphere =
> >>
> >> == Abstract ==
> >> Stratosphere is an open source system for parallel data analysis.
> Stratosphere deeply integrates MapReduce and database technologies to
> provide expressive and optimizable programming interfaces and at the same
> time efficient and scalable execution.
> >>
> >> == Proposal ==
> >> Stratosphere is an open source system for expressive, declarative,
> fast, and efficient data analysis. Stratosphere combines the scalability
> and programming flexibility of distributed MapReduce-like platforms with
> the efficiency, out-of-core execution, and query optimization capabilities
> found in parallel databases.
> >>
> >> == Background ==
> >> There is currently a need for general-purpose cluster computing
> platforms that are compatible with the Hadoop ecosystem, are more
> efficient, easier to use, and can support more applications than Hadoop
> MapReduce, but are not restricted to a specific data model and language
> (such as the relational model and a variant of SQL). Stratosphere fulfils
> these needs.
> >>
> >> Stratosphere exposes expressive APIs in Java and Scala (conceptually
> similar to Spark, Cascading, Scalding) that allow arbitrary user-defined
> functions in the same language and data model that the program is written
> in. Stratosphere programs pass through a cost-based optimizer that finds
> the best execution path for these programs depending on the data and
> cluster characteristics. The design and implementation of Stratosphere is
> based on research that generalizes query optimizers in relational
> databases. Stratosphere has a distributed runtime that is architected upon
> the principles of parallel databases, providing true pipelining (a basis
> for stream processing) and efficient out-of-core algorithms for grouping,
> sorting, joining, and aggregating data. Stratosphere provides first-class
> support for iterative algorithms via a built-in iterate operator, covering
> Machine Learning and graph analysis use cases. It achieves performance
> similar to Apache Giraph without being a specialized graph processing
> system.
> >>
> >> Stratosphere has undergone three major releases (v0.1, v0.2, v0.4) and
> some minor ones.
> >>
> >> == Rationale ==
> >> Stratosphere started out in 2008 as a research project by the Technical
> University of Berlin, the Humboldt University of Berlin, and the Hasso
> Plattner Institute, and has received subsequent funding from the German
> Research Council, the European Institute of Innovation and Technology, the
> European Commision, and industry.
> >>
> >> The traction of Stratosphere has by far exceeded our initial
> expectations, and we are therefore seeking an organizational long-term home
> for Stratosphere beyond the University walls that will house and further
> encourage contributors from companies and other organizations that are
> interested in Stratosphere. We believe that the Apache Software Foundation
> is the ideal home for Stratosphere. Stratosphere integrates with several
> existing Apache projects, such as HDFS, YARN, HBase, and Avro. The team is
> familiar with the Apache processes and fully subscribes to the Apache
> mission. One of the proposing members is a long-time Apache contributor and
> PMC member.
> >>
> >> == Initial Goals ==
> >> * Move the existing codebase to Apache
> >> * Integrate with the Apache development process
> >> * Ensure all dependencies are compliant with Apache License version 2.0
> >> * Incremental development and releases per Apache guidelines
> >>
> >>
> >> == Current Status ==
> >> === Meritocracy ===
> >> Stratosphere operated on meritocratic principles from the get go. The
> initial project proposal submitted to the German Research Council
> >> in 2008 stated that all code developed in the project will be released
> as open source under the Apache 2 license. Currently, all the
> >> discussions pertaining to Stratosphere development are public on [[
> https://github.com/stratosphere/stratosphere|GitHub]]  and our [[
> https://groups.google.com/forum/#!forum/stratosphere-dev|mailing list]].
> The current incubation proposal includes the major code contributors to
> Stratosphere. Several additional people have worked on the Stratosphere
> codebase for research prototypes and industry use cases and would be
> interested in becoming committers. We are starting with a small committer
> group and we plan to add additional committers following an open
> merit-based decision process during the incubation phase.
> >>
> >> === Community ===
> >> Currently, the core of Stratosphere is developed at TU Berlin, mainly
> by the committers listed in this proposal. Additional people from several
> Universities and companies in Europe are working with Stratosphere and are
> interested in becoming committers to the project.
> >>
> >> During the years, Stratosphere has been adopted as a platform for
> research and teaching in several Universities (TU Berlin, HU Berlin, HPI,
> RWTH, Inria, KTH, U. Trento, UCSD, and others), and it is currently
> witnessing its first industrial installations. We are seeing a rapidly
> growing interest in Stratosphere by both startups and large companies, as
> well as a growing community (our first [[
> http://stratosphere.eu/events/2013/summit.html|Stratosphere Summit]] in
> November 2013 attracted over 80 participants). Stratosphere was recently
> accepted as a mentoring organization in Google Summer of Code 2014.
> >>
> >> We believe that acceptance in the Apache Software Foundation will
> consolidate the current community under one organizational umbrella, and
> most importantly accelerate the growth of the community.
> >>
> >> === Core developers ===
> >> The core developers of the system are Stephan Ewen, Fabian Hueske,
> Daniel Warneke, Robert Metzger, Ufuk Celebi, and Aljoscha Krettek, who are
> all committers in the current proposal.
> >>
> >> === Alignment ===
> >> Stratosphere is compatible with, and related to several Apache
> projects. Stratosphere re-uses parts of Apache Hadoop, in particular HDFS
> and YARN, as well as Apache HBase and Apache Avro. Stratosphere is a very
> good compilation target for query languages such as Apache Hive and Apache
> Pig.
> >>
> >> == Known Risks ==
> >> === Orphaned Products ===
> >> There is strong interest in Stratosphere by several companies and
> organizations, and there is currently a long-term commitment to fund
> salaried developers for Stratosphere by public and private organizations in
> Europe.
> >>
> >> === Inexperience with Open Source ===
> >> Sebastian Schelter is a committer and PMC member of Apache Mahout and
> Apache Giraph, member of the Apache Software Foundation, member of the
> Incubator PMC and project mentor for Apache Drill. Sebastian, along with
> our mentors, will guide the rest of the committers that have experience
> with releasing software as open source but little experience in
> participating in an open source project besides Stratosphere itself.
> >>
> >> In mid-2013 Stratosphere transitioned from an “open source project with
> publicly accessible source code” to an open source project that puts the
> community first. We moved from a University-hosted git repository to
> GitHub, where we discuss all issues publicly. This also includes release
> planning (via GitHub’s milestone feature) and code reviews. We also moved
> our build system to the publicly available Travis-CI. The mailing lists are
> hosted with Google Groups, we use the public Maven repository
> infrastructure of Sonatype. The source code of the www.stratosphere.euwebsite is publicly
available and is meant to be changed by external
> contributors (for example for documentation purposes).
> >>
> >> === Homogeneous Developers ===
> >> Most committers in this proposal belong to the same institution (TU
> Berlin). The engagement of these committers goes well beyond the necessary
> development to support research, and all committers work on Stratosphere in
> their free time. Several people from other institutions are working on and
> are familiar with the Stratosphere codebase. We will work to attract them
> as future committers during the incubation phase, following a merit-based
> approach.
> >>
> >> === Reliance on Salaried Developers ===
> >> Currently, Stratosphere receives support from salaried developers, in
> particular from graduate students at TU Berlin that are funded by the
> German Research Council, the European Institute of Technology, and the
> European Commission. These students work in their free time on Stratosphere
> in addition to their employment.
> >>
> >> We expect that Stratosphere development will occur on both salaried and
> volunteer time. We will recruit additional committers, including
> non-salaried developers, and we will work to ensure that the project will
> move forward independently of salaried developers.
> >>
> >> === Relationship with Other Apache Products ===
> >> Stratosphere interfaces with several existing Apache projects: Apache
> HBase for storage, Apache Hadoop (HDFS for storage, YARN for resource
> management, and Stratosphere contains a generic wrapper for Hadoop
> MapReduce input formats), and Apache Avro (for serialization). Stratosphere
> uses Apache Maven and Apache Commons libraries internally. Stratosphere can
> be a great compilation target for Apache Pig and Apache Hive, although such
> functionality is not yet implemented.
> >>
> >> Stratosphere is also related with several projects undergoing
> incubation in the Apache Incubation project, such as Tez, Drill, and Spark
> (graduated). While all these projects target sufficiently different spaces
> and have different architectures, it would be interesting to explore code
> reuse possibilities. For example, we are currently basing our design for
> compiling SQL to Stratosphere on the Optiq library, also used by Apache
> Drill.
> >>
> >> === An Excessive Fascination with the Apache Brand ===
> >> We believe that the Apache brand will help us attract contributors to
> Stratosphere, by giving us a well-defined, transparent development process
> under a known brand. At the same time, Stratosphere already has a healthy
> community and current funding guarantees the further codebase development
> and growth of the project for the next 3-5 years. The reason for this
> proposal is not to gain publicity, but to further strengthen the longevity
> of the project as explained in the Rationale section.
> >>
> >> == Documentation ==
> >> * [[https://stratosphere.eu|Project website]]
> >> * [[http://stratosphere.eu/docs/0.4/|Documentation]]
> >> * [[https://github.com/stratosphere/stratosphere|Codebase]]
> >> * [[https://groups.google.com/forum/#!forum/stratosphere-dev|Mailinglist]]
> >>
> >> == Initial Source ==
> >> Stratosphere is hosted on [[
> https://github.com/stratosphere/stratosphere|GitHub]] . This is the
> codebase that we will migrate to the Apache Foundation. The code was
> previously hosted on a TU Berlin’s own git infrastructure. It has always
> been Apache 2.0 licensed.
> >>
> >> === Source and Intellectual Property Submission Plan ===
> >> All initial and past committers will sign a CLA with the ASF while the
> incubator proposal for Stratosphere is being discussed. All organizations
> that have employed Stratosphere contributors in the past will sign a SGA.
> Current contributors will sign a CCLA. All major contributors are still
> active in the project.
> >>
> >> === External Dependencies ===
> >> All critical dependencies are, to the extend of our knowledge, from
> other Apache projects. These include Apache Hadoop (for YARN and HDFS) and
> some libraries (log4j, commons codec, junit and more). Our web frontend
> uses some MIT-licensed JavaScript libraries.
> >>
> >> == Required Resources ==
> >>
> >> === Mailing list ===
> >> We will migrate our mailing lists to the following:
> >> * users@stratosphere.incubator.apache.org
> >> * dev@stratosphere.incubator.apache.org
> >> * private@stratosphere.incubator.apache.org
> >> * commits@stratosphere.incubator.apache.org
> >>
> >> === Source control ===
> >> We would like to use Git for source control and enable GitHib mirroring
> functionality, where code reviews on GitHub are automatically
> >> forwarded to the developer mailing list. (See also: [[
> https://blogs.apache.org/infra/entry/improved_integration_between_apache_and]
> ])
> >>
> >>
> >> === Issue tracking ===
> >> We are currently using GitHub for issue tracking. We request an
> Apache-hosted JIRA, and we will import existing issues there.
> >>
> >>
> >> == Initial committers ==
> >> * Stephan Ewen - stephan.ewen@tu-berlin.de
> >> * Fabian Hueske - fabian.hueske@tu-berlin.de
> >> * Daniel Warneke - warneke@posteo.de
> >> * Robert Metzger - metrobert@gmail.com
> >> * Ufuk Celebi - u.celebi@fu-berlin.de
> >> * Aljoscha Krettek - aljoscha.krettek@gmail.com
> >> * Kostas Tzoumas - kostas.tzoumas@tu-berlin.de
> >> * Sebastian Schelter  - ssc@apache.org
> >>
> >> === Affiliations ===
> >> * Stephan Ewen (TU Berlin)
> >> * Fabian Hueske (TU Berlin)
> >> * Daniel Warneke (Amadeus IT Group)
> >> * Robert Metzger (TU Berlin)
> >> * Ufuk Celebi (FU Berlin)
> >> * Aljoscha Krettek (TU Berlin)
> >> * Kostas Tzoumas (TU Berlin)
> >> * Sebastian Schelter (TU Berlin)
> >>
> >> == Sponsors ==
> >> === Champion ===
> >> Alan Gates (gates@apache.org)
> >>
> >> === Nominated Mentors ===
> >> * Sean Owen (srowen@apache.org) (Note: Sean is an Apache member but
> not currently on the IPC, he will need to request IPMC membership)
> >> * Ted Dunning (tdunning@apache.org)
> >> * Owen O'Malley (omalley@apache.org)
> >>
> >> === Sponsoring Entity ===
> >> The Apache Incubator
> >>
> >>
> >> --
> >> CONFIDENTIALITY NOTICE
> >> NOTICE: This message is intended for the use of the individual or
> entity to
> >> which it is addressed and may contain information that is confidential,
> >> privileged and exempt from disclosure under applicable law. If the
> reader
> >> of this message is not the intended recipient, you are hereby notified
> that
> >> any printing, copying, dissemination, distribution, disclosure or
> >> forwarding of this communication is strictly prohibited. If you have
> >> received this communication in error, please contact the sender
> immediately
> >> and delete it from your system. Thank You.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >> For additional commands, e-mail: general-help@incubator.apache.org
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: general-help@incubator.apache.org
> >
>
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message