incubator-general mailing list archives

From Raymond Feng <enjoyj...@gmail.com>
Subject Re: [VOTE] Accept SystemML into Apache Incubator
Date Wed, 28 Oct 2015 06:08:53 GMT
Luciano,

There is a copy/paste error: the wiki link points to http://wiki.apache.org/incubator/Nuvem.


> On Oct 27, 2015, at 10:03 PM, Luciano Resende <luckbr1975@gmail.com> wrote:
> 
> On Tue, Oct 27, 2015 at 9:52 PM, Luciano Resende <luckbr1975@gmail.com>
> wrote:
> 
>> 
>> After initial discussion, please vote on the acceptance of SystemML
>> Project for incubation at the Apache Incubator. The full proposal is
>> available at the end of this message and on the wiki at:
>> 
>> https://wiki.apache.org/incubator/SystemML
>> <http://wiki.apache.org/incubator/Nuvem>
>> 
>> Please cast your votes:
>> 
>> [ ] +1, bring SystemML into Incubator
>> [ ] +0, I don't care either way
>> [ ] -1, do not bring SystemML into Incubator, because...
>> 
>> The vote is open for the next 72 hours and only votes from the
>> Incubator PMC are binding.
>> 
>> 
>> = SystemML =
>> 
>> == Abstract ==
>> 
>> SystemML provides declarative large-scale machine learning (ML) that aims
>> at flexible specification of ML algorithms and automatic generation of
>> hybrid runtime plans ranging from single node, in-memory computations, to
>> distributed computations on Apache Hadoop MapReduce and Apache Spark. ML
>> algorithms are expressed in an R-like syntax that includes linear algebra
>> primitives, statistical functions, and ML-specific constructs. This
>> high-level language significantly increases the productivity of data
>> scientists as it provides (1) full flexibility in expressing custom
>> analytics, and (2) data independence from the underlying input formats and
>> physical data representations. Automatic optimization according to data
>> characteristics, such as distribution on the disk file system and sparsity,
>> as well as processing characteristics of the distributed environment, such
>> as number of nodes, CPU, and memory per node, ensures both efficiency and
>> scalability.
>> 
>> == Proposal ==
>> 
>> The goal of SystemML is to create a commercially friendly, scalable and
>> extensible machine learning framework for data scientists to create or
>> extend machine learning algorithms using a declarative syntax. The machine
>> learning framework enables data scientists to develop algorithms locally
>> without the need of a distributed cluster, and scale up and scale out the
>> execution of these algorithms to distributed Apache Hadoop MapReduce or
>> Apache Spark clusters.
>> 
>> == Background ==
>> 
>> SystemML started as a research project in the IBM Almaden Research Center
>> around 2007 aiming to enable data scientists to develop machine learning
>> algorithms independent of data and cluster characteristics.
>> 
>> == Rationale ==
>> 
>> SystemML enables the specification of machine learning algorithms using a
>> declarative machine learning (DML) language. DML includes linear algebra
>> primitives, statistical functions, and additional constructs. This
>> high-level language significantly increases the productivity of data
>> scientists as it provides (1) full flexibility in expressing custom
>> analytics and (2) data independence from the underlying input formats and
>> physical data representations.
>> 
>> SystemML computations can be executed in a variety of different modes. It
>> supports single node in-memory computations and large-scale distributed
>> cluster computations. This allows the user to quickly prototype new
>> algorithms in local environments and then automatically scale to large
>> data sizes without changing the algorithm implementation.
>> 
>> Algorithms specified in DML are dynamically compiled and optimized based
>> on data and cluster characteristics using rule-based and cost-based
>> optimization techniques. The optimizer automatically generates hybrid
>> runtime execution plans ranging from in-memory single-node execution to
>> distributed computations on Apache Spark or Apache Hadoop MapReduce. This
>> ensures both efficiency and scalability. Automatic optimization reduces or
>> eliminates the need to hand-tune distributed runtime execution plans and
>> system configurations.
>> 
>> == Initial Goals ==
>> 
>> The initial goal of moving SystemML to the Apache Incubator is to broaden
>> the community and foster contributions from data scientists who develop new
>> machine learning algorithms and enhance existing ones. Ultimately, this
>> may lead to the creation of an industry standard for specifying machine
>> learning algorithms.
>> 
>> == Current Status ==
>> 
>> The initial code was developed at the IBM Almaden Research Center in
>> California and has recently been made available on GitHub under the Apache
>> Software License 2.0. The project currently supports single-node (in-memory)
>> computation as well as distributed computation on Apache Hadoop MapReduce
>> or Apache Spark clusters.
>> 
>> === Meritocracy ===
>> 
>> We plan to invest in supporting a meritocracy. We will discuss the
>> requirements in an open forum. Several companies have already expressed
>> interest in this project, and we intend to invite additional developers to
>> participate. We will encourage and monitor community participation so that
>> privileges can be extended to those who contribute, in keeping with the
>> standard of meritocracy that Apache emphasizes.
>> 
>> === Community ===
>> 
>> The need for a generic, scalable, and declarative machine learning approach
>> in open source is tremendous, so there is potential for a very large
>> community. We believe that SystemML’s extensible architecture, declarative
>> syntax, cost-based optimizer, and alignment with Spark will encourage
>> community participation, not only to enhance the infrastructure but also
>> to speed up the creation of algorithms for a wide range of use cases. We
>> expect that over time SystemML will attract a large community.
>> 
>> === Alignment ===
>> 
>> The initial committers strongly believe that a generic, scalable, and
>> declarative approach to machine learning will gain broader adoption as an
>> open source, community-driven project, where the community can contribute
>> not only to the core components, but also to a growing collection of
>> algorithms that leverage the optimizations and ease of scaling in
>> SystemML. Our hope is that the Apache Spark, Apache Hadoop, and other
>> communities will find tremendous value in SystemML and that this will
>> foster further collaboration between these projects, building on the
>> existing integration points.
>> 
>> == Known Risks ==
>> 
>> To date, development has been sponsored by IBM and coordinated mostly by
>> the core team of researchers at the IBM Almaden Research Center.
>> 
>> For SystemML to fully transition to an "Apache Way" governance model, it
>> needs to start embracing the meritocracy-centric way of growing the
>> community of contributors.
>> 
>> === Orphaned Products ===
>> 
>> The SystemML developers and previous sponsor have a long-term interest in
>> the use and maintenance of the code, and there is also hope that growing a
>> diverse community around the project will guard against the project
>> becoming orphaned. We feel that it is also important to put formal
>> governance in place both for the project and the contributors as the
>> project expands. We feel ASF is the best location for this.
>> 
>> === Inexperience with Open Source ===
>> 
>> The current set of SystemML contributors is very diverse in terms of
>> open source experience. While some initial members are participating in
>> an open source project for the first time, others have contributed to
>> and mentored various Apache and non-Apache open source projects.
>> 
>> === Reliance on Salaried Developers ===
>> 
>> SystemML currently receives substantial support from salaried developers.
>> However, they are all passionate about the project, and we are confident
>> that the project will continue even if no salaried developers contribute to
>> the project. We are committed to recruiting additional committers including
>> non-salaried developers.
>> 
>> 
>> === Relationships with Other Apache Products ===
>> 
>> Currently, SystemML integrates with Apache Hadoop MapReduce and Apache
>> Spark as underlying computational distributed runtimes.
>> 
>> === An Excessive Fascination with the Apache Brand ===
>> 
>> SystemML solves a real need for a generic, scalable, and declarative
>> approach to machine learning in the Apache Hadoop and Spark ecosystems,
>> something that has so far been addressed in a very ad hoc manner by
>> multiple Apache projects. Our rationale for developing SystemML as
>> an Apache project is detailed in the Rationale section. We believe that the
>> Apache brand and community process will help us attract more contributors
>> to this project, and help establish ubiquitous APIs.
>> 
>> 
>> == Documentation ==
>> 
>> Documentation regarding SystemML is available in the current GitHub
>> repository https://github.com/SparkTC/systemml/tree/master/system-ml/docs.
>> 
>> 
>> == Initial Source ==
>> 
>> Initial source is available on GitHub under the Apache License 2.0
>> 
>> https://github.com/SparkTC/systemml
>> 
>> == Source and Intellectual Property Submission Plan ==
>> 
>> We know of no legal encumbrances in the transfer of source code and rights
>> to Apache. In fact, given the internal IBM due diligence performed on the
>> source code during open sourcing, we expect the code base to be free from
>> any IP issues.
>> 
>> == External Dependencies ==
>> 
>> SystemML is written in Java and currently supports Apache Hadoop MapReduce
>> and Apache Spark runtimes.
>> 
>> To the best of our knowledge, all dependencies of SystemML are distributed
>> under Apache compatible licenses. Upon acceptance to the incubator, we
>> would begin a thorough analysis of all transitive dependencies to verify
>> this fact and introduce license checking into the build and release process
>> (for instance, by integrating Apache Rat).
>> 
>> == Cryptography ==
>> 
>> N/A
>> 
>> == Required Resources ==
>> 
>> === Mailing lists ===
>>      * private@sysml.incubator.apache.org (moderated subscriptions)
>>      * commits@sysml.incubator.apache.org
>>      * dev@sysml.incubator.apache.org
>> 
>> === Git Repository ===
>>      * https://git-wip-us.apache.org/repos/asf/incubator-sysml.git
>> 
>> === Issue Tracking ===
>>      * JIRA (SYSML)
>> 
>> == Initial Committers ==
>> 
>> * Luciano Resende (lresende AT apache DOT org)
>> * Berthold Reinwald (reinwald AT us DOT ibm DOT com)
>> * Matthias Boehm (mboehm AT us DOT ibm DOT com)
>> * Shirish Tatikonda (statiko AT us DOT ibm DOT com)
>> * Niketan Pansare (npansar AT us DOT ibm DOT com)
>> * Prithviraj Sen (senp AT us DOT ibm DOT com)
>> * Alexandre V Evfimievski (evfimi AT us DOT ibm DOT com)
>> * Fred Reiss (frreiss AT us DOT ibm DOT com)
>> * Deron Eriksson (deron AT us DOT ibm DOT com)
>> * Arvind Surve (asurve AT us DOT ibm DOT com)
>> * Mike Dusenberry (mwdusenb AT us DOT ibm DOT com)
>> * Reynold Xin   (rxin AT apache DOT org)
>> * Xiangrui Meng (meng AT apache DOT org)
>> * Joseph Bradley (jkbradley AT apache DOT org)
>> * Patrick Wendell (pwendell AT apache DOT org)
>> * Holden Karau (holden AT apache DOT org)
>> * DB Tsai (dbtsai AT apache DOT org)
>> 
>> == Affiliations ==
>> 
>> * Databricks: Reynold Xin, Xiangrui Meng, Joseph Bradley, Patrick Wendell
>> * Netflix: DB Tsai
>> * IBM: Luciano Resende, Berthold Reinwald, Matthias Boehm, Shirish
>> Tatikonda, Niketan Pansare, Prithviraj Sen, Alexandre V Evfimievski, Fred
>> Reiss, Deron Eriksson, Arvind Surve, Mike Dusenberry and Holden Karau.
>> 
>> == Sponsors ==
>> 
>> === Champion ===
>> * Luciano Resende
>> 
>> === Nominated Mentors ===
>> * Luciano Resende
>> * Reynold Xin
>> * Patrick Wendell
>> * Rich Bowen
>> 
>> === Sponsoring Entity ===
>> We would like to propose the Apache Incubator to sponsor this project.
> Of course, my +1
> 
> -- 
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
