From: Henri Yandell
Date: Sun, 15 Jan 2017 15:30:34 -0800
Subject: Re: [DISCUSS] Proposing MXNet for the Apache Incubator
To: general@incubator.apache.org
Reply-To: general@incubator.apache.org
Mailing-List: contact general-help@incubator.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=f403045c12b42e8ea705462a7433
archived-at: Sun, 15 Jan 2017 23:30:38 -0000

--f403045c12b42e8ea705462a7433
Content-Type: text/plain; charset=UTF-8

Added. Apologies if Liang DePeng is the incorrect anglicization of your
name.

Hen

On Sat, Jan 14, 2017 at 12:08 AM, 梁德澎 wrote:

> Hi,
>
> I’ve been working on the MXNet-ScalaPkg for a while with Yizhi Liu
> (@javelinjs).
> Please sign me up as a committer of MXNet.
>
> GitHub ID: Ldpe2G
> Email: liangdepeng@gmail.com
> Affiliations: Sun Yat-sen University
>
> 2017-01-14 13:49 GMT+08:00 Henri Yandell :
>
> > Thanks for all the feedback, and to all the interested parties :)
> >
> > My aim is to propose a vote on Monday, unless someone raises an issue
> > before then.
> >
> > Hen
> >
> > On Thu, Jan 5, 2017 at 9:12 PM, Henri Yandell wrote:
> >
> > > Hello Incubator,
> > >
> > > I'd like to propose a new incubator Apache MXNet podling.
> > >
> > > The existing MXNet project (http://mxnet.io - 1.5 years old, 15
> > > committers, 200 contributors) is very interested in joining Apache.
> > > MXNet is an open-source deep learning framework that allows you to
> > > define, train, and deploy deep neural networks on a wide array of
> > > devices, from cloud infrastructure to mobile devices.
> > >
> > > The wiki proposal page is located here:
> > >
> > > https://wiki.apache.org/incubator/MXNetProposal
> > >
> > > I've included the text below in case anyone wants to focus on parts
> > > of it in a reply.
> > >
> > > I look forward to your thoughts, and hope that many interested
> > > Apache members will volunteer to mentor the project in addition to
> > > Sebastian and myself.
> > >
> > > Currently the list of committers is based on the current active
> > > coders, so we're also very interested in hearing from anyone else
> > > who is interested in working on the project, be they current or
> > > future contributors!
> > >
> > > Thanks,
> > >
> > > Hen
> > > On behalf of the MXNet project
> > >
> > > ---------
> > >
> > > = MXNet: Apache Incubator Proposal =
> > >
> > > == Abstract ==
> > >
> > > MXNet is a flexible and efficient library for deep learning.
> > >
> > > == Proposal ==
> > >
> > > MXNet is an open-source deep learning framework that allows you to
> > > define, train, and deploy deep neural networks on a wide array of
> > > devices, from cloud infrastructure to mobile devices. It is highly
> > > scalable, allowing for fast model training, and supports a flexible
> > > programming model and multiple languages. MXNet allows you to mix
> > > symbolic and imperative programming flavors to maximize both
> > > efficiency and productivity. MXNet is built on a dynamic dependency
> > > scheduler that automatically parallelizes both symbolic and
> > > imperative operations on the fly. A graph optimization layer on top
> > > of that makes symbolic execution fast and memory efficient. The
> > > MXNet library is portable and lightweight, and it scales to multiple
> > > GPUs and multiple machines.
> > >
> > > == Background ==
> > >
> > > Deep learning is a subset of machine learning and refers to a class
> > > of algorithms that use a hierarchical approach with non-linearities
> > > to discover and learn representations within data. Deep learning has
> > > recently become very popular due to its applicability to, and
> > > advancement of, domains such as Computer Vision, Speech Recognition,
> > > Natural Language Understanding and Recommender Systems.
> > > With pervasive and cost-effective cloud computing, large labeled
> > > datasets and continued algorithmic innovation, deep learning has
> > > become one of the most popular classes of algorithms for machine
> > > learning practitioners in recent years.
> > >
> > > == Rationale ==
> > >
> > > The adoption of deep learning is quickly expanding from initial deep
> > > domain experts rooted in academia to data scientists and developers
> > > working to deploy intelligent services and products. Deep learning,
> > > however, has many challenges. These include model training time
> > > (which can take days to weeks), programmability (not everyone writes
> > > Python or C++ and likes symbolic programming) and balancing
> > > production readiness (support for things like failover) with
> > > development flexibility (ability to program in different ways,
> > > support for new operators and model types) and speed of execution
> > > (fast and scalable model training). Other frameworks excel on some
> > > but not all of these aspects.
> > >
> > >
> > > == Initial Goals ==
> > >
> > > MXNet is a fairly established project on GitHub with its first code
> > > contribution in April 2015 and roughly 200 contributors. It is used
> > > by several large companies and some of the top research institutions
> > > on the planet. Initial goals would be the following:
> > >
> > > 1. Move the existing codebase(s) to Apache
> > > 1. Integrate with the Apache development process/sign CLAs
> > > 1. Ensure all dependencies are compliant with Apache License
> > > version 2.0
> > > 1. Incremental development and releases per Apache guidelines
> > > 1. Establish engineering discipline and a predictable release
> > > cadence of high quality releases
> > > 1. Expand the community beyond the current base of expert level
> > > users
> > > 1. Improve usability and the overall developer/user experience
> > > 1. Add additional functionality to address newer problem types and
> > > algorithms
> > >
> > >
> > > == Current Status ==
> > >
> > > === Meritocracy ===
> > >
> > > The MXNet project already operates on meritocratic principles.
> > > Today, MXNet has developers worldwide and has accepted multiple
> > > major patches from a diverse set of contributors within both
> > > industry and academia. We would like to follow ASF meritocratic
> > > principles to encourage more developers to contribute to this
> > > project. We know that only active and committed developers from a
> > > diverse set of backgrounds can make MXNet a successful project. We
> > > are also improving the documentation and code to help new developers
> > > get started quickly.
> > >
> > > === Community ===
> > >
> > > Acceptance into the Apache foundation would bolster the growing user
> > > and developer community around MXNet. That community includes around
> > > 200 contributors from academia and industry. The core developers of
> > > our project are listed in our contributors below and are also
> > > represented by logos on the mxnet.io site including Amazon, Baidu,
> > > Carnegie Mellon University, Turi, Intel, NYU, Nvidia, MIT,
> > > Microsoft, TuSimple, University of Alberta, University of Washington
> > > and Wolfram.
> > >
> > > === Core Developers ===
> > >
> > > (with GitHub logins)
> > >
> > > * Tianqi Chen (@tqchen)
> > > * Mu Li (@mli)
> > > * Junyuan Xie (@piiswrong)
> > > * Bing Xu (@antinucleon)
> > > * Chiyuan Zhang (@pluskid)
> > > * Minjie Wang (@jermainewang)
> > > * Naiyan Wang (@winstywang)
> > > * Yizhi Liu (@javelinjs)
> > > * Tong He (@hetong007)
> > > * Qiang Kou (@thirdwing)
> > > * Xingjian Shi (@sxjscience)
> > >
> > > === Alignment ===
> > >
> > > ASF is already the home of many distributed platforms, e.g., Hadoop,
> > > Spark and Mahout, each of which targets a different application
> > > domain.
> > > MXNet, being a distributed platform for large-scale deep learning,
> > > focuses on another important domain for which there is still no
> > > scalable, programmable, flexible and super fast open-source
> > > platform. The recent success of deep learning models, especially for
> > > vision and speech recognition tasks, has generated interest in both
> > > applying existing deep learning models and developing new ones.
> > > Thus, an open-source platform for deep learning backed by some of
> > > the top industry and academic players will be able to attract a
> > > large community of users and developers. MXNet is a complex system
> > > needing many iterations of design, implementation and testing.
> > > Apache's collaboration framework, which encourages active
> > > contribution from developers, will inevitably help improve the
> > > quality of the system, as shown in the success of Hadoop, Spark,
> > > etc. Equally important is the community of users, which helps
> > > identify real-life applications of deep learning and helps to
> > > evaluate the system's performance and ease of use. We hope to
> > > leverage ASF for coordinating and promoting both communities, and in
> > > return benefit the communities with another useful tool.
> > >
> > > == Known Risks ==
> > >
> > > === Orphaned products ===
> > >
> > > Given the current level of investment in MXNet and the stakeholders
> > > using it, the risk of the project being abandoned is minimal.
> > > Amazon, for example, is in active development to use MXNet in many
> > > of its services and many large corporations use it in their
> > > production applications.
> > >
> > > === Inexperience with Open Source ===
> > >
> > > MXNet has existed as a healthy open source project for more than a
> > > year. During that time, the project has attracted 200+ contributors.
> > >
> > > === Homogeneous Developers ===
> > >
> > > The initial list of committers and contributors includes developers
> > > from several institutions and industry participants (see above).
> > >
> > > === Reliance on Salaried Developers ===
> > >
> > > Like most open source projects, MXNet receives substantial support
> > > from salaried developers. A large fraction of MXNet development is
> > > supported by graduate students at various universities in the course
> > > of research degrees - this is more a “volunteer” relationship, since
> > > in most cases students contribute vastly more than is necessary to
> > > immediately support research. In addition, those working from within
> > > corporations are devoting significant time and effort to the
> > > project - and these come from several organizations.
> > >
> > > === An Excessive Fascination with the Apache Brand ===
> > >
> > > We choose Apache not for publicity. We have two purposes. First, we
> > > hope that Apache's known best practices for managing a mature open
> > > source project can help guide us. For example, we are feeling the
> > > growing pains of a successful open source project as we attempt a
> > > major refactor of the internals while customers are using the system
> > > in production. We seek guidance in communicating breaking API
> > > changes and version revisions. Also, as involvement from major
> > > corporations increases, we want to assure our users that MXNet will
> > > stay open and not favor any particular platform or environment.
> > > These are some examples of the know-how and discipline we're hoping
> > > Apache can bring to our project.
> > >
> > > Second, we want to leverage Apache's reputation to recruit more
> > > developers to create a diverse community.
> > >
> > > === Relationship with Other Apache Products ===
> > >
> > > Apache Mahout and Apache Spark's MLlib are general machine learning
> > > systems. Deep learning algorithms can thus be implemented on these
> > > two platforms as well. However, in practice, the overlap will be
> > > minimal. Deep learning is so computationally intensive that it often
> > > requires specialized GPU hardware to accomplish tasks of meaningful
> > > size. Making efficient use of GPU hardware is complex because the
> > > hardware is so fast that the supporting systems around it must be
> > > carefully optimized to keep the GPU cores busy. Extending this
> > > capability to distributed multi-GPU and multi-host environments
> > > requires great care. This is a critical differentiator between MXNet
> > > and existing Apache machine learning systems.
> > >
> > > Mahout and Spark MLlib follow models where their nodes run
> > > synchronously. This is the fundamental difference from MXNet, which
> > > follows the parameter server framework. MXNet can run synchronously
> > > or asynchronously. In addition, MXNet has optimizations for training
> > > a wide range of deep learning models using a variety of approaches
> > > (e.g., model parallelism and data parallelism), which makes MXNet
> > > much more efficient (near-linear speedup on state of the art
> > > models). MXNet also supports both imperative and symbolic
> > > approaches, providing ease of programming for deep learning
> > > algorithms.
> > >
> > > Other Apache projects that are potentially complementary:
> > >
> > > Apache Arrow - read data in Apache Arrow’s internal format from
> > > MXNet; that would allow users to run ETL/preprocessing in Spark,
> > > save the results in Arrow’s format and then run DL algorithms on it.
> > >
> > > Apache Singa - MXNet and Singa are both deep learning projects, and
> > > can benefit from a larger deep learning community at Apache.
> > >
> > > == Documentation ==
> > >
> > > Documentation has recently migrated to http://mxnet.io. We continue
> > > to refine and improve the documentation.
> > >
> > > == Initial Source ==
> > >
> > > We currently use GitHub to maintain our source code,
> > > https://github.com/MXNet
> > >
> > > == Source and Intellectual Property Submission Plan ==
> > >
> > > MXNet code is available under Apache License, Version 2.0. We will
> > > work with the committers to get CLAs signed and review previous
> > > contributions.
> > >
> > > == External Dependencies ==
> > >
> > > * required by the core code base: GCC or CLOM, Clang, any BLAS
> > > library (ATLAS, OpenBLAS, MKL), dmlc-core, mshadow, ps-lite (which
> > > requires lib-zeromq), TBB
> > > * required for GPU usage: cuDNN, CUDA
> > > * required for Python usage: Python 2/3
> > > * required for R module: R, Rcpp (GPLv2 licensing)
> > > * optional for image preparation and preprocessing: OpenCV
> > > * optional dependencies for additional features: Torch7, Numba,
> > > Cython (in NNVM branch)
> > >
> > > Rcpp and lib-zeromq are expected to require licensing discussions.
> > >
> > > == Cryptography ==
> > >
> > > Not Applicable
> > >
> > > == Required Resources ==
> > >
> > > === Mailing Lists ===
> > >
> > > There is currently no mailing list.
> > >
> > > === Issue Tracking ===
> > >
> > > The project currently uses GitHub to track issues and would like to
> > > continue to do so.
> > >
> > > == Committers and Affiliations ==
> > >
> > > * Tianqi Chen (UW)
> > > * Mu Li (AWS)
> > > * Junyuan Xie (AWS)
> > > * Bing Xu (Apple)
> > > * Chiyuan Zhang (MIT)
> > > * Minjie Wang (NYU)
> > > * Naiyan Wang (TuSimple)
> > > * Yizhi Liu (Mediav)
> > > * Tong He (Simon Fraser University)
> > > * Qiang Kou (Indiana U)
> > > * Xingjian Shi (HKUST)
> > >
> > > == Sponsors ==
> > >
> > > === Champion ===
> > >
> > > Henri Yandell (bayard at apache.org)
> > >
> > > === Nominated Mentors ===
> > >
> > > Sebastian Schelter (ssc@apache.org)
> > >
> > >
> > > === Sponsoring Entity ===
> > >
> > > We are requesting the Incubator to sponsor this project.
> > >
> > >
> > >

--f403045c12b42e8ea705462a7433--