From: "sandeep krishnamurthy"
Date: Sun, 15 Jan 2017 19:23:13 -0000
Subject: Re: [DISCUSS] Proposing MXNet for the Apache Incubator
Reply-To: general@incubator.apache.org

On 2017-01-05 21:12 (-0800), Henri Yandell wrote:
> Hello Incubator,
>
> I'd like to propose a new incubator Apache MXNet podling.
>
> The existing MXNet project (http://mxnet.io - 1.5 years old, 15 committers,
> 200 contributors) is very interested in joining Apache. MXNet is an
> open-source deep learning framework that allows you to define, train, and
> deploy deep neural networks on a wide array of devices, from cloud
> infrastructure to mobile devices.
>
> The wiki proposal page is located here:
>
> https://wiki.apache.org/incubator/MXNetProposal
>
> I've included the text below in case anyone wants to focus on parts of it
> in a reply.
>
> Looking forward to your thoughts, and for lots of interested Apache members
> to volunteer to mentor the project in addition to Sebastian and myself.
>
> Currently the list of committers is based on the current active coders, so
> we're also very interested in hearing from anyone else who is interested in
> working on the project, be they current or future contributors!
>
> Thanks,
>
> Hen
> On behalf of the MXNet project
>
> ---------
>
> = MXNet: Apache Incubator Proposal =
>
> == Abstract ==
>
> MXNet is a Flexible and Efficient Library for Deep Learning
>
> == Proposal ==
>
> MXNet is an open-source deep learning framework that allows you to define,
> train, and deploy deep neural networks on a wide array of devices, from
> cloud infrastructure to mobile devices. It is highly scalable, allowing for
> fast model training, and supports a flexible programming model and multiple
> languages. MXNet allows you to mix symbolic and imperative programming
> flavors to maximize both efficiency and productivity. MXNet is built on a
> dynamic dependency scheduler that automatically parallelizes both symbolic
> and imperative operations on the fly. A graph optimization layer on top of
> that makes symbolic execution fast and memory efficient. The MXNet library
> is portable and lightweight, and it scales to multiple GPUs and multiple
> machines.
>
> == Background ==
>
> Deep learning is a subset of machine learning and refers to a class of
> algorithms that use a hierarchical approach with non-linearities to
> discover and learn representations within data. Deep Learning has recently
> become very popular due to its applicability and advancement of domains
> such as Computer Vision, Speech Recognition, Natural Language Understanding
> and Recommender Systems. With pervasive and cost effective cloud computing,
> large labeled datasets and continued algorithmic innovation, Deep Learning
> has become one of the most popular classes of algorithms for machine
> learning practitioners in recent years.
>
> == Rationale ==
>
> The adoption of deep learning is quickly expanding from initial deep domain
> experts rooted in academia to data scientists and developers working to
> deploy intelligent services and products. Deep learning, however, has many
> challenges. These include model training time (which can take days to
> weeks), programmability (not everyone writes Python or C++ or likes
> symbolic programming) and balancing production readiness (support for
> things like failover) with development flexibility (ability to program
> different ways, support for new operators and model types) and speed of
> execution (fast and scalable model training). Other frameworks excel on
> some but not all of these aspects.
>
>
> == Initial Goals ==
>
> MXNet is a fairly established project on GitHub with its first code
> contribution in April 2015 and roughly 200 contributors. It is used by
> several large companies and some of the top research institutions on the
> planet. Initial goals would be the following:
>
> 1. Move the existing codebase(s) to Apache
> 1. Integrate with the Apache development process/sign CLAs
> 1. Ensure all dependencies are compliant with Apache License version 2.0
> 1. Incremental development and releases per Apache guidelines
> 1. Establish engineering discipline and a predictable release cadence of
> high quality releases
> 1. Expand the community beyond the current base of expert level users
> 1. Improve usability and the overall developer/user experience
> 1. Add additional functionality to address newer problem types and
> algorithms
>
>
> == Current Status ==
>
> === Meritocracy ===
>
> The MXNet project already operates on meritocratic principles. Today, MXNet
> has developers worldwide and has accepted multiple major patches from a
> diverse set of contributors within both industry and academia. We would
> like to follow ASF meritocratic principles to encourage more developers to
> contribute to this project. We know that only active and committed
> developers from a diverse set of backgrounds can make MXNet a successful
> project. We are also improving the documentation and code to help new
> developers get started quickly.
>
> === Community ===
>
> Acceptance into the Apache foundation would bolster the growing user and
> developer community around MXNet. That community includes around 200
> contributors from academia and industry. The core developers of our project
> are listed in our contributors below and are also represented by logos on
> the mxnet.io site including Amazon, Baidu, Carnegie Mellon University,
> Turi, Intel, NYU, Nvidia, MIT, Microsoft, TuSimple, University of Alberta,
> University of Washington and Wolfram.
>
> === Core Developers ===
>
> (with GitHub logins)
>
> * Tianqi Chen (@tqchen)
> * Mu Li (@mli)
> * Junyuan Xie (@piiswrong)
> * Bing Xu (@antinucleon)
> * Chiyuan Zhang (@pluskid)
> * Minjie Wang (@jermainewang)
> * Naiyan Wang (@winstywang)
> * Yizhi Liu (@javelinjs)
> * Tong He (@hetong007)
> * Qiang Kou (@thirdwing)
> * Xingjian Shi (@sxjscience)
>
> === Alignment ===
>
> ASF is already the home of many distributed platforms, e.g., Hadoop, Spark
> and Mahout, each of which targets a different application domain. MXNet,
> being a distributed platform for large-scale deep learning, focuses on
> another important domain which still lacks a scalable, programmable,
> flexible and super fast open-source platform. The recent
> success of deep learning models especially for vision and speech
> recognition tasks has generated interest in both applying existing deep
> learning models and in developing new ones. Thus, an open-source platform
> for deep learning backed by some of the top industry and academic players
> will be able to attract a large community of users and developers. MXNet is
> a complex system needing many iterations of design, implementation and
> testing. Apache's collaboration framework which encourages active
> contribution from developers will inevitably help improve the quality of
> the system, as shown in the success of Hadoop, Spark, etc. Equally
> important is the community of users which helps identify real-life
> applications of deep learning, and helps to evaluate the system's
> performance and ease-of-use. We hope to leverage ASF for coordinating and
> promoting both communities, and in return benefit the communities with
> another useful tool.
>
> == Known Risks ==
>
> === Orphaned products ===
>
> Given the current level of investment in MXNet and the stakeholders using
> it - the risk of the project being abandoned is minimal. Amazon, for
> example, is in active development to use MXNet in many of its services and
> many large corporations use it in their production applications.
>
> === Inexperience with Open Source ===
>
> MXNet has existed as a healthy open source project for more than a year.
> During that time, the project has attracted 200+ contributors.
>
> === Homogenous Developers ===
>
> The initial list of committers and contributors includes developers from
> several institutions and industry participants (see above).
>
> === Reliance on Salaried Developers ===
>
> Like most open source projects, MXNet receives substantial support from
> salaried developers. A large fraction of MXNet development is supported by
> graduate students at various universities in the course of research degrees
> - this is more a “volunteer” relationship, since in most cases students
> contribute vastly more than is necessary to immediately support research.
> In addition, those working from within corporations are devoting
> significant time and effort to the project - and these come from several
> organizations.
>
> === An Excessive Fascination with the Apache Brand ===
>
> We choose Apache not for publicity. We have two purposes. First, we hope
> that Apache's known best-practices for managing a mature open source
> project can help guide us. For example, we are feeling the growing pains
> of a successful open source project as we attempt a major refactor of the
> internals while customers are using the system in production. We seek
> guidance in communicating breaking API changes and version revisions.
> Also, as our involvement from major corporations increases, we want to
> assure our users that MXNet will stay open and not favor any particular
> platform or environment. These are some examples of the know-how and
> discipline we're hoping Apache can bring to our project.
>
> Second, we want to leverage Apache's reputation to recruit more developers
> to create a diverse community.
>
> === Relationship with Other Apache Products ===
>
> Apache Mahout and Apache Spark's MLlib are general machine learning
> systems. Deep learning algorithms can thus be implemented on these two
> platforms as well. However, in practice, the overlap will be minimal. Deep
> learning is so computationally intensive that it often requires specialized
> GPU hardware to accomplish tasks of meaningful size. Making efficient use
> of GPU hardware is complex because the hardware is so fast that the
> supporting systems around it must be carefully optimized to keep the GPU
> cores busy. Extending this capability to distributed multi-GPU and
> multi-host environments requires great care. This is a critical
> differentiator between MXNet and existing Apache machine learning systems.
>
> Mahout and Spark MLlib follow models where their nodes run synchronously.
> This is the fundamental difference from MXNet, which follows the parameter
> server framework. MXNet can run synchronously or asynchronously. In
> addition, MXNet has optimizations for training a wide range of deep
> learning models using a variety of approaches (e.g., model parallelism and
> data parallelism) which makes MXNet much more efficient (near-linear
> speedup on state of the art models). MXNet also supports both imperative
> and symbolic approaches providing ease of programming for deep learning
> algorithms.
>
> Other Apache projects that are potentially complementary:
>
> Apache Arrow - read data in Apache Arrow's internal format from MXNet; that
> would allow users to run ETL/preprocessing in Spark, save the results in
> Arrow's format and then run DL algorithms on it.
>
> Apache Singa - MXNet and Singa are both deep learning projects, and can
> benefit from a larger deep learning community at Apache.
>
> == Documentation ==
>
> Documentation has recently migrated to http://mxnet.io. We continue to
> refine and improve the documentation.
>
> == Initial Source ==
>
> We currently use GitHub to maintain our source code,
> https://github.com/MXNet
>
> == Source and Intellectual Property Submission Plan ==
>
> MXNet code is available under Apache License, Version 2.0. We will work
> with the committers to get CLAs signed and review previous contributions.
>
> == External Dependencies ==
>
> * required by the core code base: GCC or CLOM, Clang, any BLAS library
> (ATLAS, OpenBLAS, MKL), dmlc-core, mshadow, ps-lite (which requires
> lib-zeromq), TBB
> * required for GPU usage: cudnn, cuda
> * required for python usage: Python 2/3
> * required for R module: R, Rcpp (GPLv2 licensing)
> * optional for image preparation and preprocessing: opencv
> * optional dependencies for additional features: torch7, numba, cython (in
> NNVM branch)
>
> Rcpp and lib-zeromq are expected to require licensing discussions.
>
> == Cryptography ==
>
> Not Applicable
>
> == Required Resources ==
>
> === Mailing Lists ===
>
> There is currently no mailing list.
>
> === Issue Tracking ===
>
> Currently uses GitHub to track issues. Would like to continue to do so.
>
> == Committers and Affiliations ==
>
> * Tianqi Chen (UW)
> * Mu Li (AWS)
> * Junyuan Xie (AWS)
> * Bing Xu (Apple)
> * Chiyuan Zhang (MIT)
> * Minjie Wang (NYU)
> * Naiyan Wang (TuSimple)
> * Yizhi Liu (Mediav)
> * Tong He (Simon Fraser University)
> * Qiang Kou (Indiana U)
> * Xingjian Shi (HKUST)
>
> == Sponsors ==
>
> === Champion ===
>
> Henri Yandell (bayard at apache.org)
>
> === Nominated Mentors ===
>
> Sebastian Schelter (ssc@apache.org)
>
>
> === Sponsoring Entity ===
>
> We are requesting the Incubator to sponsor this project.
>

Hello,

I am actively committing to all mxnet.io documentation changes. I work with Mu at Amazon and would like to be more involved with the project. Please add me as a committer.

GitHub ID: sandeep-krishnamurthy
Gmail: sandeep.krishna98@gmail.com