Mailing-List: contact general-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: general@incubator.apache.org
MIME-Version: 1.0
In-Reply-To: <CAFx9Rg7H5udyqmN+7sOxv-VB34+sSv06NMLd=vmoGnr=-yW-Tg@mail.gmail.com>
References: <CALGG8z2y4DnXaaLFiOd-g_E0LYe_1JWRHa8PQvNseV8v8-8F1w@mail.gmail.com>
 <CALGG8z2ScEf3hS9eW-7EnqLS6-QznfiTJQ6UD-7-wF4j6LpqYA@mail.gmail.com> <CAFx9Rg7H5udyqmN+7sOxv-VB34+sSv06NMLd=vmoGnr=-yW-Tg@mail.gmail.com>
From: Henri Yandell <bayard@apache.org>
Date: Sun, 15 Jan 2017 15:30:34 -0800
Message-ID: <CALGG8z0BidJ_icHObLG3uDoL333w-jFJyKVoKVdFqOAxVNcMsw@mail.gmail.com>
Subject: Re: [DISCUSS] Proposing MXNet for the Apache Incubator
To: general@incubator.apache.org
Content-Type: multipart/alternative; boundary=f403045c12b42e8ea705462a7433
archived-at: Sun, 15 Jan 2017 23:30:38 -0000

--f403045c12b42e8ea705462a7433
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Added. Apologies if Liang DePeng is the incorrect anglicization of your
name.

Hen

On Sat, Jan 14, 2017 at 12:08 AM, 梁德澎 <liangdepeng@gmail.com> wrote:

> Hi,
>
> I've been working on the MXNet-ScalaPkg for a while with Yizhi Liu
> (@javelinjs).
> Please sign me up as a committer of MXNet.
>
> GitHub ID: Ldpe2G
> Email: liangdepeng@gmail.com
> Affiliations: Sun Yat-sen University
>
> 2017-01-14 13:49 GMT+08:00 Henri Yandell <bayard@apache.org>:
>
> > Thanks for all the feedback and interested parties :)
> >
> > My aim is to propose a vote on Monday, unless someone raises an issue
> > before then.
> >
> > Hen
> >
> > On Thu, Jan 5, 2017 at 9:12 PM, Henri Yandell <bayard@apache.org> wrote:
> >
> > > Hello Incubator,
> > >
> > > I'd like to propose a new incubator Apache MXNet podling.
> > >
> > > The existing MXNet project (http://mxnet.io - 1.5 years old, 15
> > > committers, 200 contributors) is very interested in joining Apache.
> > > MXNet is an open-source deep learning framework that allows you to
> > > define, train, and deploy deep neural networks on a wide array of
> > > devices, from cloud infrastructure to mobile devices.
> > >
> > > The wiki proposal page is located here:
> > >
> > >   https://wiki.apache.org/incubator/MXNetProposal
> > >
> > > I've included the text below in case anyone wants to focus on parts
> > > of it in a reply.
> > >
> > > Looking forward to your thoughts, and for lots of interested Apache
> > > members to volunteer to mentor the project in addition to Sebastian
> > > and myself.
> > >
> > > Currently the list of committers is based on the current active
> > > coders, so we're also very interested in hearing from anyone else
> > > who is interested in working on the project, be they current or
> > > future contributors!
> > >
> > > Thanks,
> > >
> > > Hen
> > > On behalf of the MXNet project
> > >
> > > ---------
> > >
> > > = MXNet: Apache Incubator Proposal =
> > >
> > > == Abstract ==
> > >
> > > MXNet is a Flexible and Efficient Library for Deep Learning
> > >
> > > == Proposal ==
> > >
> > > MXNet is an open-source deep learning framework that allows you to
> > > define, train, and deploy deep neural networks on a wide array of
> > > devices, from cloud infrastructure to mobile devices. It is highly
> > > scalable, allowing for fast model training, and supports a flexible
> > > programming model and multiple languages. MXNet allows you to mix
> > > symbolic and imperative programming flavors to maximize both
> > > efficiency and productivity. MXNet is built on a dynamic dependency
> > > scheduler that automatically parallelizes both symbolic and
> > > imperative operations on the fly. A graph optimization layer on top
> > > of that makes symbolic execution fast and memory efficient. The
> > > MXNet library is portable and lightweight, and it scales to multiple
> > > GPUs and multiple machines.
> > >
> > > == Background ==
> > >
> > > Deep learning is a subset of machine learning and refers to a class
> > > of algorithms that use a hierarchical approach with non-linearities
> > > to discover and learn representations within data. Deep learning has
> > > recently become very popular due to its applicability to, and
> > > advancement of, domains such as Computer Vision, Speech Recognition,
> > > Natural Language Understanding and Recommender Systems. With
> > > pervasive and cost-effective cloud computing, large labeled datasets
> > > and continued algorithmic innovation, deep learning has become one
> > > of the most popular classes of algorithms for machine learning
> > > practitioners in recent years.
> > >
> > > == Rationale ==
> > >
> > > The adoption of deep learning is quickly expanding from initial deep
> > > domain experts rooted in academia to data scientists and developers
> > > working to deploy intelligent services and products. Deep learning,
> > > however, has many challenges. These include model training time
> > > (which can take days to weeks), programmability (not everyone writes
> > > Python or C++ and likes symbolic programming), and balancing
> > > production readiness (support for things like failover) with
> > > development flexibility (ability to program in different ways,
> > > support for new operators and model types) and speed of execution
> > > (fast and scalable model training). Other frameworks excel on some
> > > but not all of these aspects.
> > >
> > >
> > > == Initial Goals ==
> > >
> > > MXNet is a fairly established project on GitHub with its first code
> > > contribution in April 2015 and roughly 200 contributors. It is used
> > > by several large companies and some of the top research institutions
> > > on the planet. Initial goals would be the following:
> > >
> > >  1. Move the existing codebase(s) to Apache
> > >  1. Integrate with the Apache development process/sign CLAs
> > >  1. Ensure all dependencies are compliant with Apache License
> > >     version 2.0
> > >  1. Incremental development and releases per Apache guidelines
> > >  1. Establish engineering discipline and a predictable release
> > >     cadence of high quality releases
> > >  1. Expand the community beyond the current base of expert level
> > >     users
> > >  1. Improve usability and the overall developer/user experience
> > >  1. Add additional functionality to address newer problem types and
> > >     algorithms
> > >
> > >
> > > == Current Status ==
> > >
> > > === Meritocracy ===
> > >
> > > The MXNet project already operates on meritocratic principles. Today,
> > > MXNet has developers worldwide and has accepted multiple major
> > > patches from a diverse set of contributors within both industry and
> > > academia. We would like to follow ASF meritocratic principles to
> > > encourage more developers to contribute to this project. We know
> > > that only active and committed developers from a diverse set of
> > > backgrounds can make MXNet a successful project. We are also
> > > improving the documentation and code to help new developers get
> > > started quickly.
> > >
> > > === Community ===
> > >
> > > Acceptance into the Apache foundation would bolster the growing user
> > > and developer community around MXNet. That community includes around
> > > 200 contributors from academia and industry. The core developers of
> > > our project are listed in our contributors below and are also
> > > represented by logos on the mxnet.io site including Amazon, Baidu,
> > > Carnegie Mellon University, Turi, Intel, NYU, Nvidia, MIT,
> > > Microsoft, TuSimple, University of Alberta, University of Washington
> > > and Wolfram.
> > >
> > > === Core Developers ===
> > >
> > > (with GitHub logins)
> > >
> > >  * Tianqi Chen (@tqchen)
> > >  * Mu Li (@mli)
> > >  * Junyuan Xie (@piiswrong)
> > >  * Bing Xu (@antinucleon)
> > >  * Chiyuan Zhang (@pluskid)
> > >  * Minjie Wang (@jermainewang)
> > >  * Naiyan Wang (@winstywang)
> > >  * Yizhi Liu (@javelinjs)
> > >  * Tong He (@hetong007)
> > >  * Qiang Kou (@thirdwing)
> > >  * Xingjian Shi (@sxjscience)
> > >
> > > === Alignment ===
> > >
> > > ASF is already the home of many distributed platforms, e.g., Hadoop,
> > > Spark and Mahout, each of which targets a different application
> > > domain. MXNet, being a distributed platform for large-scale deep
> > > learning, focuses on another important domain, one that still lacks
> > > a scalable, programmable, flexible and super fast open-source
> > > platform. The recent success of deep learning models, especially for
> > > vision and speech recognition tasks, has generated interest both in
> > > applying existing deep learning models and in developing new ones.
> > > Thus, an open-source platform for deep learning backed by some of
> > > the top industry and academic players will be able to attract a
> > > large community of users and developers. MXNet is a complex system
> > > needing many iterations of design, implementation and testing.
> > > Apache's collaboration framework, which encourages active
> > > contribution from developers, will inevitably help improve the
> > > quality of the system, as shown in the success of Hadoop, Spark,
> > > etc. Equally important is the community of users, which helps
> > > identify real-life applications of deep learning and helps to
> > > evaluate the system's performance and ease-of-use. We hope to
> > > leverage ASF for coordinating and promoting both communities, and in
> > > return benefit the communities with another useful tool.
> > >
> > > == Known Risks ==
> > >
> > > === Orphaned products ===
> > >
> > > Given the current level of investment in MXNet and the stakeholders
> > > using it - the risk of the project being abandoned is minimal.
> > > Amazon, for example, is in active development to use MXNet in many
> > > of its services and many large corporations use it in their
> > > production applications.
> > >
> > > === Inexperience with Open Source ===
> > >
> > > MXNet has existed as a healthy open source project for more than a
> > > year. During that time, the project has attracted 200+ contributors.
> > >
> > > === Homogeneous Developers ===
> > >
> > > The initial list of committers and contributors includes developers
> > > from several institutions and industry participants (see above).
> > >
> > > === Reliance on Salaried Developers ===
> > >
> > > Like most open source projects, MXNet receives substantial support
> > > from salaried developers. A large fraction of MXNet development is
> > > supported by graduate students at various universities in the course
> > > of research degrees - this is more a “volunteer” relationship, since
> > > in most cases students contribute vastly more than is necessary to
> > > immediately support research. In addition, those working from within
> > > corporations are devoting significant time and effort to the project
> > > - and these come from several organizations.
> > >
> > > === An Excessive Fascination with the Apache Brand ===
> > >
> > > We choose Apache not for publicity. We have two purposes. First, we
> > > hope that Apache's known best-practices for managing a mature open
> > > source project can help guide us. For example, we are feeling the
> > > growing pains of a successful open source project as we attempt a
> > > major refactor of the internals while customers are using the system
> > > in production. We seek guidance in communicating breaking API
> > > changes and version revisions. Also, as our involvement from major
> > > corporations increases, we want to assure our users that MXNet will
> > > stay open and not favor any particular platform or environment.
> > > These are some examples of the know-how and discipline we're hoping
> > > Apache can bring to our project.
> > >
> > > Second, we want to leverage Apache's reputation to recruit more
> > > developers to create a diverse community.
> > >
> > > === Relationship with Other Apache Products ===
> > >
> > > Apache Mahout and Apache Spark's MLlib are general machine learning
> > > systems. Deep learning algorithms can thus be implemented on these
> > > two platforms as well. However, in practice, the overlap will be
> > > minimal. Deep learning is so computationally intensive that it often
> > > requires specialized GPU hardware to accomplish tasks of meaningful
> > > size. Making efficient use of GPU hardware is complex because the
> > > hardware is so fast that the supporting systems around it must be
> > > carefully optimized to keep the GPU cores busy. Extending this
> > > capability to distributed multi-GPU and multi-host environments
> > > requires great care. This is a critical differentiator between MXNet
> > > and existing Apache machine learning systems.
> > >
> > > Mahout and Spark MLlib follow models in which their nodes run
> > > synchronously. This is the fundamental difference from MXNet, which
> > > follows the parameter server framework. MXNet can run synchronously
> > > or asynchronously. In addition, MXNet has optimizations for training
> > > a wide range of deep learning models using a variety of approaches
> > > (e.g., model parallelism and data parallelism), which makes MXNet
> > > much more efficient (near-linear speedup on state of the art
> > > models). MXNet also supports both imperative and symbolic
> > > approaches, providing ease of programming for deep learning
> > > algorithms.
> > >
> > > Other Apache projects that are potentially complementary:
> > >
> > > Apache Arrow - reading data in Apache Arrow's internal format from
> > > MXNet would allow users to run ETL/preprocessing in Spark, save the
> > > results in Arrow's format and then run DL algorithms on it.
> > >
> > > Apache Singa - MXNet and Singa are both deep learning projects, and
> > > can benefit from a larger deep learning community at Apache.
> > >
> > > == Documentation ==
> > >
> > > Documentation has recently migrated to http://mxnet.io. We continue
> > > to refine and improve the documentation.
> > >
> > > == Initial Source ==
> > >
> > > We currently use GitHub to maintain our source code,
> > > https://github.com/MXNet
> > >
> > > == Source and Intellectual Property Submission Plan ==
> > >
> > > MXNet Code is available under Apache License, Version 2.0. We will
> > > work with the committers to get CLAs signed and review previous
> > > contributions.
> > >
> > > == External Dependencies ==
> > >
> > >  * required by the core code base: GCC or CLOM, Clang, any BLAS
> > >    library (ATLAS, OpenBLAS, MKL), dmlc-core, mshadow, ps-lite (which
> > >    requires lib-zeromq), TBB
> > >  * required for GPU usage: cudnn, cuda
> > >  * required for python usage: Python 2/3
> > >  * required for R module: R, Rcpp (GPLv2 licensing)
> > >  * optional for image preparation and preprocessing: opencv
> > >  * optional dependencies for additional features: torch7, numba,
> > >    cython (in NNVM branch)
> > >
> > > Rcpp and lib-zeromq are expected to require licensing discussions.
> > >
> > > == Cryptography ==
> > >
> > > Not Applicable
> > >
> > > == Required Resources ==
> > >
> > > === Mailing Lists ===
> > >
> > > There is currently no mailing list.
> > >
> > > === Issue Tracking ===
> > >
> > > The project currently uses GitHub to track issues and would like to
> > > continue to do so.
> > >
> > > == Committers and Affiliations ==
> > >
> > >  * Tianqi Chen (UW)
> > >  * Mu Li (AWS)
> > >  * Junyuan Xie (AWS)
> > >  * Bing Xu (Apple)
> > >  * Chiyuan Zhang (MIT)
> > >  * Minjie Wang (NYU)
> > >  * Naiyan Wang (TuSimple)
> > >  * Yizhi Liu (Mediav)
> > >  * Tong He (Simon Fraser University)
> > >  * Qiang Kou (Indiana U)
> > >  * Xingjian Shi (HKUST)
> > >
> > > == Sponsors ==
> > >
> > > === Champion ===
> > >
> > > Henri Yandell (bayard at apache.org)
> > >
> > > === Nominated Mentors ===
> > >
> > > Sebastian Schelter (ssc@apache.org)
> > >
> > >
> > > === Sponsoring Entity ===
> > >
> > > We are requesting the Incubator to sponsor this project.
> > >
> > >
> >
>

--f403045c12b42e8ea705462a7433--