incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "SingaProposal" by ThejasNair
Date Thu, 18 Dec 2014 05:34:57 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "SingaProposal" page has been changed by ThejasNair:
https://wiki.apache.org/incubator/SingaProposal

New page:
= Singa Incubator Proposal =
== Abstract ==
SINGA is a distributed deep learning platform.

== Proposal ==
SINGA is an efficient, scalable and easy-to-use distributed platform
for training deep learning models, e.g., Deep Convolutional Neural Network and
Deep Belief Network. It parallelizes the computation (i.e., training) onto a
cluster of nodes by distributing the training data and model automatically to
speed up the training. Built-in training algorithms like Back-Propagation and
Contrastive Divergence are implemented based on common abstractions of deep
learning models. Users can train their own deep learning models by simply
customizing these abstractions, much as Hadoop users implement the Mapper and Reducer.
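
As a rough illustration of the data distribution described above, the sketch
below splits a toy training set into shards, lets each simulated worker compute
a gradient on its own shard, and sums the results before updating a single
parameter. It is not SINGA code: the names shard, local_gradient and train, the
one-parameter linear model and the in-process "workers" are all assumptions made
for this example.

```python
# Hypothetical sketch of data parallelism; not taken from the SINGA code base.

def shard(data, num_workers):
    """Split the training data into num_workers roughly equal shards."""
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w, shard_data):
    """Sum of the gradients of 0.5 * (w * x - y) ** 2 over one shard."""
    return sum((w * x - y) * x for x, y in shard_data)


def train(data, num_workers=4, lr=0.01, steps=200):
    """Gradient descent where each step aggregates per-shard gradients."""
    w = 0.0
    shards = shard(data, num_workers)
    for _ in range(steps):
        # On a real cluster, each call would run on a different node.
        grads = [local_gradient(w, s) for s in shards]
        # Average over all training instances, then update the parameter.
        w -= lr * sum(grads) / len(data)
    return w


# Toy data generated from y = 3 * x, so training should recover w close to 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w = train(data)
```

Because the per-shard gradients simply add up to the full-batch gradient, the
result matches single-node training; what a real system adds is the
communication needed to collect those gradients across nodes.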

== Background ==
Deep learning refers to a set of feature (or representation) learning models
that consist of multiple (non-linear) layers, where different layers learn
different levels of abstractions (representations) of the raw input data.
Larger (in terms of model parameters) and deeper (in terms of number of layers)
models have shown better performance, e.g., lower image classification error in
the Large Scale Visual Recognition Challenge. However, a larger model requires more
memory and more training data to reduce over-fitting. Complex numeric operations
make the training computationally intensive. In practice, training large deep learning
models takes weeks or months on a single node (even with a GPU).

== Rationale ==
Deep learning has attracted a lot of attention in both academia and industry due to
its success in a wide range of areas such as computer vision and speech recognition.
However, training of such models is computationally expensive, especially for large
and deep models (e.g., with billions of parameters and more than 10 layers). Both
Google and Microsoft have developed distributed deep learning systems to make the
training more efficient by distributing the computations within a cluster of nodes.
However, these systems are closed-source software. Our goal is to leverage the
community of open source developers to make SINGA efficient, scalable and easy to
use. SINGA is a full-fledged distributed platform that could benefit the
community and also benefit from the community's contributions
to further work in this area. We believe the nature of SINGA and our vision
for the system fit naturally with Apache's philosophy and development framework.

== Initial Goals ==
We have developed a version of SINGA that runs on a commodity computer
cluster. The initial goals include:
* improving the system in terms of scalability and efficiency, e.g., using
Infiniband for network communication and multi-threading for one node computation.
We would consider extending SINGA to GPU clusters later.
* benchmarking with larger datasets (hundreds of millions of training instances)
and models (billions of parameters).
* adding more built-in deep learning models. Users can train the built-in models
on their datasets directly.


== Current Status ==
=== Meritocracy ===
We would like to follow ASF meritocratic principles to encourage more developers
to contribute to this project. We know that only active and excellent developers
can make SINGA a successful project. The committer list and PMC will be updated
based on developers' performance and commitment. We are also improving the
documentation and code to help new developers get started quickly.

=== Community ===
SINGA is currently being developed in the Database System Research Lab at the
National University of Singapore (NUS) in collaboration with Zhejiang University in China.
Our lab has extensive experience in building database related systems, including
distributed systems. Six PhD students and research assistants (Jinyang Gao,
Kaiping Zheng, Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research
fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian Lee Tan)
have been working for a year on this project. We are open to recruiting more
developers from diverse backgrounds.

=== Core Developers ===
Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have worked on
distributed systems for more than 20 years. They have collaborated with the
industry and have built various large scale systems. Anh Dinh's research is also
on distributed systems, albeit with more focus on security aspects. Wei Wang's
research is on deep learning problems including deep learning applications and
large scale training. Sheng Wang and Jinyang are working on efficient indexing
and querying of large scale data, and on machine learning. Kaiping, Zhaojing and Zhongle
are new PhD students who joined SINGA recently. They will work on this project
for a longer time (the next 4-5 years). While we share common research interests,
each member also brings diverse expertise to the team.

=== Alignment ===
ASF is already the home of many distributed platforms, e.g., Hadoop, Spark and
Mahout, each of which targets a different application domain. SINGA, being a
distributed platform for large-scale deep learning, focuses on another important
domain that still lacks a robust and scalable open-source platform.
The recent success of deep learning models, especially for vision and speech
recognition tasks, has generated interest both in applying existing deep learning
models and in developing new ones. Thus, an open-source platform for deep
learning will be able to attract a large community of users and developers.
SINGA is a complex system needing many iterations of design, implementation and
testing. Apache's collaboration framework, which encourages active contribution
from developers, will help improve the quality of the system, as shown
by the success of Hadoop, Spark, etc. Equally important is the community of
users which helps identify real-life applications of deep learning, and helps
to evaluate the system's performance and ease-of-use. We hope to leverage ASF for
coordinating and promoting both communities, and in return benefit the communities
with another useful tool.

== Known Risks ==
=== Orphaned products ===
Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave the
lab in two to four years' time. It is possible that some of them may not have enough
time to focus on this project after that. However, SINGA is part of our larger
research projects on building an infrastructure for data intensive applications,
which include health-care analytics and brain-inspired computing. Beng Chin and
Kian Lee would continue working on it and getting more people involved. For example,
three new developers (Kaiping, Zhaojing and Zhongle) joined us recently.
Individual developers are welcome to make SINGA a diverse community
that is robust and independent from any single developer.

=== Inexperience with Open Source ===
All the developers are active users and followers of open source projects. Our
research lab has a strong commitment to open source, and has released the source
code of several systems under open source license as a way of contributing back
to the open source community. However, we do not have much real experience with open source
projects that have large and well organized communities like those in Apache. This is
one reason we chose Apache, which is experienced in open source project incubation.
We hope to get help from Apache (e.g., a champion and mentors) to establish a
healthy path for SINGA.

=== Homogenous Developers ===
Although the current developers are researchers in the universities, they have
different research interests and project experiences, as mentioned in the section
that introduces the core developers. We know that a diverse community is helpful.
Hence we are open to the idea of recruiting developers from other regions and organizations.

=== Reliance on Salaried Developers ===
As a university research project, SINGA's current developer community
consists of professors, PhD students, research assistants and postdoctoral fellows.
They are driven by their interests to work on this project and have contributed
actively since the start of the project. The research assistants and fellows are
expected to leave when their contracts expire. However, they are keen to continue
to work on the project voluntarily. Moreover, as a long term research project, new
research assistants and fellows are likely to join the project.

=== An Excessive Fascination with the Apache Brand ===
We are not choosing Apache for publicity. We have two purposes. First, we want to
leverage Apache's reputation to recruit more developers to make a diverse
community. Second, we hope that Apache can help us to establish a healthy path
in developing SINGA. Beng Chin and Kian-Lee are established database and
distributed system researchers, and together with the other contributors, they
sincerely believe that there is a need for a widely accepted open source
distributed deep learning platform. The field of deep learning is still in its
infancy, and an open source platform will fuel the research in the area. Moreover,
such a platform will enable researchers to develop new models and algorithms,
rather than spending time implementing a deep learning system from scratch.
Furthermore, the need for scalability for such a platform is obvious.

=== Relationship with Other Apache Products ===
H2O, which is released under the Apache License, implements two simple deep
learning models, namely the Multi-Layer Perceptron and Deep Auto-encoders. There
are two significant differences between H2O and SINGA. First, H2O adopts the
Map-Reduce framework, which runs a set of computing nodes in parallel against
partitions of the training set. Model parameters trained by all computing nodes
are averaged to obtain the final model parameters. This training algorithm is
different from the distributed training algorithm used by DistBelief, Adam and
SINGA, which frequently synchronizes the parameters trained on different nodes.
SINGA adopts the parameter server framework to support a wide range of
distributed training algorithms and parallelization methods (e.g., data
parallelism, model parallelism and hybrid parallelism), whereas H2O only supports
data parallelism. Second, in H2O, users are restricted to the two built-in models.
In SINGA, we provide a simple programming model that lets users implement their own
deep learning models. A new deep learning model can be implemented by customizing
the base Layer class for each layer involved in the model. It is similar to
writing Hadoop programs, where users only need to override the base Mapper and
Reducer. We also provide built-in models for users to use directly.
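
To make the Layer customization idea concrete, here is a minimal sketch of what
overriding a base layer class could look like. It is only an illustration under
stated assumptions: the method names forward and backward, the Scale and ReLU
layers, and the helper functions are hypothetical, and SINGA's actual Layer
interface may differ.

```python
# Hypothetical sketch; SINGA's real Layer interface may differ.

class Layer:
    """Base class: a model is built by overriding these two methods."""

    def forward(self, x):
        raise NotImplementedError

    def backward(self, grad_out):
        raise NotImplementedError


class Scale(Layer):
    """Multiplies every input by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def forward(self, x):
        return [self.factor * v for v in x]

    def backward(self, grad_out):
        return [self.factor * g for g in grad_out]


class ReLU(Layer):
    """Passes positive inputs through and zeroes out the rest."""

    def forward(self, x):
        # Remember which inputs were positive for the backward pass.
        self.mask = [v > 0 for v in x]
        return [v if m else 0.0 for v, m in zip(x, self.mask)]

    def backward(self, grad_out):
        return [g if m else 0.0 for g, m in zip(grad_out, self.mask)]


def run_forward(layers, x):
    """Feed the input through the layers in order."""
    for layer in layers:
        x = layer.forward(x)
    return x


def run_backward(layers, grad):
    """Back-propagate a gradient through the layers in reverse order."""
    for layer in reversed(layers):
        grad = layer.backward(grad)
    return grad


model = [Scale(2.0), ReLU()]
out = run_forward(model, [-1.0, 0.5, 3.0])
grad_in = run_backward(model, [1.0, 1.0, 1.0])
```

The generic run_forward and run_backward helpers never need to know which layers
they are driving, which is the same division of labour as between the Hadoop
runtime and a user-supplied Mapper or Reducer.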

== Documentation ==
The project is hosted at http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
Documentation can be found on the GitHub wiki page: https://github.com/nudles/singa/wiki.
We continue to refine and improve the documentation.

== Initial Source ==
We use GitHub to maintain our source code: https://github.com/nudles/singa

== Source and Intellectual Property Submission Plan ==
We plan to release our code base under the Apache License, Version 2.0.

== External Dependencies ==
 * required by the core code base: glog, gflags, google protobuf, open-blas, mpich, armci-mpi.
 * required by data preparation and preprocessing: opencv, hdfs, python.

== Cryptography ==
Not Applicable

== Required Resources ==
=== Mailing Lists ===
Currently, we use a Google group for internal discussion. The mailing address is
nusinga@googlegroup.com. We will migrate the content to the apache mailing
lists in the future.

 * singa-dev
 * singa-user
 * singa-commits
 * singa-private (for private discussion within the PMC)

=== Git Repository ===
We want to continue using git for version control. Hence, a git repo is required.

=== Issue Tracking ===
JIRA Singa (SINGA)

== Initial Committers ==
 * Beng Chin Ooi (ooibc@comp.nus.edu.sg)
 * Kian Lee Tan (tankl@comp.nus.edu.sg)
 * Gang Chen (cg@zju.edu.cn)
 * Wei Wang (wangwei@comp.nus.edu.sg)
 * Dinh Tien Tuan Anh (dinhtta@comp.nus.edu.sg)
 * Jinyang Gao (jinyang.gao@comp.nus.edu.sg)
 * Sheng Wang (wangsh@comp.nus.edu.sg)
 * Kaiping Zheng (kaiping@comp.nus.edu.sg)
 * Zhaojing Luo (zhaojing@comp.nus.edu.sg)
 * Zhongle Xie (zhongle@comp.nus.edu.sg)

== Affiliations ==
 * Beng Chin Ooi, National University of Singapore
 * Kian Lee Tan, National University of Singapore
 * Gang Chen, Zhejiang University
 * Wei Wang, National University of Singapore
 * Dinh Tien Tuan Anh, National University of Singapore
 * Jinyang Gao, National University of Singapore
 * Sheng Wang, National University of Singapore
 * Kaiping Zheng, National University of Singapore
 * Zhaojing Luo, National University of Singapore
 * Zhongle Xie, National University of Singapore

== Sponsors ==
===  Champion ===
Thejas Nair (thejas at apache.org) - Hortonworks

=== Nominated Mentors ===
 * Thejas Nair (thejas at apache.org) - Hortonworks
 * We need more volunteers!

=== Sponsoring Entity ===
We are requesting the Incubator to sponsor this project. 

---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@incubator.apache.org
For additional commands, e-mail: cvs-help@incubator.apache.org

