incubator-general mailing list archives

From Thejas Nair <thejas.n...@gmail.com>
Subject [DISCUSS] [PROPOSAL] Singa for Apache Incubator
Date Wed, 28 Jan 2015 01:29:51 GMT
Hello everyone,

I would like to propose the inclusion of Singa as an Apache Incubator project.

Here is the proposal - https://wiki.apache.org/incubator/SingaProposal

Please review the proposal and give feedback. I am planning to start a
vote after 7 days if the proposal looks good.
We are also seeking additional Apache mentors for the project.

Thanks,
Thejas
==========================================================
Singa Incubator Proposal

Abstract

SINGA is a distributed deep learning platform.

Proposal

SINGA is an efficient, scalable and easy-to-use distributed platform
for training deep learning models, e.g., Deep Convolutional Neural
Networks and Deep Belief Networks. It parallelizes the computation
(i.e., training) onto a cluster of nodes by distributing the training
data and model automatically to speed up the training. Built-in
training algorithms like Back-Propagation and Contrastive Divergence
are implemented based on common abstractions of deep learning models.
Users can train their own deep learning models by simply customizing
these abstractions, much as Hadoop users implement the Mapper and
Reducer.
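
As a rough illustration of the idea in the paragraph above, the sketch
below shows a generic back-propagation loop driving layers that only
customize two methods. It is not SINGA's actual API: the names Layer,
Linear, Tanh, train_step and train are hypothetical, and everything
runs on a single node in plain Python.

```python
import math
import random


class Layer:
    """Hypothetical base abstraction: subclasses override two methods."""

    def forward(self, x):
        raise NotImplementedError

    def backward(self, grad_out, lr):
        raise NotImplementedError


class Linear(Layer):
    """Fully connected layer y = W x + b, stored as plain Python lists."""

    def __init__(self, n_in, n_out, rng):
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.b = [0.0] * n_out

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return [sum(wi * xi for wi, xi in zip(row, x)) + b
                for row, b in zip(self.w, self.b)]

    def backward(self, grad_out, lr):
        # Gradient w.r.t. the input, computed before the weights change.
        grad_in = [sum(self.w[j][i] * grad_out[j]
                       for j in range(len(grad_out)))
                   for i in range(len(self.x))]
        # Plain SGD update of weights and biases.
        for j, g in enumerate(grad_out):
            for i, xi in enumerate(self.x):
                self.w[j][i] -= lr * g * xi
            self.b[j] -= lr * g
        return grad_in


class Tanh(Layer):
    """A user-defined layer: only forward and backward are customized."""

    def forward(self, x):
        self.y = [math.tanh(v) for v in x]
        return self.y

    def backward(self, grad_out, lr):
        return [g * (1.0 - y * y) for g, y in zip(grad_out, self.y)]


def train_step(layers, x, target, lr=0.1):
    """Generic back-propagation: one forward sweep, one backward sweep."""
    out = x
    for layer in layers:
        out = layer.forward(out)
    loss = 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))
    grad = [o - t for o, t in zip(out, target)]
    for layer in reversed(layers):
        grad = layer.backward(grad, lr)
    return loss


def train(layers, data, epochs=200, lr=0.1):
    """Return the summed loss of each epoch so progress can be inspected."""
    return [sum(train_step(layers, x, t, lr) for x, t in data)
            for _ in range(epochs)]
```

A model is then just a list of layers, e.g.
`[Linear(2, 4, rng), Tanh(), Linear(4, 1, rng)]` with
`rng = random.Random(0)`. The training loop never needs to know which
layer types it is driving, which is the Mapper/Reducer-style
separation the proposal describes.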

Background

Deep learning refers to a set of feature (or representation) learning
models that consist of multiple (non-linear) layers, where different
layers learn different levels of abstractions (representations) of the
raw input data. Larger (in terms of model parameters) and deeper (in
terms of number of layers) models have shown better performance, e.g.,
lower image classification error in the Large Scale Visual Recognition
Challenge. However, a larger model requires more memory and larger
training data to reduce over-fitting. Complex numeric operations make
the training computation intensive. In practice, training large deep
learning models takes weeks or months on a single node (even with
GPU).

Rationale

Deep learning has attracted a lot of attention in both academia and
industry due to its success in a wide range of areas such as computer
vision and speech recognition. However, training of such models is
computationally expensive, especially for large and deep models (e.g.,
with billions of parameters and more than 10 layers). Both Google and
Microsoft have developed distributed deep learning systems to make the
training more efficient by distributing the computations within a
cluster of nodes. However, these systems are closed-source software.
Our goal is to leverage the community of open source developers to
make SINGA efficient, scalable and easy to use. SINGA is a
full-fledged distributed platform that could benefit the community
and, in turn, benefit from the community's contributions to further
work in this area. We believe the nature of SINGA and our vision for
the system fit naturally with Apache's philosophy and development
framework.

Initial Goals

We have developed a system for SINGA running on a commodity computer
cluster. The initial goals include:

 * improving the system in terms of scalability and efficiency, e.g.,
   using Infiniband for network communication and multi-threading for
   one node computation. We would consider extending SINGA to GPU
   clusters later.
 * benchmarking with larger datasets (hundreds of millions of training
   instances) and models (billions of parameters).
 * adding more built-in deep learning models. Users can train the
   built-in models on their datasets directly.

Current Status

Meritocracy

We would like to follow ASF meritocratic principles to encourage more
developers to contribute to this project. We know that only active and
excellent developers can make SINGA a successful project. The
committer list and PMC will be updated based on developers'
performance and commitment. We are also improving the documentation
and code to help new developers get started quickly.

Community

SINGA is currently being developed in the Database System Research Lab
at the National University of Singapore (NUS) in collaboration with
Zhejiang University in China. Our lab has extensive experience in
building database related systems, including distributed systems. Six
PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research
fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian
Lee Tan) have been working for a year on this project. We are open to
recruiting more developers from diverse backgrounds.

Core Developers

Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
worked on distributed systems for more than 20 years. They have
collaborated with the industry and have built various large scale
systems. Anh Dinh's research is also on distributed systems, albeit
with more focus on security aspects. Wei Wang's research is on deep
learning problems including deep learning applications and large scale
training. Sheng Wang and Jinyang are working on efficient indexing,
querying of large scale data and machine learning. Kaiping, Zhaojing
and Zhongle are new PhD students who joined SINGA recently. They will
work on this project for a longer time (next 4-5 years). While we
share common research interests, each member also brings diverse
expertise to the team.

Alignment

ASF is already the home of many distributed platforms, e.g., Hadoop,
Spark and Mahout, each of which targets a different application
domain. SINGA, being a distributed platform for large-scale deep
learning, focuses on another important domain, one that still lacks
a robust and scalable open-source platform. The recent success
of deep learning models, especially for vision and speech recognition
tasks, has generated interest in both applying existing deep learning
models and developing new ones. Thus, an open-source platform for
deep learning will be able to attract a large community of users and
developers. SINGA is a complex system needing many iterations of
design, implementation and testing. Apache's collaboration framework
which encourages active contribution from developers will inevitably
help improve the quality of the system, as shown in the success of
Hadoop, Spark, etc. Equally important is the community of users, which
helps identify real-life applications of deep learning, and helps to
evaluate the system's performance and ease-of-use. We hope to leverage
ASF for coordinating and promoting both communities, and in return
benefit the communities with another useful tool.

Known Risks

Orphaned products

Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
the lab in two to four years' time. It is possible that some of them
may not have enough time to focus on this project after that. But,
SINGA is part of our other bigger research projects on building an
infrastructure for data intensive applications, which include
health-care analytics and brain-inspired computing. Beng Chin and Kian
Lee would continue working on it and getting more people involved. For
example, three new developers (Kaiping, Zhaojing and Zhongle) joined
us recently. Individual developers are welcome to make SINGA a diverse
community that is robust and independent from any single developer.

Inexperience with Open Source

All the developers are active users and followers of open source
projects. Our research lab has a strong commitment to open source, and
has released the source code of several systems under open source
license as a way of contributing back to the open source community.
But we do not have much real experience in open source projects with
large and well organized communities like those in Apache. This is one
reason we chose Apache, which is experienced in open source project
incubation. We hope to get help from Apache (e.g., champion and
mentors) to establish a healthy path for SINGA.

Homogenous Developers

Although the current developers are all university researchers,
they have different research interests and project experiences, as
mentioned in the section that introduces the core developers. We know
that a diverse community is helpful. Hence we are open to the idea of
recruiting developers from other regions and organizations.

Reliance on Salaried Developers

As a research project in the university, SINGA's current developer
community consists of professors, PhD students, research assistants
and postdoctoral fellows. They are driven by their interests to work
on this project and have contributed actively since the start of the
project. The research assistants and fellows are expected to leave
when their contracts expire. However, they are keen to continue to
work on the project voluntarily. Moreover, as a long term research
project, new research assistants and fellows are likely to join the
project.

An Excessive Fascination with the Apache Brand

We did not choose Apache for publicity. We have two purposes. First, we
want to leverage Apache's reputation to recruit more developers to
make a diverse community. Second, we hope that Apache can help us to
establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
are established database and distributed system researchers, and
together with the other contributors, they sincerely believe that
there is a need for a widely accepted open source distributed deep
learning platform. The field of deep learning is still in its infancy,
and an open source platform will fuel the research in the area.
Moreover, such a platform will enable researchers to develop new
models and algorithms, rather than spending time implementing a deep
learning system from scratch. Furthermore, the need for scalability
for such a platform is obvious.

Relationship with Other Apache Products

Apache H2O implemented two simple deep learning models, namely the
Multi-Layer Perceptron and Deep Auto-encoders. There are two
significant differences between H2O and SINGA. First, H2O adopts the
Map-Reduce framework, which runs a set of computing nodes in parallel
against subsets of the training set. Model parameters trained by all
computing nodes are averaged as the final model parameters. This
training algorithm is different from the distributed training
algorithm used by DistBelief, Adam and SINGA, which frequently
synchronizes the parameters trained on different nodes. SINGA adopts
the parameter server framework to support a wide range of distributed
training algorithms and parallelization methods (e.g., data
parallelism, model parallelism and hybrid parallelism), whereas H2O
only supports data parallelism. Second, in H2O, users are restricted
to the two built-in models. In SINGA, we provide a simple programming
model to let users implement their own deep learning models. A new
deep learning model can be implemented by customizing the base Layer
class for each layer involved in the model. It is similar to writing
Hadoop programs, where users only need to override the base Mapper and
Reducer. We also provide built-in models for users to use directly.
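
To make the first difference above concrete, here is a toy,
single-process sketch contrasting one-shot parameter averaging with
frequent synchronization through a parameter server. It is only an
illustration under simplifying assumptions: a one-parameter linear
model, noise-free data, synchronous updates and data parallelism only.
The names ParameterServer, gradient, train_model_averaging,
train_parameter_server and make_shards are hypothetical and are not
taken from H2O, DistBelief, Adam or SINGA.

```python
class ParameterServer:
    """Holds the shared parameter; workers pull it and push gradients."""

    def __init__(self, w=0.0, lr=0.1):
        self.w = w
        self.lr = lr

    def pull(self):
        return self.w

    def push(self, grads):
        # Synchronous update: apply the mean of the workers' gradients.
        self.w -= self.lr * sum(grads) / len(grads)


def gradient(w, shard):
    """Gradient of the mean of 0.5 * (w*x - y)**2 over one data shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def train_model_averaging(shards, steps=200, lr=0.1):
    """Each worker trains alone on its shard; parameters are averaged once."""
    finals = []
    for shard in shards:
        w = 0.0
        for _ in range(steps):
            w -= lr * gradient(w, shard)
        finals.append(w)
    return sum(finals) / len(finals)


def train_parameter_server(shards, steps=200, lr=0.1):
    """Workers synchronize through the server at every step."""
    server = ParameterServer(lr=lr)
    for _ in range(steps):
        w = server.pull()  # every worker starts from the same parameter
        server.push([gradient(w, shard) for shard in shards])
    return server.pull()


def make_shards(n_workers=4, per_worker=5, true_w=3.0):
    """Noise-free toy data y = true_w * x, partitioned across workers."""
    xs = [0.1 * (i + 1) for i in range(n_workers * per_worker)]
    data = [(x, true_w * x) for x in xs]
    return [data[i::n_workers] for i in range(n_workers)]
```

On this convex toy problem both schemes recover the same parameter, so
the sketch shows only the structural difference: in the first scheme
workers never see each other's progress until the end, while in the
second every step starts from a shared, up-to-date parameter.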

Documentation

The project is hosted at
http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
Documentation can be found at the Github Wiki Page:
https://github.com/nusinga/singa/wiki. We continue to refine and
improve the documentation.

Initial Source

We use Github to maintain our source code, https://github.com/nusinga/singa

Source and Intellectual Property Submission Plan

We plan to release our code base under the Apache License, Version 2.0.

External Dependencies

required by the core code base: glog, gflags, google protobuf,
open-blas, mpich, armci-mpi.
required by data preparation and preprocessing: opencv, hdfs, python.

Cryptography

Not Applicable

Required Resources

Mailing Lists

Currently, we use a Google group for internal discussion. The mailing
address is nusinga@googlegroup.com. We will migrate the content to the
Apache mailing lists in the future.

singa-dev
singa-user
singa-commits
singa-private (for private discussion within PMC)

Git Repository

We want to continue using git for version control. Hence, a git repo
is required.

Issue Tracking

JIRA Singa (SINGA)

Initial Committers

Beng Chin Ooi (ooibc @comp.nus.edu.sg)
Kian Lee Tan (tankl @comp.nus.edu.sg)
Gang Chen (cg @zju.edu.cn)
Wei Wang (wangwei @comp.nus.edu.sg)
Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
Sheng Wang (wangsh @comp.nus.edu.sg)
Kaiping Zheng (kaiping @comp.nus.edu.sg)
Zhaojing Luo (zhaojing @comp.nus.edu.sg)
Zhongle Xie (zhongle @comp.nus.edu.sg)

Affiliations

Beng Chin Ooi, National University of Singapore
Kian Lee Tan, National University of Singapore
Gang Chen, Zhejiang University
Wei Wang, National University of Singapore
Dinh Tien Tuan Anh, National University of Singapore
Jinyang Gao, National University of Singapore
Sheng Wang, National University of Singapore
Kaiping Zheng, National University of Singapore
Zhaojing Luo, National University of Singapore
Zhongle Xie, National University of Singapore

Sponsors

Champion

Thejas Nair (thejas at apache.org) - Hortonworks

Nominated Mentors

Thejas Nair (thejas at apache.org) - Hortonworks
Alan Gates (gates at apache dot org) - Hortonworks
(Seeking more volunteers!)

Sponsoring Entity

We are requesting the Incubator to sponsor this project.


