incubator-general mailing list archives

From oo...@comp.nus.edu.sg
Subject [Fwd: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator]
Date Fri, 06 Feb 2015 03:55:24 GMT

Thanks for the comments and suggestions.
With permission from Thejas, I would like to respond to point 2.

We have a huge team down at NUS (National University of Singapore) --
we have about seven database/data mining professors (not including
those in systems, networking, and machine learning).
I myself have nine PhD students in a steady state, and I have a few large
grants, with a total budget of about 15 million S$ (~12 million USD), that
allows me to hire a number of research fellows and research assistants for
the next few years.  In a constant state, I have about 20 people (PhD
students/RA/RF) working with me alone.  Other professors have their own
grants (unlike other countries, it is relatively easy to get large grants
in Singapore; many overseas Universities, including UIUC, MIT, ETH etc
have research labs funded by Singapore Research Foundation [equivalent of
NSF]).

SINGA is a long term project for us -- while it is a platform in its own
right, we are using it for healthcare predictive analytics (by working with
a hospital associated with the University).  Therefore, we will be working
on SINGA, not solely as a distributed DL platform, but as a tool that will
enable us to do data analytics on some business domains (e.g., healthcare,
consumer, etc.).

For the initial set of committers, three are tenured professors, five are
students, with 2-5 years to go before they complete their PhD.  Quite
often, some would stay back as a research fellow for a couple of years
before they start looking for a job outside.  We will work with mentors
and new developers (from outside of NUS or Zhejiang University) in
enhancing the system.

The project should survive in that sense.

(I have an on-going project CIIDAA that has been around since 2008; it was
started as another project, epiC,  with a different grant, and then we
continue the development with a new grant for CIIDAA --
http://www.comp.nus.edu.sg/~ciidaa/
)

Thanks.

regards
beng chin
ps: i am not sure if my email will get through to the group.


---------------------------- Original Message ----------------------------
Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
From:    "Henry Saputra" <henry.saputra@gmail.com>
Date:    Thu, February 5, 2015 2:57 pm
To:      "general@incubator.apache.org" <general@incubator.apache.org>
Cc:      ooibc@comp.nus.edu.sg
--------------------------------------------------------------------------

Several comments:
-) How many users are already using this project? I would recommend
dropping the request for the singa-user list at the beginning.
-) All the initial committers come from a university, and it seems
some of them are about to leave it. I am not too sure
this project will survive if all of the initial committers are from
a university as students.
-) More mentors need to be solicited if this project ever gets to the
Apache incubator.

- Henry

On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
> The "Relationship with Other Apache Products" section has been
> updated. The reference to H2O in that section has been removed, and
> other projects have been added.
>  Thanks for the feedback!
>
>
> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <thejas.nair@gmail.com> wrote:
>> Thanks for pointing that out Henry! Yes, looks like H2O is not an
>> Apache project, I should have verified that.
>> I will edit that, and revisit that section along with the folks in
>> Singa community.
>>
>>
>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <henry.saputra@gmail.com> wrote:
>>> Quick immediate comment that "Apache H2O" is not really an Apache project.
>>>
>>> I assume you are referring to https://github.com/h2oai/h2o (or
>>> https://github.com/h2oai/h2o-dev) ?
>>>
>>> - Henry
>>>
>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
>>>> Hello everyone,
>>>>
>>>> I would like to propose the inclusion of Singa as an Apache Incubator
>>>> project.
>>>>
>>>> Here is the proposal - https://wiki.apache.org/incubator/SingaProposal
>>>>
>>>> Please review the proposal and give feedback. I am planning to start a
>>>> vote after 7 days if the proposal looks good.
>>>> We are also seeking additional Apache mentors for the project.
>>>>
>>>> Thanks,
>>>> Thejas
>>>> ==========================================================
>>>> Singa Incubator Proposal
>>>>
>>>> Abstract
>>>>
>>>> SINGA is a distributed deep learning platform.
>>>>
>>>> Proposal
>>>>
>>>> SINGA is an efficient, scalable and easy-to-use distributed platform
>>>> for training deep learning models, e.g., Deep Convolutional Neural
>>>> Network and Deep Belief Network. It parallelizes the computation
>>>> (i.e., training) onto a cluster of nodes by distributing the training
>>>> data and model automatically to speed up the training. Built-in
>>>> training algorithms like Back-Propagation and Contrastive Divergence
>>>> are implemented based on common abstractions of deep learning models.
>>>> Users can train their own deep learning models by simply customizing
>>>> these abstractions like implementing the Mapper and Reducer in Hadoop.
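
The proposal does not show what these abstractions look like. As a rough illustration only, the sketch below assumes the abstraction is a layer object with a forward and a backward step, and that a generic back-propagation loop drives whatever layers the user supplies. The names Layer, Scale, Shift and train_step are hypothetical and are not taken from the SINGA code base.

```python
# Hypothetical sketch of a layer abstraction; not SINGA's actual API.

class Layer:
    """Base class: a user overrides forward and backward."""
    def forward(self, x):
        raise NotImplementedError

    def backward(self, grad_out):
        """Store parameter gradients; return the gradient w.r.t. the input."""
        raise NotImplementedError

    def update(self, lr):
        pass  # layers without parameters need no update


class Scale(Layer):
    """Computes y = w * x with one trainable parameter w."""
    def __init__(self):
        self.w, self.grad_w, self.x = 0.0, 0.0, 0.0

    def forward(self, x):
        self.x = x  # remember the input for the backward pass
        return self.w * x

    def backward(self, grad_out):
        self.grad_w = grad_out * self.x
        return grad_out * self.w

    def update(self, lr):
        self.w -= lr * self.grad_w


class Shift(Layer):
    """Computes y = x + b with one trainable parameter b."""
    def __init__(self):
        self.b, self.grad_b = 0.0, 0.0

    def forward(self, x):
        return x + self.b

    def backward(self, grad_out):
        self.grad_b = grad_out
        return grad_out

    def update(self, lr):
        self.b -= lr * self.grad_b


def train_step(layers, x, target, lr):
    """Generic back-propagation over any list of Layer objects."""
    out = x
    for layer in layers:
        out = layer.forward(out)
    grad = 2 * (out - target)  # derivative of the squared error
    for layer in reversed(layers):
        grad = layer.backward(grad)
    for layer in layers:
        layer.update(lr)
    return (out - target) ** 2


# Fit y = 2x + 1 with a two-layer "model" built from the custom layers.
net = [Scale(), Shift()]
samples = [(x, 2.0 * x + 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]
for _ in range(500):
    for x, y in samples:
        train_step(net, x, y, lr=0.05)
```

The training loop never inspects the concrete layer types, which is the point of the Mapper/Reducer comparison: the user writes only the per-layer logic, and the framework owns the loop.
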
>>>>
>>>> Background
>>>>
>>>> Deep learning refers to a set of feature (or representation) learning
>>>> models that consist of multiple (non-linear) layers, where different
>>>> layers learn different levels of abstractions (representations) of the
>>>> raw input data. Larger (in terms of model parameters) and deeper (in
>>>> terms of number of layers) models have shown better performance, e.g.,
>>>> lower image classification error in Large Scale Visual Recognition
>>>> Challenge. However, a larger model requires more memory and larger
>>>> training data to reduce over-fitting. Complex numeric operations make
>>>> the training computation intensive. In practice, training large deep
>>>> learning models takes weeks or months on a single node (even with
>>>> GPU).
>>>>
>>>> Rationale
>>>>
>>>> Deep learning has gained a lot of traction in both academia and
>>>> industry due to its success in a wide range of areas such as computer
>>>> vision and speech recognition. However, training of such models is
>>>> computationally expensive, especially for large and deep models (e.g.,
>>>> with billions of parameters and more than 10 layers). Both Google and
>>>> Microsoft have developed distributed deep learning systems to make the
>>>> training more efficient by distributing the computations within a
>>>> cluster of nodes. However, these systems are closed-source software.
>>>> Our goal is to leverage the community of open source developers to
>>>> make SINGA efficient, scalable and easy to use. SINGA is a
>>>> full-fledged distributed platform that could benefit the community
>>>> and also benefit from the community's contributions to
>>>> further work in this area. We believe the nature of SINGA and
>>>> our visions for the system fit naturally to Apache's philosophy and
>>>> development framework.
>>>>
>>>> Initial Goals
>>>>
>>>> We have developed a system for SINGA running on a commodity computer
>>>> cluster. The initial goals include:
>>>>
>>>> * improving the system in terms of scalability and efficiency, e.g.,
>>>>   using Infiniband for network communication and multi-threading for
>>>>   one node computation. We would consider extending SINGA to GPU
>>>>   clusters later.
>>>> * benchmarking with larger datasets (hundreds of millions of training
>>>>   instances) and models (billions of parameters).
>>>> * adding more built-in deep learning models. Users can train the
>>>>   built-in models on their datasets directly.
>>>>
>>>> Current Status
>>>>
>>>> Meritocracy
>>>>
>>>> We would like to follow ASF meritocratic principles to encourage more
>>>> developers to contribute to this project. We know that only active and
>>>> excellent developers can make SINGA a successful project. The
>>>> committer list and PMC will be updated based on developers'
>>>> performance and commitment. We are also improving the documentation
>>>> and code to help new developers get started quickly.
>>>>
>>>> Community
>>>>
>>>> SINGA is currently being developed in the Database System Research Lab
>>>> at the National University of Singapore (NUS) in collaboration with
>>>> Zhejiang University in China. Our lab has extensive experience in
>>>> building database related systems, including distributed systems. Six
>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie) , a research
>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian
>>>> Lee Tan) have been working for a year on this project. We are open to
>>>> recruiting more developers from diverse backgrounds.
>>>>
>>>> Core Developers
>>>>
>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
>>>> worked on distributed systems for more than 20 years. They have
>>>> collaborated with the industry and have built various large scale
>>>> systems. Anh Dinh's research is also on distributed systems, albeit
>>>> with more focus on security aspects. Wei Wang's research is on deep
>>>> learning problems including deep learning applications and large scale
>>>> training. Sheng Wang and Jinyang are working on efficient indexing,
>>>> querying of large scale data and machine learning. Kaiping, Zhaojing
>>>> and Zhongle are new PhD students who joined SINGA recently. They will
>>>> work on this project for a longer time (next 4-5 years). While we
>>>> share common research interests, each member also brings diverse
>>>> expertise to the team.
>>>>
>>>> Alignment
>>>>
>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>>>> Spark and Mahout, each of which targets a different application
>>>> domain. SINGA, being a distributed platform for large-scale deep
>>>> learning, focuses on another important domain that still
>>>> lacks a robust and scalable open-source platform. The recent success
>>>> of deep learning models, especially for vision and speech recognition
>>>> tasks, has generated interest both in applying existing deep learning
>>>> models and in developing new ones. Thus, an open-source platform for
>>>> deep learning will be able to attract a large community of users and
>>>> developers. SINGA is a complex system needing many iterations of
>>>> design, implementation and testing. Apache's collaboration framework
>>>> which encourages active contribution from developers will inevitably
>>>> help improve the quality of the system, as shown in the success of
>>>> Hadoop, Spark, etc. Equally important is the community of users which
>>>> helps identify real-life applications of deep learning, and helps to
>>>> evaluate the system's performance and ease-of-use. We hope to leverage
>>>> ASF for coordinating and promoting both communities, and in return
>>>> benefit the communities with another useful tool.
>>>>
>>>> Known Risks
>>>>
>>>> Orphaned products
>>>>
>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
>>>> the lab in two to four years' time. It is possible that some of them
>>>> may not have enough time to focus on this project after that. But,
>>>> SINGA is part of our other bigger research projects on building an
>>>> infrastructure for data intensive applications, which include
>>>> health-care analytics and brain-inspired computing. Beng Chin and Kian
>>>> Lee would continue working on it and getting more people involved. For
>>>> example, three new developers (Kaiping, Zhaojing and Zhongle) joined
>>>> us recently. Individual developers are welcome to make SINGA a diverse
>>>> community that is robust and independent from any single developer.
>>>>
>>>> Inexperience with Open Source
>>>>
>>>> All the developers are active users and followers of open source
>>>> projects. Our research lab has a strong commitment to open source, and
>>>> has released the source code of several systems under open source
>>>> license as a way of contributing back to the open source community.
>>>> But we do not have much real experience in open source projects with
>>>> large and well organized communities like those in Apache. This is one
>>>> reason we choose Apache which is experienced in open source project
>>>> incubation. We hope to get the help from Apache (e.g., champion and
>>>> mentors) to establish a healthy path for SINGA.
>>>>
>>>> Homogeneous Developers
>>>>
>>>> Although the current developers are researchers in the universities,
>>>> they have different research interests and project experiences, as
>>>> mentioned in the section that introduces the core developers. We know
>>>> that a diverse community is helpful. Hence we are open to the idea of
>>>> recruiting developers from other regions and organizations.
>>>>
>>>> Reliance on Salaried Developers
>>>>
>>>> As a research project in the university, SINGA's current developing
>>>> community consists of professors, PhD students, research assistants
>>>> and postdoctoral fellows. They are driven by their interests to work
>>>> on this project and have contributed actively since the start of the
>>>> project. The research assistants and fellows are expected to leave
>>>> when their contracts expire. However, they are keen to continue to
>>>> work on the project voluntarily. Moreover, as a long term research
>>>> project, new research assistants and fellows are likely to join the
>>>> project.
>>>>
>>>> An Excessive Fascination with the Apache Brand
>>>>
>>>> We choose Apache not for publicity. We have two purposes. First, we
>>>> want to leverage Apache's reputation to recruit more developers to
>>>> make a diverse community. Second, we hope that Apache can help us to
>>>> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
>>>> are established database and distributed system researchers, and
>>>> together with the other contributors, they sincerely believe that
>>>> there is a need for a widely accepted open source distributed deep
>>>> learning platform. The field of deep learning is still in its infancy,
>>>> and an open source platform will fuel the research in the area.
>>>> Moreover, such a platform will enable researchers to develop new
>>>> models and algorithms, rather than spending time implementing a deep
>>>> learning system from scratch. Furthermore, the need for scalability
>>>> for such a platform is obvious.
>>>>
>>>> Relationship with Other Apache Products
>>>>
>>>> Apache H2O implemented two simple deep learning models, namely the
>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
>>>> significant differences between H2O and SINGA. First, H2O adopts the
>>>> Map-Reduce framework which runs a set of computing nodes in parallel
>>>> against the training set. Model parameters trained by all
>>>> computing nodes are averaged as the final model parameters. This
>>>> training algorithm is different from the distributed training
>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
>>>> synchronizes the parameters trained from different nodes. SINGA adopts
>>>> the parameter server framework to support a wide range of distributed
>>>> training algorithms and parallelization methods (e.g., data
>>>> parallelism, model parallelism and hybrid parallelism; H2O only
>>>> supports data parallelism). Second, in H2O, users are restricted to
>>>> using the two built-in models. In SINGA, we provide a simple programming
>>>> model to let users implement their own deep learning models. A new
>>>> deep learning model can be implemented by customizing the base Layer
>>>> class for each layer involved in the model. It is similar to writing
>>>> Hadoop programs where users only need to override the base Mapper and
>>>> Reducer. We also provide built-in models for users to use directly.
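
The difference between averaging parameters once at the end and synchronizing them frequently can be seen on a toy problem. The sketch below is only an illustration under simplifying assumptions: a one-parameter model y = w * x, two data shards, and "workers" that run one after another in a single process. ParameterServer, train_by_averaging and train_with_server are hypothetical names, not code from H2O, DistBelief, Adam or SINGA.

```python
# Toy comparison of the two training schemes; all names are hypothetical.

def gradient(w, shard):
    """Mean gradient of the squared error for y = w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def train_by_averaging(shards, steps=200, lr=0.05):
    """Each worker trains alone; parameters are averaged once at the end."""
    results = []
    for shard in shards:
        w = 0.0
        for _ in range(steps):
            w -= lr * gradient(w, shard)
        results.append(w)
    return sum(results) / len(results)


class ParameterServer:
    """Holds the shared parameter and applies gradients pushed by workers."""
    def __init__(self, lr):
        self.w = 0.0
        self.lr = lr

    def pull(self):
        return self.w

    def push(self, grad):
        self.w -= self.lr * grad


def train_with_server(shards, steps=200, lr=0.05):
    """Workers pull the latest parameter before every gradient computation."""
    server = ParameterServer(lr)
    for _ in range(steps):
        for shard in shards:  # stands in for workers running in parallel
            w = server.pull()
            server.push(gradient(w, shard))
    return server.pull()


# Data generated from y = 3x, split into two shards (data parallelism).
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
shards = [data[:2], data[2:]]
w_avg = train_by_averaging(shards)
w_ps = train_with_server(shards)
```

On this convex toy problem both schemes recover w close to 3; the sketch shows only where the synchronization happens, not which scheme performs better on real deep models.
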
>>>>
>>>> Documentation
>>>>
>>>> The project is hosted at
>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>>>> Documentation can be found at the Github Wiki Page:
>>>> https://github.com/nusinga/singa/wiki. We continue to refine and
>>>> improve the documentation.
>>>>
>>>> Initial Source
>>>>
>>>> We use Github to maintain our source code:
>>>> https://github.com/nusinga/singa
>>>>
>>>> Source and Intellectual Property Submission Plan
>>>>
>>>> We plan to release our code base under the Apache License, Version 2.0.
>>>>
>>>> External Dependencies
>>>>
>>>> required by the core code base: glog, gflags, google protobuf,
>>>> open-blas, mpich, armci-mpi.
>>>> required by data preparation and preprocessing: opencv, hdfs, python.
>>>>
>>>> Cryptography
>>>>
>>>> Not Applicable
>>>>
>>>> Required Resources
>>>>
>>>> Mailing Lists
>>>>
>>>> Currently, we use google group for internal discussion. The mailing
>>>> address is nusinga@googlegroup.com. We will migrate the content to the
>>>> apache mailing lists in the future.
>>>>
>>>> singa-dev
>>>> singa-user
>>>> singa-commits
>>>> singa-private (for private discussion within PMC)
>>>>
>>>> Git Repository
>>>>
>>>> We want to continue using git for version control. Hence, a git repo
>>>> is required.
>>>>
>>>> Issue Tracking
>>>>
>>>> JIRA Singa (SINGA)
>>>>
>>>> Initial Committers
>>>>
>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
>>>> Gang Chen (cg @zju.edu.cn)
>>>> Wei Wang (wangwei @comp.nus.edu.sg)
>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
>>>>
>>>> Affiliations
>>>>
>>>> Beng Chin Ooi, National University of Singapore
>>>> Kian Lee Tan, National University of Singapore
>>>> Gang Chen, Zhejiang University
>>>> Wei Wang, National University of Singapore
>>>> Dinh Tien Tuan Anh, National University of Singapore
>>>> Jinyang Gao, National University of Singapore
>>>> Sheng Wang, National University of Singapore
>>>> Kaiping Zheng, National University of Singapore
>>>> Zhaojing Luo, National University of Singapore
>>>> Zhongle Xie, National University of Singapore
>>>>
>>>> Sponsors
>>>>
>>>> Champion
>>>>
>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>
>>>> Nominated Mentors
>>>>
>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>> Alan Gates (gates at apache dot org) - Hortonworks
>>>> (Seeking more volunteers!)
>>>>
>>>> Sponsoring Entity
>>>>
>>>> We are requesting the Incubator to sponsor this project.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>>> For additional commands, e-mail: general-help@incubator.apache.org
>>>>

