incubator-general mailing list archives

From oo...@comp.nus.edu.sg
Subject Re: [Fwd: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator]
Date Sat, 07 Feb 2015 01:21:19 GMT

Regarding the number of users of this project -- at this moment, the
community is not big.  A few local start-ups have been trying to use it
(mainly due to an announcement on our seminar list), e.g. one is using it
for image recognition (given a photo snapped by a user, it wants to return
the same product, and a list of similar products, such as a luxury bag
on a passerby).  Researchers from outside of NUS may have been using it
since we published an application paper on cross-domain/modal retrieval in
VLDB 2014.

We have not announced the project to the outside community yet -- we will
announce it on DBWorld etc. in due course.

Thanks and have a good weekend.

regards
beng chin

>
> Thanks for the comments and suggestions.
> With permission from Thejas, I would like to respond to point 2.
>
> We have a huge team down at NUS (National University of Singapore) --
> we have about seven database/data mining professors (not including
> those in systems, networking, and machine learning).
> I myself have nine PhD students in a steady state, and I have a few large
> grants, with a total budget of about 15 million S$ (~12 million USD), that
> allows me to hire a number of research fellows and research assistants for
> the next few years.  In a constant state, I have about 20 people (PhD
> students/RA/RF) working with me alone.  Other professors have their own
> grants (unlike other countries, it is relatively easy to get large grants
> in Singapore; many overseas Universities, including UIUC, MIT, ETH etc
> have research labs funded by Singapore Research Foundation [equivalent of
> NSF]).
>
> SINGA is a long term project for us -- while it is a platform as it is, we
> are using it for healthcare predictive analytics (by working with a
> hospital associated with the University).  Therefore, we will be working
> on SINGA, not solely as a distributed DL platform, but as a tool that will
> enable us to do data analytics on some business domains (e.g. healthcare,
> consumer etc.).
>
> For the initial set of committers, three are tenured professors, five are
> students, with 2-5 years to go before they complete their PhD.  Quite
> often, some would stay back as a research fellow for a couple of years
> before they start looking for a job outside.  We will work with mentors
> and new developers (from outside of NUS or Zhejiang University) in
> enhancing the system.
>
> The project should survive in that sense.
>
> (I have an on-going project CIIDAA that has been around since 2008; it was
> started as another project, epiC,  with a different grant, and then we
> continue the development with a new grant for CIIDAA --
> http://www.comp.nus.edu.sg/~ciidaa/
> )
>
> Thanks.
>
> regards
> beng chin
> ps: i am not sure if my email will get through to the group.
>
>
> ---------------------------- Original Message ----------------------------
> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
> From:    "Henry Saputra" <henry.saputra@gmail.com>
> Date:    Thu, February 5, 2015 2:57 pm
> To:      "general@incubator.apache.org" <general@incubator.apache.org>
> Cc:      ooibc@comp.nus.edu.sg
> --------------------------------------------------------------------------
>
> Several comments:
> -) How many users are already using this project? I would recommend
> dropping the request for a singa-user list at the beginning.
> -) All the initial committers come from the university, and it seems
> some of them are already ready to leave the university. I am not sure
> this project will survive if all of the initial committers are from
> the university as students.
> -) Need to solicit more mentors if this project ever gets to the Apache
> Incubator.
>
> - Henry
>
> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
>> The "Relationship with Other Apache Products" section has been
>> updated. The reference to H2O in that section has been removed, and
>> other projects have been added.
>>  Thanks for the feedback!
>>
>>
>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <thejas.nair@gmail.com> wrote:
>>> Thanks for pointing that out Henry! Yes, looks like H2O is not an
>>> Apache project, I should have verified that.
>>> I will edit that, and revisit that section along with the folks in
>>> the Singa community.
>>>
>>>
>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <henry.saputra@gmail.com> wrote:
>>>> Quick immediate comment that "Apache H2O" is not really an Apache
>>>> project.
>>>>
>>>> I assume you are referring to https://github.com/h2oai/h2o (or
>>>> https://github.com/h2oai/h2o-dev) ?
>>>>
>>>> - Henry
>>>>
>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
>>>>> Hello everyone,
>>>>>
>>>>> I would like to propose the inclusion of Singa as an Apache Incubator
>>>>> project.
>>>>>
>>>>> Here is the proposal -
>>>>> https://wiki.apache.org/incubator/SingaProposal
>>>>>
>>>>> Please review the proposal and give feedback. I am planning to start
>>>>> a vote after 7 days if the proposal looks good.
>>>>> We are also seeking additional Apache mentors for the project.
>>>>>
>>>>> Thanks,
>>>>> Thejas
>>>>> ==========================================================
>>>>> Singa Incubator Proposal
>>>>>
>>>>> Abstract
>>>>>
>>>>> SINGA is a distributed deep learning platform.
>>>>>
>>>>> Proposal
>>>>>
>>>>> SINGA is an efficient, scalable and easy-to-use distributed platform
>>>>> for training deep learning models, e.g., Deep Convolutional Neural
>>>>> Network and Deep Belief Network. It parallelizes the computation
>>>>> (i.e., training) onto a cluster of nodes by distributing the training
>>>>> data and model automatically to speed up the training. Built-in
>>>>> training algorithms like Back-Propagation and Contrastive Divergence
>>>>> are implemented based on common abstractions of deep learning models.
>>>>> Users can train their own deep learning models by simply customizing
>>>>> these abstractions like implementing the Mapper and Reducer in
>>>>> Hadoop.
>>>>>
>>>>> Background
>>>>>
>>>>> Deep learning refers to a set of feature (or representation) learning
>>>>> models that consist of multiple (non-linear) layers, where different
>>>>> layers learn different levels of abstractions (representations) of the
>>>>> raw input data. Larger (in terms of model parameters) and deeper (in
>>>>> terms of number of layers) models have shown better performance, e.g.,
>>>>> lower image classification error in Large Scale Visual Recognition
>>>>> Challenge. However, a larger model requires more memory and larger
>>>>> training data to reduce over-fitting. Complex numeric operations make
>>>>> the training computation intensive. In practice, training large deep
>>>>> learning models takes weeks or months on a single node (even with GPU).
>>>>>
>>>>> Rationale
>>>>>
>>>>> Deep learning has gained a lot of attention in both academia and
>>>>> industry due to its success in a wide range of areas such as computer
>>>>> vision and speech recognition. However, training of such models is
>>>>> computationally expensive, especially for large and deep models (e.g.,
>>>>> with billions of parameters and more than 10 layers). Both Google and
>>>>> Microsoft have developed distributed deep learning systems to make the
>>>>> training more efficient by distributing the computations within a
>>>>> cluster of nodes. However, these systems are closed-source software.
>>>>> Our goal is to leverage the community of open source developers to
>>>>> make SINGA efficient, scalable and easy to use. SINGA is a full-fledged
>>>>> distributed platform that could benefit the community and also benefit
>>>>> from the community's involvement in contributing to further work in
>>>>> this area. We believe the nature of SINGA and our vision for the system
>>>>> fit naturally with Apache's philosophy and development framework.
>>>>>
>>>>> Initial Goals
>>>>>
>>>>> We have developed a system for SINGA running on a commodity computer
>>>>> cluster. The initial goals include:
>>>>>
>>>>>  * improving the system in terms of scalability and efficiency, e.g.,
>>>>>    using Infiniband for network communication and multi-threading for
>>>>>    one node computation. We would consider extending SINGA to GPU
>>>>>    clusters later.
>>>>>  * benchmarking with larger datasets (hundreds of millions of training
>>>>>    instances) and models (billions of parameters).
>>>>>  * adding more built-in deep learning models. Users can train the
>>>>>    built-in models on their datasets directly.
>>>>>
>>>>> Current Status
>>>>>
>>>>> Meritocracy
>>>>>
>>>>> We would like to follow ASF meritocratic principles to encourage more
>>>>> developers to contribute to this project. We know that only active and
>>>>> excellent developers can make SINGA a successful project. The
>>>>> committer list and PMC will be updated based on developers'
>>>>> performance and commitment. We are also improving the documentation
>>>>> and code to help new developers get started quickly.
>>>>>
>>>>> Community
>>>>>
>>>>> SINGA is currently being developed in the Database System Research Lab
>>>>> at the National University of Singapore (NUS) in collaboration with
>>>>> Zhejiang University in China. Our lab has extensive experience in
>>>>> building database related systems, including distributed systems. Six
>>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
>>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research
>>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen,
>>>>> Kian Lee Tan) have been working for a year on this project. We are
>>>>> open to recruiting more developers from diverse backgrounds.
>>>>>
>>>>> Core Developers
>>>>>
>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
>>>>> worked on distributed systems for more than 20 years. They have
>>>>> collaborated with the industry and have built various large scale
>>>>> systems. Anh Dinh's research is also on distributed systems, albeit
>>>>> with more focus on security aspects. Wei Wang's research is on deep
>>>>> learning problems including deep learning applications and large scale
>>>>> training. Sheng Wang and Jinyang are working on efficient indexing,
>>>>> querying of large scale data and machine learning. Kaiping, Zhaojing
>>>>> and Zhongle are new PhD students who joined SINGA recently. They will
>>>>> work on this project for a longer time (next 4-5 years). While we
>>>>> share common research interests, each member also brings diverse
>>>>> expertise to the team.
>>>>>
>>>>> Alignment
>>>>>
>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
>>>>> Spark and Mahout, each of which targets a different application
>>>>> domain. SINGA, being a distributed platform for large-scale deep
>>>>> learning, focuses on another important domain which still lacks a
>>>>> robust and scalable open-source platform. The recent success
>>>>> of deep learning models especially for vision and speech recognition
>>>>> tasks has generated interest in both applying existing deep learning
>>>>> models and in developing new ones. Thus, an open-source platform for
>>>>> deep learning will be able to attract a large community of users and
>>>>> developers. SINGA is a complex system needing many iterations of
>>>>> design, implementation and testing. Apache's collaboration framework
>>>>> which encourages active contribution from developers will inevitably
>>>>> help improve the quality of the system, as shown in the success of
>>>>> Hadoop, Spark, etc. Equally important is the community of users which
>>>>> helps identify real-life applications of deep learning, and helps to
>>>>> evaluate the system's performance and ease-of-use. We hope to leverage
>>>>> ASF for coordinating and promoting both communities, and in return
>>>>> benefit the communities with another useful tool.
>>>>>
>>>>> Known Risks
>>>>>
>>>>> Orphaned products
>>>>>
>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
>>>>> the lab in two to four years' time. It is possible that some of them
>>>>> may not have enough time to focus on this project after that. But,
>>>>> SINGA is part of our other bigger research projects on building an
>>>>> infrastructure for data intensive applications, which include
>>>>> health-care analytics and brain-inspired computing. Beng Chin and
>>>>> Kian Lee would continue working on it and getting more people involved.
>>>>> For example, three new developers (Kaiping, Zhaojing and Zhongle)
>>>>> joined us recently. Individual developers are welcome to make SINGA a
>>>>> diverse community that is robust and independent from any single
>>>>> developer.
>>>>>
>>>>> Inexperience with Open Source
>>>>>
>>>>> All the developers are active users and followers of open source
>>>>> projects. Our research lab has a strong commitment to open source, and
>>>>> has released the source code of several systems under open source
>>>>> license as a way of contributing back to the open source community.
>>>>> But we do not have much real experience in open source projects with
>>>>> large and well organized communities like those in Apache. This is one
>>>>> reason we chose Apache, which is experienced in open source project
>>>>> incubation. We hope to get help from Apache (e.g., champion and
>>>>> mentors) to establish a healthy path for SINGA.
>>>>>
>>>>> Homogeneous Developers
>>>>>
>>>>> Although the current developers are researchers in the universities,
>>>>> they have different research interests and project experiences, as
>>>>> mentioned in the section that introduces the core developers. We know
>>>>> that a diverse community is helpful. Hence we are open to the idea of
>>>>> recruiting developers from other regions and organizations.
>>>>>
>>>>> Reliance on Salaried Developers
>>>>>
>>>>> As a research project in the university, SINGA's current developing
>>>>> community consists of professors, PhD students, research assistants
>>>>> and postdoctoral fellows. They are driven by their interests to work
>>>>> on this project and have contributed actively since the start of the
>>>>> project. The research assistants and fellows are expected to leave
>>>>> when their contracts expire. However, they are keen to continue to
>>>>> work on the project voluntarily. Moreover, as a long term research
>>>>> project, new research assistants and fellows are likely to join the
>>>>> project.
>>>>>
>>>>> An Excessive Fascination with the Apache Brand
>>>>>
>>>>> We choose Apache not for publicity. We have two purposes. First, we
>>>>> want to leverage Apache's reputation to recruit more developers to
>>>>> make a diverse community. Second, we hope that Apache can help us to
>>>>> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
>>>>> are established database and distributed system researchers, and
>>>>> together with the other contributors, they sincerely believe that
>>>>> there is a need for a widely accepted open source distributed deep
>>>>> learning platform. The field of deep learning is still in its infancy,
>>>>> and an open source platform will fuel the research in the area.
>>>>> Moreover, such a platform will enable researchers to develop new
>>>>> models and algorithms, rather than spending time implementing a deep
>>>>> learning system from scratch. Furthermore, the need for scalability
>>>>> for such a platform is obvious.
>>>>>
>>>>> Relationship with Other Apache Products
>>>>>
>>>>> Apache H2O implemented two simple deep learning models, namely the
>>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
>>>>> significant differences between H2O and SINGA. First, H2O adopts the
>>>>> Map-Reduce framework, which runs a set of computing nodes in parallel
>>>>> against the training set. Model parameters trained by all
>>>>> computing nodes are averaged as the final model parameters. This
>>>>> training algorithm is different from the distributed training
>>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
>>>>> synchronizes the parameters trained on different nodes. SINGA adopts
>>>>> the parameter server framework to support a wide range of distributed
>>>>> training algorithms and parallelization methods (e.g., data
>>>>> parallelism, model parallelism and hybrid parallelism), whereas H2O
>>>>> only supports data parallelism. Second, in H2O, users are restricted
>>>>> to the two built-in models. In SINGA, we provide a simple programming
>>>>> model to let users implement their own deep learning models. A new
>>>>> deep learning model can be implemented by customizing the base Layer
>>>>> class for each layer involved in the model. It is similar to writing
>>>>> Hadoop programs where users only need to override the base Mapper and
>>>>> Reducer. We also provide built-in models for users to use directly.
>>>>>
>>>>> Documentation
>>>>>
>>>>> The project is hosted at
>>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>>>>> Documentation can be found on the GitHub wiki page:
>>>>> https://github.com/nusinga/singa/wiki. We continue to refine and
>>>>> improve the documentation.
>>>>>
>>>>> Initial Source
>>>>>
>>>>> We use Github to maintain our source code,
>>>>> https://github.com/nusinga/singa
>>>>>
>>>>> Source and Intellectual Property Submission Plan
>>>>>
>>>>> We plan to release our code base under the Apache License, Version 2.0.
>>>>>
>>>>> External Dependencies
>>>>>
>>>>> required by the core code base: glog, gflags, google protobuf,
>>>>> open-blas, mpich, armci-mpi.
>>>>> required by data preparation and preprocessing: opencv, hdfs, python.
>>>>>
>>>>> Cryptography
>>>>>
>>>>> Not Applicable
>>>>>
>>>>> Required Resources
>>>>>
>>>>> Mailing Lists
>>>>>
>>>>> Currently, we use a Google group for internal discussion. The mailing
>>>>> address is nusinga@googlegroup.com. We will migrate the content to the
>>>>> Apache mailing lists in the future.
>>>>>
>>>>> singa-dev
>>>>> singa-user
>>>>> singa-commits
>>>>> singa-private (for private discussion within PMC)
>>>>>
>>>>> Git Repository
>>>>>
>>>>> We want to continue using git for version control. Hence, a git repo
>>>>> is required.
>>>>>
>>>>> Issue Tracking
>>>>>
>>>>> JIRA Singa (SINGA)
>>>>>
>>>>> Initial Committers
>>>>>
>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
>>>>> Gang Chen (cg @zju.edu.cn)
>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
>>>>>
>>>>> Affiliations
>>>>>
>>>>> Beng Chin Ooi, National University of Singapore
>>>>> Kian Lee Tan, National University of Singapore
>>>>> Gang Chen, Zhejiang University
>>>>> Wei Wang, National University of Singapore
>>>>> Dinh Tien Tuan Anh, National University of Singapore
>>>>> Jinyang Gao, National University of Singapore
>>>>> Sheng Wang, National University of Singapore
>>>>> Kaiping Zheng, National University of Singapore
>>>>> Zhaojing Luo, National University of Singapore
>>>>> Zhongle Xie, National University of Singapore
>>>>>
>>>>> Sponsors
>>>>>
>>>>> Champion
>>>>>
>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>>
>>>>> Nominated Mentors
>>>>>
>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>> Alan Gates (gates at apache dot org) - Hortonworks
>>>>> (Seeking more volunteers!)
>>>>>
>>>>> Sponsoring Entity
>>>>>
>>>>> We are requesting the Incubator to sponsor this project.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>>>> For additional commands, e-mail: general-help@incubator.apache.org
>>>>>