incubator-general mailing list archives

From Konstantin Boudnik <...@apache.org>
Subject Re: [VOTE] Accept Apache Singa as incubator project
Date Tue, 10 Mar 2015 21:59:29 GMT
+1

The diversity should be closely examined if the situation hasn't improved by
graduation time.

Cos

On Tue, Mar 10, 2015 at 12:17PM, Ted Dunning wrote:
> +1
> 
> I am not nearly as worried about the committer diversity, certainly not
> relative to entry into incubator.  This is a great project that has already
> shown some very strong willingness to work with others in the short time I
> have interacted with them.
> 
> 
> On Tue, Mar 10, 2015 at 11:49 AM, Thejas Nair <thejas.nair@gmail.com> wrote:
> 
> > Thanks for raising this issue. I agree that committer diversity is
> > important for long term success of a project. I think that should be a
> > criterion for graduation from incubator.
> > I think it is going to be easier to find new contributors as an Apache
> > incubator project.
> >
> >
> > On Tue, Mar 10, 2015 at 9:09 AM, jan i <jani@apache.org> wrote:
> >
> > >
> > > +0 I am really concerned about the diversity of the initial committers.
> > > What happens if the university pulls the plug? I know we all say it will
> > > never happen, but it could happen.
> > >
> > > rgds
> > > jan i.
> > >
> > >
> > > On 10 March 2015 at 16:20, Alan Gates <alanfgates@gmail.com> wrote:
> > >
> > >> +1
> > >>
> > >> Alan.
> > >>
> > >>   Thejas Nair <thejas.nair@gmail.com>
> > >>  March 10, 2015 at 7:33
> > >> The Singa Incubator Proposal document has been updated based on
> > >> feedback in the proposal thread.
> > >>
> > >> This vote is proposing the inclusion of Apache Singa as an incubator
> > >> project.
> > >> The vote will run for at least 72 hours.
> > >>
> > >> [ ] +1 Accept Apache Singa into the Incubator
> > >> [ ] +0 Don’t care.
> > >> [ ] -1 Don’t accept Apache Singa into the Incubator because..
> > >>
> > >> Please vote!
> > >>
> > >> Here is my +1.
> > >>
> > >> Link to the version of the proposal being voted on:
> > >> https://wiki.apache.org/incubator/SingaProposal?action=recall&rev=10
> > >>
> > >> The text is below
> > >> ----------------------------------------------
> > >>
> > >> = Singa Incubator Proposal =
> > >> == Abstract ==
> > >> SINGA is a distributed deep learning platform.
> > >>
> > >> == Proposal ==
> > >> SINGA is an efficient, scalable and easy-to-use distributed platform
> > >> for training deep learning models, e.g., Deep Convolutional Neural
> > >> Networks and Deep Belief Networks. It parallelizes the computation
> > >> (i.e., training) onto a cluster of nodes by distributing the training
> > >> data and model automatically to speed up the training. Built-in
> > >> training algorithms like Back-Propagation and Contrastive Divergence
> > >> are implemented based on common abstractions of deep learning models.
> > >> Users can train their own deep learning models by simply customizing
> > >> these abstractions, much like implementing the Mapper and Reducer in
> > >> Hadoop.
> > >>
> > >> == Background ==
> > >> Deep learning refers to a set of feature (or representation) learning
> > >> models that consist of multiple (non-linear) layers, where different
> > >> layers learn different levels of abstraction (representations) of the
> > >> raw input data. Larger (in terms of model parameters) and deeper (in
> > >> terms of number of layers) models have shown better performance, e.g.,
> > >> lower image classification error in the Large Scale Visual Recognition
> > >> Challenge. However, a larger model requires more memory and more
> > >> training data to reduce over-fitting. Complex numeric operations make
> > >> the training computationally intensive. In practice, training large
> > >> deep learning models takes weeks or months on a single node (even with
> > >> a GPU).
> > >>
> > >> == Rationale ==
> > >> Deep learning has gained a lot of traction in both academia and
> > >> industry due to its success in a wide range of areas such as computer
> > >> vision and speech recognition. However, training such models is
> > >> computationally expensive, especially for large and deep models (e.g.,
> > >> with billions of parameters and more than 10 layers). Both Google and
> > >> Microsoft have developed distributed deep learning systems to make the
> > >> training more efficient by distributing the computations within a
> > >> cluster of nodes. However, these systems are closed-source software.
> > >> Our goal is to leverage the community of open source developers to make
> > >> SINGA efficient, scalable and easy to use. SINGA is a full-fledged
> > >> distributed platform that could benefit the community and, in turn,
> > >> benefit from the community's contributions to further work in this
> > >> area. We believe the nature of SINGA and our vision for the system fit
> > >> naturally with Apache's philosophy and development framework.
> > >>
> > >> == Initial Goals ==
> > >> We have developed a system for SINGA running on a commodity computer
> > >> cluster. The initial goals include:
> > >> * improving the system in terms of scalability and efficiency, e.g.,
> > >> using Infiniband for network communication and multi-threading for one
> > >> node computation. We would consider extending SINGA to GPU clusters
> > >> later.
> > >> * benchmarking with larger datasets (hundreds of millions of training
> > >> instances) and models (billions of parameters).
> > >> * adding more built-in deep learning models. Users can train the
> > >> built-in models on their datasets directly.
> > >>
> > >>
> > >> == Current Status ==
> > >> === Meritocracy ===
> > >> We would like to follow ASF meritocratic principles to encourage more
> > >> developers to contribute to this project. We know that only active and
> > >> excellent developers can make SINGA a successful project. The committer
> > >> list and PMC will be updated based on developers' performance and
> > >> commitment. We are also improving the documentation and code to help
> > >> new developers get started quickly.
> > >>
> > >> === Community ===
> > >> SINGA is currently being developed in the Database System Research Lab
> > >> at the National University of Singapore (NUS) in collaboration with
> > >> Zhejiang University in China. Our lab has extensive experience in
> > >> building database-related systems, including distributed systems. Six
> > >> PhD students and research assistants (Jinyang Gao, Kaiping Zheng, Sheng
> > >> Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research fellow (Anh
> > >> Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian Lee Tan)
> > >> have been working for a year on this project. We are open to recruiting
> > >> more developers from diverse backgrounds.
> > >>
> > >> === Core Developers ===
> > >> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have worked
> > >> on distributed systems for more than 20 years. They have collaborated
> > >> with industry and have built various large scale systems. Anh Dinh's
> > >> research is also on distributed systems, albeit with more focus on
> > >> security aspects. Wei Wang's research is on deep learning problems
> > >> including deep learning applications and large scale training. Sheng
> > >> Wang and Jinyang are working on efficient indexing, querying of large
> > >> scale data and machine learning. Kaiping, Zhaojing and Zhongle are new
> > >> PhD students who joined SINGA recently. They will work on this project
> > >> for a longer time (next 4-5 years). While we share common research
> > >> interests, each member also brings diverse expertise to the team.
> > >>
> > >> === Alignment ===
> > >> ASF is already the home of many distributed platforms, e.g., Hadoop,
> > >> Spark and Mahout, each of which targets a different application domain.
> > >> SINGA, being a distributed platform for large-scale deep learning,
> > >> focuses on another important domain that still lacks a robust and
> > >> scalable open-source platform. The recent success of deep learning
> > >> models, especially for vision and speech recognition tasks, has
> > >> generated interest both in applying existing deep learning models and
> > >> in developing new ones. Thus, an open-source platform for deep learning
> > >> will be able to attract a large community of users and developers.
> > >> SINGA is a complex system needing many iterations of design,
> > >> implementation and testing. Apache's collaboration framework, which
> > >> encourages active contribution from developers, will inevitably help
> > >> improve the quality of the system, as shown in the success of Hadoop,
> > >> Spark, etc. Equally important is the community of users, which helps
> > >> identify real-life applications of deep learning and helps to evaluate
> > >> the system's performance and ease-of-use. We hope to leverage ASF for
> > >> coordinating and promoting both communities, and in return benefit the
> > >> communities with another useful tool.
> > >>
> > >> == Known Risks ==
> > >> === Orphaned products ===
> > >> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
> > >> the lab in two to four years' time. It is possible that some of them
> > >> may not have enough time to focus on this project after that. However,
> > >> SINGA is part of our other, larger research projects on building an
> > >> infrastructure for data intensive applications, which include
> > >> health-care analytics and brain-inspired computing. Beng Chin and Kian
> > >> Lee would continue working on it and getting more people involved. For
> > >> example, three new developers (Kaiping, Zhaojing and Zhongle) joined us
> > >> recently. Individual developers are welcome to make SINGA a diverse
> > >> community that is robust and independent from any single developer.
> > >>
> > >> === Inexperience with Open Source ===
> > >> All the developers are active users and followers of open source
> > >> projects. Our research lab has a strong commitment to open source, and
> > >> has released the source code of several systems under an open source
> > >> license as a way of contributing back to the open source community.
> > >> However, we do not have much real experience in open source projects
> > >> with large and well organized communities like those in Apache. This is
> > >> one reason we chose Apache, which is experienced in open source project
> > >> incubation. We hope to get help from Apache (e.g., champion and
> > >> mentors) to establish a healthy path for SINGA.
> > >>
> > >> === Homogeneous Developers ===
> > >> Although the current developers are university researchers, they have
> > >> different research interests and project experiences, as mentioned in
> > >> the section that introduces the core developers. We know that a diverse
> > >> community is helpful. Hence we are open to the idea of recruiting
> > >> developers from other regions and organizations.
> > >>
> > >> === Reliance on Salaried Developers ===
> > >> As a research project in the university, SINGA's current developer
> > >> community consists of professors, PhD students, research assistants and
> > >> postdoctoral fellows. They are driven by their interests to work on
> > >> this project and have contributed actively since the start of the
> > >> project. The research assistants and fellows are expected to leave when
> > >> their contracts expire. However, they are keen to continue to work on
> > >> the project voluntarily. Moreover, since this is a long term research
> > >> project, new research assistants and fellows are likely to join it.
> > >>
> > >> === An Excessive Fascination with the Apache Brand ===
> > >> We did not choose Apache for publicity. We have two purposes. First, we
> > >> want to leverage Apache's reputation to recruit more developers to make
> > >> a diverse community. Second, we hope that Apache can help us to
> > >> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee are
> > >> established database and distributed system researchers, and together
> > >> with the other contributors, they sincerely believe that there is a
> > >> need for a widely accepted open source distributed deep learning
> > >> platform. The field of deep learning is still in its infancy, and an
> > >> open source platform will fuel the research in the area. Moreover, such
> > >> a platform will enable researchers to develop new models and
> > >> algorithms, rather than spending time implementing a deep learning
> > >> system from scratch. Furthermore, the need for scalability for such a
> > >> platform is obvious.
> > >>
> > >> === Relationship with Other Apache Products ===
> > >> Apache Mahout and Apache Spark's MLlib are general machine learning
> > >> systems. Deep learning algorithms can thus be implemented on these two
> > >> platforms as well. However, there are differences in training
> > >> efficiency, scalability and usability. Mahout and Spark MLlib follow
> > >> models where their nodes run synchronously. This is the fundamental
> > >> difference from Singa, which follows the parameter server framework
> > >> (like Google Brain and Microsoft Adam). Singa can run synchronously or
> > >> asynchronously. The asynchronous mode is superior to the synchronous
> > >> mode in terms of scalability. In addition, Singa has some optimizations
> > >> for deep learning models (e.g., model parallelism, data parallelism and
> > >> hybrid parallelism) which make Singa more efficient. We also provide an
> > >> easy-to-use programming model for deep learning algorithms.
> > >>
> > >> There are also plans for integration with Apache Hadoop's HDFS as
> > >> storage, to handle large training data. Specifically, we would store
> > >> the training data (e.g., images or raw features of images) in HDFS,
> > >> then (pre-)fetch it online. We will also explore integration with
> > >> Hadoop's YARN and Apache Mesos to do resource management.
> > >>
> > >>
> > >> == Documentation ==
> > >> The project is hosted at
> > >> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
> > >> Documentation can be found on the GitHub wiki page:
> > >> https://github.com/nusinga/singa/wiki.
> > >> We continue to refine and improve the documentation.
> > >>
> > >> == Initial Source ==
> > >> We use GitHub to maintain our source code:
> > >> https://github.com/nusinga/singa
> > >>
> > >> == Source and Intellectual Property Submission Plan ==
> > >> We plan to release our code base under the Apache License, Version 2.0.
> > >>
> > >> == External Dependencies ==
> > >> * required by the core code base: glog, gflags, google protobuf,
> > >> open-blas, mpich, armci-mpi.
> > >> * required by data preparation and preprocessing: opencv, hdfs, python.
> > >>
> > >> == Cryptography ==
> > >> Not Applicable
> > >>
> > >> == Required Resources ==
> > >> === Mailing Lists ===
> > >> Currently, we use a Google group for internal discussion. The mailing
> > >> address is nusinga@googlegroup.com. We will migrate the content to the
> > >> Apache mailing lists in the future.
> > >>
> > >> * singa-dev
> > >> * singa-user
> > >> * singa-commits
> > >> * singa-private (for private discussion within the PMC)
> > >>
> > >> === Git Repository ===
> > >> We want to continue using git for version control. Hence, a git repo
> > >> is required.
> > >>
> > >> === Issue Tracking ===
> > >> JIRA Singa (SINGA)
> > >>
> > >> == Initial Committers ==
> > >> * Beng Chin Ooi (ooibc @comp.nus.edu.sg)
> > >> * Kian Lee Tan (tankl @comp.nus.edu.sg)
> > >> * Gang Chen (cg @zju.edu.cn)
> > >> * Wei Wang (wangwei @comp.nus.edu.sg)
> > >> * Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
> > >> * Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
> > >> * Sheng Wang (wangsh @comp.nus.edu.sg)
> > >> * Kaiping Zheng (kaiping @comp.nus.edu.sg)
> > >> * Zhaojing Luo (zhaojing @comp.nus.edu.sg)
> > >> * Zhongle Xie (zhongle @comp.nus.edu.sg)
> > >>
> > >> == Affiliations ==
> > >> * Beng Chin Ooi, National University of Singapore
> > >> * Kian Lee Tan, National University of Singapore
> > >> * Gang Chen, Zhejiang University
> > >> * Wei Wang, National University of Singapore
> > >> * Dinh Tien Tuan Anh, National University of Singapore
> > >> * Jinyang Gao, National University of Singapore
> > >> * Sheng Wang, National University of Singapore
> > >> * Kaiping Zheng, National University of Singapore
> > >> * Zhaojing Luo, National University of Singapore
> > >> * Zhongle Xie, National University of Singapore
> > >>
> > >> == Sponsors ==
> > >> === Champion ===
> > >> Thejas Nair (thejas at apache.org)
> > >>
> > >> === Nominated Mentors ===
> > >> * Thejas Nair (thejas at apache.org)
> > >> * Alan Gates (gates at apache dot org)
> > >> * Daniel Dai (daijy at apache dot org)
> > >> * Ted Dunning (tdunning at apache dot org)
> > >>
> > >> === Sponsoring Entity ===
> > >> We are requesting the Incubator to sponsor this project.
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > >> For additional commands, e-mail: general-help@incubator.apache.org
> > >>
> > >>
> > >
> >
