Date: Thu, 12 Mar 2015 09:25:53 -0700
Subject: Re: [VOTE] Accept Apache Singa as incubator project
From: Daniel Dai
To: general@incubator.apache.org
Cc: ooibc@comp.nus.edu.sg

+1

On Tue, Mar 10, 2015 at 7:33 AM, Thejas Nair wrote:

> The Singa Incubator Proposal document has been updated based on
> feedback in the proposal thread.
>
> This vote is proposing the inclusion of Apache Singa as an incubator
> project. The vote will run for at least 72 hours.
>
> [ ] +1 Accept Apache Singa into the Incubator
> [ ] +0 Don’t care.
> [ ] -1 Don’t accept Apache Singa into the Incubator because...
>
> Please vote!
>
> Here is my +1.
>
> Link to version of proposal being voted on:
> https://wiki.apache.org/incubator/SingaProposal?action=recall&rev=10
>
> The text is below
> ----------------------------------------------
>
> = Singa Incubator Proposal =
> == Abstract ==
> SINGA is a distributed deep learning platform.
>
> == Proposal ==
> SINGA is an efficient, scalable and easy-to-use distributed platform for
> training deep learning models, e.g., Deep Convolutional Neural Network
> and Deep Belief Network. It parallelizes the computation (i.e., training)
> onto a cluster of nodes by distributing the training data and model
> automatically to speed up the training. Built-in training algorithms like
> Back-Propagation and Contrastive Divergence are implemented based on
> common abstractions of deep learning models.
> Users can train their own deep learning models by simply customizing
> these abstractions, much like implementing the Mapper and Reducer in
> Hadoop.
>
> == Background ==
> Deep learning refers to a set of feature (or representation) learning
> models that consist of multiple (non-linear) layers, where different
> layers learn different levels of abstraction (representation) of the raw
> input data. Larger (in terms of model parameters) and deeper (in terms of
> number of layers) models have shown better performance, e.g., lower image
> classification error in the Large Scale Visual Recognition Challenge.
> However, a larger model requires more memory and more training data to
> reduce over-fitting. Complex numeric operations make the training
> computation-intensive. In practice, training large deep learning models
> takes weeks or months on a single node (even with a GPU).
>
> == Rationale ==
> Deep learning has gained a lot of attention in both academia and industry
> due to its success in a wide range of areas such as computer vision and
> speech recognition. However, training such models is computationally
> expensive, especially for large and deep models (e.g., with billions of
> parameters and more than 10 layers). Both Google and Microsoft have
> developed distributed deep learning systems to make the training more
> efficient by distributing the computations within a cluster of nodes.
> However, these systems are closed-source software. Our goal is to
> leverage the community of open source developers to make SINGA efficient,
> scalable and easy to use. SINGA is a full-fledged distributed platform
> that could benefit the community and also benefit from the community's
> involvement in contributing to further work in this area. We believe the
> nature of SINGA and our vision for the system fit naturally with Apache's
> philosophy and development framework.
>
> == Initial Goals ==
> We have developed a system for SINGA running on a commodity computer
> cluster. The initial goals include:
> * improving the system in terms of scalability and efficiency, e.g.,
>   using Infiniband for network communication and multi-threading for
>   single-node computation. We would consider extending SINGA to GPU
>   clusters later.
> * benchmarking with larger datasets (hundreds of millions of training
>   instances) and models (billions of parameters).
> * adding more built-in deep learning models. Users can train the
>   built-in models on their datasets directly.
>
>
> == Current Status ==
> === Meritocracy ===
> We would like to follow ASF meritocratic principles to encourage more
> developers to contribute to this project. We know that only active and
> excellent developers can make SINGA a successful project. The committer
> list and PMC will be updated based on developers' performance and
> commitment. We are also improving the documentation and code to help new
> developers get started quickly.
>
> === Community ===
> SINGA is currently being developed in the Database System Research Lab at
> the National University of Singapore (NUS) in collaboration with Zhejiang
> University in China. Our lab has extensive experience in building
> database related systems, including distributed systems. Six PhD students
> and research assistants (Jinyang Gao, Kaiping Zheng, Sheng Wang, Wei
> Wang, Zhaojing Luo and Zhongle Xie), a research fellow (Anh Dinh) and
> three professors (Beng Chin Ooi, Gang Chen, Kian Lee Tan) have been
> working for a year on this project. We are open to recruiting more
> developers from diverse backgrounds.
>
> === Core Developers ===
> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have worked
> on distributed systems for more than 20 years. They have collaborated
> with industry and have built various large scale systems.
> Anh Dinh's research is also on distributed systems, albeit with more
> focus on security aspects. Wei Wang's research is on deep learning
> problems including deep learning applications and large scale training.
> Sheng Wang and Jinyang are working on efficient indexing, querying of
> large scale data and machine learning. Kaiping, Zhaojing and Zhongle are
> new PhD students who joined SINGA recently. They will work on this
> project for a longer time (next 4-5 years). While we share common
> research interests, each member also brings diverse expertise to the
> team.
>
> === Alignment ===
> ASF is already the home of many distributed platforms, e.g., Hadoop,
> Spark and Mahout, each of which targets a different application domain.
> SINGA, being a distributed platform for large-scale deep learning,
> focuses on another important domain for which a robust and scalable
> open-source platform is still lacking. The recent success of deep
> learning models, especially for vision and speech recognition tasks, has
> generated interest both in applying existing deep learning models and in
> developing new ones. Thus, an open-source platform for deep learning will
> be able to attract a large community of users and developers. SINGA is a
> complex system needing many iterations of design, implementation and
> testing. Apache's collaboration framework, which encourages active
> contribution from developers, will inevitably help improve the quality of
> the system, as shown in the success of Hadoop, Spark, etc. Equally
> important is the community of users, which helps identify real-life
> applications of deep learning and helps to evaluate the system's
> performance and ease of use. We hope to leverage ASF for coordinating and
> promoting both communities, and in return benefit the communities with
> another useful tool.
>
> == Known Risks ==
> === Orphaned products ===
> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave
> the lab in two to four years' time. It is possible that some of them may
> not have enough time to focus on this project after that. However, SINGA
> is part of our larger research projects on building an infrastructure for
> data intensive applications, which include health-care analytics and
> brain-inspired computing. Beng Chin and Kian Lee would continue working
> on it and getting more people involved. For example, three new developers
> (Kaiping, Zhaojing and Zhongle) joined us recently. Individual developers
> are welcome to make SINGA a diverse community that is robust and
> independent from any single developer.
>
> === Inexperience with Open Source ===
> All the developers are active users and followers of open source
> projects. Our research lab has a strong commitment to open source, and
> has released the source code of several systems under open source
> licenses as a way of contributing back to the open source community.
> However, we do not have much real experience in open source projects with
> large and well organized communities like those in Apache. This is one
> reason we chose Apache, which is experienced in open source project
> incubation. We hope to get help from Apache (e.g., champion and mentors)
> to establish a healthy path for SINGA.
>
> === Homogenous Developers ===
> Although the current developers are researchers in universities, they
> have different research interests and project experiences, as mentioned
> in the section that introduces the core developers. We know that a
> diverse community is helpful. Hence we are open to the idea of recruiting
> developers from other regions and organizations.
>
> === Reliance on Salaried Developers ===
> As a research project in the university, SINGA's current development
> community consists of professors, PhD students, research assistants and
> postdoctoral fellows. They are driven by their interests to work on this
> project and have contributed actively since the start of the project. The
> research assistants and fellows are expected to leave when their
> contracts expire. However, they are keen to continue to work on the
> project voluntarily. Moreover, as this is a long term research project,
> new research assistants and fellows are likely to join it.
>
> === An Excessive Fascination with the Apache Brand ===
> We did not choose Apache for publicity. We have two purposes. First, we
> want to leverage Apache's reputation to recruit more developers to make a
> diverse community. Second, we hope that Apache can help us to establish a
> healthy path in developing SINGA. Beng Chin and Kian-Lee are established
> database and distributed system researchers, and together with the other
> contributors, they sincerely believe that there is a need for a widely
> accepted open source distributed deep learning platform. The field of
> deep learning is still in its infancy, and an open source platform will
> fuel research in the area. Moreover, such a platform will enable
> researchers to develop new models and algorithms, rather than spending
> time implementing a deep learning system from scratch. Furthermore, the
> need for scalability for such a platform is obvious.
>
> === Relationship with Other Apache Products ===
> Apache Mahout and Apache Spark's MLlib are general machine learning
> systems. Deep learning algorithms can thus be implemented on these two
> platforms as well. However, there are differences in training efficiency,
> scalability and usability. Mahout and Spark MLlib follow models where
> their nodes run synchronously.
> This is the fundamental difference from SINGA, which follows the
> parameter server framework (like Google Brain and Microsoft Adam). SINGA
> can run synchronously or asynchronously. The asynchronous mode is
> superior to the synchronous mode in terms of scalability. In addition,
> SINGA has some optimizations for deep learning models (e.g., model
> parallelism, data parallelism and hybrid parallelism) which make SINGA
> more efficient. We also provide an easy-to-use programming model for deep
> learning algorithms.
>
> There are also plans for integration with Apache Hadoop's HDFS as
> storage, to handle large training data. Specifically, we store the
> training data (e.g., images or raw features of images) in HDFS, then
> (pre-)fetch them online. We will also explore integration with Hadoop's
> YARN and Apache Mesos to do resource management.
>
>
> == Documentation ==
> The project is hosted at
> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
> Documentation can be found at the GitHub Wiki Page:
> https://github.com/nusinga/singa/wiki.
> We continue to refine and improve the documentation.
>
> == Initial Source ==
> We use GitHub to maintain our source code,
> https://github.com/nusinga/singa
>
> == Source and Intellectual Property Submission Plan ==
> We plan to release our code base under the Apache License, Version 2.0.
>
> == External Dependencies ==
> * required by the core code base: glog, gflags, google protobuf,
>   open-blas, mpich, armci-mpi.
> * required by data preparation and preprocessing: opencv, hdfs, python.
>
> == Cryptography ==
> Not Applicable
>
> == Required Resources ==
> === Mailing Lists ===
> Currently, we use a Google group for internal discussion. The mailing
> address is nusinga@googlegroup.com. We will migrate the content to the
> Apache mailing lists in the future.
>
> * singa-dev
> * singa-user
> * singa-commits
> * singa-private (for private discussion within PMC)
>
> === Git Repository ===
> We want to continue using git for version control. Hence, a git repo
> is required.
>
> === Issue Tracking ===
> JIRA Singa (SINGA)
>
> == Initial Committers ==
> * Beng Chin Ooi (ooibc @comp.nus.edu.sg)
> * Kian Lee Tan (tankl @comp.nus.edu.sg)
> * Gang Chen (cg @zju.edu.cn)
> * Wei Wang (wangwei @comp.nus.edu.sg)
> * Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
> * Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
> * Sheng Wang (wangsh @comp.nus.edu.sg)
> * Kaiping Zheng (kaiping @comp.nus.edu.sg)
> * Zhaojing Luo (zhaojing @comp.nus.edu.sg)
> * Zhongle Xie (zhongle @comp.nus.edu.sg)
>
> == Affiliations ==
> * Beng Chin Ooi, National University of Singapore
> * Kian Lee Tan, National University of Singapore
> * Gang Chen, Zhejiang University
> * Wei Wang, National University of Singapore
> * Dinh Tien Tuan Anh, National University of Singapore
> * Jinyang Gao, National University of Singapore
> * Sheng Wang, National University of Singapore
> * Kaiping Zheng, National University of Singapore
> * Zhaojing Luo, National University of Singapore
> * Zhongle Xie, National University of Singapore
>
> == Sponsors ==
> === Champion ===
> Thejas Nair (thejas at apache.org)
>
> === Nominated Mentors ===
> * Thejas Nair (thejas at apache.org)
> * Alan Gates (gates at apache dot org)
> * Daniel Dai (daijy at apache dot org)
> * Ted Dunning (tdunning at apache dot org)
>
> === Sponsoring Entity ===
> We are requesting the Incubator to sponsor this project.
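
To make the synchronous and asynchronous parameter-server modes mentioned
under "Relationship with Other Apache Products" a little more concrete, here
is a minimal single-process sketch in Python. It is not SINGA code and does
not use SINGA's API: the function names (make_shards, gradient, train_sync,
train_async), the toy 1-D regression task and the learning rate are all
assumptions chosen for illustration. Staleness in the asynchronous mode is
only simulated, by letting each worker compute its gradient from the
parameter copy it pulled right after its previous push.

```python
import random


def make_shards(num_workers, samples_per_worker, true_w=3.0, seed=0):
    """Hypothetical helper: split a synthetic dataset (y = true_w * x) across workers."""
    rng = random.Random(seed)
    shards = []
    for _ in range(num_workers):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(samples_per_worker)]
        shards.append([(x, true_w * x) for x in xs])
    return shards


def gradient(w, shard):
    """Gradient of the mean of 0.5 * (w*x - y)^2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def train_sync(shards, lr=0.2, steps=200):
    """Synchronous mode: the server waits for all workers, averages, then updates once."""
    w = 0.0
    for _ in range(steps):
        grads = [gradient(w, shard) for shard in shards]  # all workers see the same w
        w -= lr * sum(grads) / len(grads)
    return w


def train_async(shards, lr=0.2, steps=200):
    """Asynchronous mode (simulated): each worker pushes its gradient without waiting.

    A worker's gradient is computed from the copy of w it pulled after its
    previous push, so it is stale by the updates other workers made since.
    """
    w = 0.0
    local = [w] * len(shards)  # each worker's last-pulled copy of w
    for _ in range(steps):
        for i, shard in enumerate(shards):
            g = gradient(local[i], shard)  # possibly stale parameters
            w -= lr * g                    # server applies the push immediately
            local[i] = w                   # worker pulls the fresh value
    return w


if __name__ == "__main__":
    shards = make_shards(num_workers=4, samples_per_worker=50)
    print("sync :", train_sync(shards))
    print("async:", train_async(shards))
```

In this toy setting both modes approach the true weight. The sketch shows
only the difference in update rules; it says nothing about the wall-clock
scalability that the proposal attributes to the asynchronous mode.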