From: ooibc@comp.nus.edu.sg
To: "Henry Saputra", thejas.nair@gmail.com
Cc: general@incubator.apache.org
Date: Sat, 7 Feb 2015 09:21:19 +0800
Subject: Re: [Fwd: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator]

Regarding the number of users of this project -- at this moment, the community is not big. A few local start-ups have been trying to use it (mainly due to the announcement on our seminar list). For example, one is using it for image recognition: given a photo snapped by a user (such as of a luxury bag on a passerby), it wants to return the same product and a list of similar products.
Researchers from outside of NUS may have been using it since we published an application paper on cross-domain/modal retrieval in VLDB 2014. We have not announced the project to the outside community yet -- we will announce it on dbworld etc. in due course.

Thanks and have a good weekend.

regards
beng chin

>
> Thanks for the comments and suggestions.
> With permission from Thejas, I would like to respond to point 2.
>
> We have a huge team down at NUS (National University of Singapore) -- we have about seven database/data mining professors (not including those in systems, networking, and machine learning).
> I myself have nine PhD students in a steady state, and I have a few large grants, with a total budget of about 15 million S$ (~12 million USD), that allow me to hire a number of research fellows and research assistants for the next few years. In a steady state, I have about 20 people (PhD students/RA/RF) working with me alone. Other professors have their own grants (unlike in other countries, it is relatively easy to get large grants in Singapore; many overseas universities, including UIUC, MIT, ETH etc., have research labs funded by Singapore Research Foundation [equivalent of NSF]).
>
> SINGA is a long-term project for us -- while it is a platform as it is, we are using it for healthcare predictive analytics (by working with a hospital associated with the University). Therefore, we will be working on SINGA not solely as a distributed DL platform, but as a tool that will enable us to do data analytics in some business domains (e.g. healthcare, consumer etc.).
>
> For the initial set of committers, three are tenured professors and five are students with 2-5 years to go before they complete their PhD. Quite often, some would stay back as research fellows for a couple of years before they start looking for a job outside.
> We will work with mentors and new developers (from outside of NUS or Zhejiang University) in enhancing the system.
>
> The project should survive in that sense.
>
> (I have an on-going project CIIDAA that has been around since 2008; it was started as another project, epiC, with a different grant, and then we continued the development with a new grant for CIIDAA -- http://www.comp.nus.edu.sg/~ciidaa/ )
>
> Thanks.
>
> regards
> beng chin
> ps: i am not sure if my email will get through to the group.
>
>
> ---------------------------- Original Message ----------------------------
> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
> From: "Henry Saputra"
> Date: Thu, February 5, 2015 2:57 pm
> To: "general@incubator.apache.org"
> Cc: ooibc@comp.nus.edu.sg
> --------------------------------------------------------------------------
>
> Several comments:
> -) How many users are already using this project? I would recommend dropping the request for a singa-user list at the beginning.
> -) All the initial committers come from a university, and it seems some of them are already about to leave the university. I am not too sure this project will survive if all of the initial committers are university students.
> -) Need to solicit more mentors if this project is ever to get into the Apache Incubator.
>
> - Henry
>
> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair wrote:
>> The "Relationship with Other Apache Products" section has been updated. The reference to H2O in that section has been removed, and other projects have been added.
>> Thanks for the feedback!
>>
>>
>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair wrote:
>>> Thanks for pointing that out, Henry! Yes, it looks like H2O is not an Apache project; I should have verified that.
>>> I will edit that, and revisit that section along with the folks in the Singa community.
>>>
>>>
>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra wrote:
>>>> Quick immediate comment that "Apache H2O" is not really an Apache project.
>>>>
>>>> I assume you are referring to https://github.com/h2oai/h2o (or https://github.com/h2oai/h2o-dev)?
>>>>
>>>> - Henry
>>>>
>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair wrote:
>>>>> Hello everyone,
>>>>>
>>>>> I would like to propose the inclusion of Singa as an Apache Incubator project.
>>>>>
>>>>> Here is the proposal -
>>>>> https://wiki.apache.org/incubator/SingaProposal
>>>>>
>>>>> Please review the proposal and give feedback. I am planning to start a vote after 7 days if the proposal looks good.
>>>>> We are also seeking additional Apache mentors for the project.
>>>>>
>>>>> Thanks,
>>>>> Thejas
>>>>> ==========================================================
>>>>> Singa Incubator Proposal
>>>>>
>>>>> Abstract
>>>>>
>>>>> SINGA is a distributed deep learning platform.
>>>>>
>>>>> Proposal
>>>>>
>>>>> SINGA is an efficient, scalable and easy-to-use distributed platform for training deep learning models, e.g., Deep Convolutional Neural Network and Deep Belief Network. It parallelizes the computation (i.e., training) onto a cluster of nodes by distributing the training data and model automatically to speed up the training. Built-in training algorithms like Back-Propagation and Contrastive Divergence are implemented based on common abstractions of deep learning models. Users can train their own deep learning models by simply customizing these abstractions, much like implementing the Mapper and Reducer in Hadoop.
>>>>>
>>>>> Background
>>>>>
>>>>> Deep learning refers to a set of feature (or representation) learning models that consist of multiple (non-linear) layers, where different layers learn different levels of abstractions (representations) of the raw input data.
>>>>> Larger (in terms of model parameters) and deeper (in terms of number of layers) models have shown better performance, e.g., lower image classification error in the Large Scale Visual Recognition Challenge. However, a larger model requires more memory and larger training data to reduce over-fitting. Complex numeric operations make the training computation-intensive. In practice, training large deep learning models takes weeks or months on a single node (even with GPU).
>>>>>
>>>>> Rationale
>>>>>
>>>>> Deep learning has gained a lot of attention in both academia and industry due to its success in a wide range of areas such as computer vision and speech recognition. However, training of such models is computationally expensive, especially for large and deep models (e.g., with billions of parameters and more than 10 layers). Both Google and Microsoft have developed distributed deep learning systems to make the training more efficient by distributing the computations within a cluster of nodes. However, these systems are closed-source software. Our goal is to leverage the community of open source developers to make SINGA efficient, scalable and easy to use. SINGA is a full-fledged distributed platform that could benefit the community and also benefit from the community's involvement in contributing to further work in this area. We believe the nature of SINGA and our vision for the system fit naturally with Apache's philosophy and development framework.
>>>>>
>>>>> Initial Goals
>>>>>
>>>>> We have developed a system for SINGA running on a commodity computer cluster. The initial goals include:
>>>>> * improving the system in terms of scalability and efficiency, e.g., using Infiniband for network communication and multi-threading for one node computation.
>>>>>   We would consider extending SINGA to GPU clusters later.
>>>>> * benchmarking with larger datasets (hundreds of millions of training instances) and models (billions of parameters).
>>>>> * adding more built-in deep learning models. Users can train the built-in models on their datasets directly.
>>>>>
>>>>> Current Status
>>>>>
>>>>> Meritocracy
>>>>>
>>>>> We would like to follow ASF meritocratic principles to encourage more developers to contribute to this project. We know that only active and excellent developers can make SINGA a successful project. The committer list and PMC will be updated based on developers' performance and commitment. We are also improving the documentation and code to help new developers get started quickly.
>>>>>
>>>>> Community
>>>>>
>>>>> SINGA is currently being developed in the Database System Research Lab at the National University of Singapore (NUS) in collaboration with Zhejiang University in China. Our lab has extensive experience in building database-related systems, including distributed systems. Six PhD students and research assistants (Jinyang Gao, Kaiping Zheng, Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen, Kian Lee Tan) have been working for a year on this project. We are open to recruiting more developers from diverse backgrounds.
>>>>>
>>>>> Core Developers
>>>>>
>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have worked on distributed systems for more than 20 years. They have collaborated with the industry and have built various large scale systems. Anh Dinh's research is also on distributed systems, albeit with more focus on security aspects. Wei Wang's research is on deep learning problems including deep learning applications and large scale training.
>>>>> Sheng Wang and Jinyang are working on efficient indexing and querying of large scale data, and on machine learning. Kaiping, Zhaojing and Zhongle are new PhD students who joined SINGA recently. They will work on this project for a longer time (next 4-5 years). While we share common research interests, each member also brings diverse expertise to the team.
>>>>>
>>>>> Alignment
>>>>>
>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop, Spark and Mahout, each of which targets a different application domain. SINGA, being a distributed platform for large-scale deep learning, focuses on another important domain for which a robust and scalable open-source platform is still lacking. The recent success of deep learning models, especially for vision and speech recognition tasks, has generated interest both in applying existing deep learning models and in developing new ones. Thus, an open-source platform for deep learning will be able to attract a large community of users and developers. SINGA is a complex system needing many iterations of design, implementation and testing. Apache's collaboration framework, which encourages active contribution from developers, will inevitably help improve the quality of the system, as shown by the success of Hadoop, Spark, etc. Equally important is the community of users, which helps identify real-life applications of deep learning and helps to evaluate the system's performance and ease of use. We hope to leverage ASF for coordinating and promoting both communities, and in return benefit the communities with another useful tool.
>>>>>
>>>>> Known Risks
>>>>>
>>>>> Orphaned products
>>>>>
>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may leave the lab in two to four years' time.
>>>>> It is possible that some of them may not have enough time to focus on this project after that. But SINGA is part of our other, bigger research projects on building an infrastructure for data intensive applications, which include health-care analytics and brain-inspired computing. Beng Chin and Kian Lee would continue working on it and getting more people involved. For example, three new developers (Kaiping, Zhaojing and Zhongle) joined us recently. Individual developers are welcome, to make SINGA a diverse community that is robust and independent of any single developer.
>>>>>
>>>>> Inexperience with Open Source
>>>>>
>>>>> All the developers are active users and followers of open source projects. Our research lab has a strong commitment to open source, and has released the source code of several systems under open source licenses as a way of contributing back to the open source community. But we do not have much real experience in open source projects with large and well organized communities like those in Apache. This is one reason we chose Apache, which is experienced in open source project incubation. We hope to get help from Apache (e.g., champion and mentors) to establish a healthy path for SINGA.
>>>>>
>>>>> Homogeneous Developers
>>>>>
>>>>> Although the current developers are researchers in the universities, they have different research interests and project experiences, as mentioned in the section that introduces the core developers. We know that a diverse community is helpful. Hence we are open to the idea of recruiting developers from other regions and organizations.
>>>>>
>>>>> Reliance on Salaried Developers
>>>>>
>>>>> As a research project in the university, SINGA's current developing community consists of professors, PhD students, research assistants and postdoctoral fellows.
>>>>> They are driven by their interests to work on this project and have contributed actively since the start of the project. The research assistants and fellows are expected to leave when their contracts expire. However, they are keen to continue to work on the project voluntarily. Moreover, as this is a long term research project, new research assistants and fellows are likely to join the project.
>>>>>
>>>>> An Excessive Fascination with the Apache Brand
>>>>>
>>>>> We choose Apache not for publicity. We have two purposes. First, we want to leverage Apache's reputation to recruit more developers to make a diverse community. Second, we hope that Apache can help us to establish a healthy path in developing SINGA. Beng Chin and Kian Lee are established database and distributed system researchers, and together with the other contributors, they sincerely believe that there is a need for a widely accepted open source distributed deep learning platform. The field of deep learning is still in its infancy, and an open source platform will fuel the research in the area. Moreover, such a platform will enable researchers to develop new models and algorithms, rather than spending time implementing a deep learning system from scratch. Furthermore, the need for scalability for such a platform is obvious.
>>>>>
>>>>> Relationship with Other Apache Products
>>>>>
>>>>> Apache H2O implemented two simple deep learning models, namely the Multi-Layer Perceptron and Deep Auto-encoders. There are two significant differences between H2O and SINGA. First, H2O adopts the Map-Reduce framework, which runs a set of computing nodes in parallel against the training set. Model parameters trained by all computing nodes are averaged as the final model parameters.
>>>>> This training algorithm is different from the distributed training algorithm used by DistBelief, Adam and SINGA, which frequently synchronizes the parameters trained on different nodes. SINGA adopts the parameter server framework to support a wide range of distributed training algorithms and parallelization methods (e.g., data parallelism, model parallelism and hybrid parallelism; H2O only supports data parallelism). Second, in H2O, users are restricted to using the two built-in models. In SINGA, we provide a simple programming model to let users implement their own deep learning models. A new deep learning model can be implemented by customizing the base Layer class for each layer involved in the model. It is similar to writing Hadoop programs, where users only need to override the base Mapper and Reducer. We also provide built-in models for users to use directly.
>>>>>
>>>>> Documentation
>>>>>
>>>>> The project is hosted at http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
>>>>> Documentation can be found at the Github Wiki Page: https://github.com/nusinga/singa/wiki. We continue to refine and improve the documentation.
>>>>>
>>>>> Initial Source
>>>>>
>>>>> We use Github to maintain our source code, https://github.com/nusinga/singa
>>>>>
>>>>> Source and Intellectual Property Submission Plan
>>>>>
>>>>> We plan to release our code base under the Apache License, Version 2.0.
>>>>>
>>>>> External Dependencies
>>>>>
>>>>> required by the core code base: glog, gflags, google protobuf, open-blas, mpich, armci-mpi.
>>>>> required by data preparation and preprocessing: opencv, hdfs, python.
>>>>>
>>>>> Cryptography
>>>>>
>>>>> Not Applicable
>>>>>
>>>>> Required Resources
>>>>>
>>>>> Mailing Lists
>>>>>
>>>>> Currently, we use a google group for internal discussion. The mailing address is nusinga@googlegroup.com.
>>>>> We will migrate the content to the Apache mailing lists in the future.
>>>>>
>>>>> singa-dev
>>>>> singa-user
>>>>> singa-commits
>>>>> singa-private (for private discussion within PMC)
>>>>>
>>>>> Git Repository
>>>>>
>>>>> We want to continue using git for version control. Hence, a git repo is required.
>>>>>
>>>>> Issue Tracking
>>>>>
>>>>> JIRA Singa (SINGA)
>>>>>
>>>>> Initial Committers
>>>>>
>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
>>>>> Gang Chen (cg @zju.edu.cn)
>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
>>>>>
>>>>> Affiliations
>>>>>
>>>>> Beng Chin Ooi, National University of Singapore
>>>>> Kian Lee Tan, National University of Singapore
>>>>> Gang Chen, Zhejiang University
>>>>> Wei Wang, National University of Singapore
>>>>> Dinh Tien Tuan Anh, National University of Singapore
>>>>> Jinyang Gao, National University of Singapore
>>>>> Sheng Wang, National University of Singapore
>>>>> Kaiping Zheng, National University of Singapore
>>>>> Zhaojing Luo, National University of Singapore
>>>>> Zhongle Xie, National University of Singapore
>>>>>
>>>>> Sponsors
>>>>>
>>>>> Champion
>>>>>
>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>>
>>>>> Nominated Mentors
>>>>>
>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
>>>>> Alan Gates (gates at apache dot org) - Hortonworks
>>>>> (Seeking more volunteers!)
>>>>>
>>>>> Sponsoring Entity
>>>>>
>>>>> We are requesting the Incubator to sponsor this project.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>>>> For additional commands, e-mail: general-help@incubator.apache.org