incubator-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: [Fwd: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator]
Date Fri, 27 Feb 2015 16:06:08 GMT
Thejas,

Please add me as a mentor if it helps to have diversity.  Based on previous
experience with him, I have enormous trust that Alan Gates would act as a
highly impartial and effective mentor, but I would be happy to help if there
is a concern that could be addressed by having another mentor from a
different company.



On Thu, Feb 26, 2015 at 6:12 PM, Thejas Nair <thejas.nair@gmail.com> wrote:

> The incubator proposal has been updated with the feedback so far.
> We have 3 mentors now, but I think it would be good to have additional
> mentors. Please let me know if anyone is able to help mentor this
> project.
>
> I am planning to start a vote on the proposal in a day or two.
>
>
> On Fri, Feb 6, 2015 at 5:21 PM,  <ooibc@comp.nus.edu.sg> wrote:
> >
> > Regarding the number of users using this project -- at this moment, the
> > community is not big.  A few local start-ups have been trying to use it
> > (mainly due to an announcement on our seminar list), e.g. one is using it
> > for image recognition (given a photo snapped by a user, it wants to
> > return the same product and a list of similar products, such as a luxury
> > bag on a passerby).  Researchers from outside of NUS may have been using
> > it since we published an application paper on cross domain/modal
> > retrieval in VLDB 2014.
> >
> > We have not announced the project to the outside community yet -- we
> > would announce it in dbworld etc in due course.
> >
> > Thanks and have a good weekend.
> >
> > regards
> > beng chin
> >
> >>
> >> Thanks for the comments and suggestions.
> >> With permission from Thejas, I would like to respond to point 2.
> >>
> >> We have a huge team down at NUS (National University of Singapore) --
> >> we have about seven database/data mining professors (not including
> >> those in systems, networking, and machine learning).
> >> I myself have nine PhD students in a steady state, and I have a few
> >> large grants, with a total budget of about 15 million S$ (~12 million
> >> USD), that allow me to hire a number of research fellows and research
> >> assistants for the next few years.  In a constant state, I have about 20
> >> people (PhD students/RA/RF) working with me alone.  Other professors
> >> have their own grants (unlike other countries, it is relatively easy to
> >> get large grants in Singapore; many overseas universities, including
> >> UIUC, MIT, ETH etc., have research labs funded by Singapore Research
> >> Foundation [equivalent of NSF]).
> >>
> >> SINGA is a long term project for us -- while it is a platform as it is,
> >> we are using it for healthcare predictive analytics (by working with a
> >> hospital associated with the University).  Therefore, we will be working
> >> on SINGA, not solely as a distributed DL platform, but as a tool that
> >> will enable us to do data analytics on some business domains (e.g.
> >> healthcare, consumer etc.)
> >>
> >> For the initial set of committers, three are tenured professors, five
> >> are students, with 2-5 years to go before they complete their PhD.  Quite
> >> often, some would stay back as a research fellow for a couple of years
> >> before they start looking for a job outside.  We will work with mentors
> >> and new developers (from outside of NUS or Zhejiang University) in
> >> enhancing the system.
> >>
> >> The project should survive in that sense.
> >>
> >> (I have an on-going project CIIDAA that has been around since 2008; it
> >> was started as another project, epiC, with a different grant, and then we
> >> continued the development with a new grant for CIIDAA --
> >> http://www.comp.nus.edu.sg/~ciidaa/
> >> )
> >>
> >> Thanks.
> >>
> >> regards
> >> beng chin
> >> ps: i am not sure if my email will get through to the group.
> >>
> >>
> >> ---------------------------- Original Message ----------------------------
> >> Subject: Re: [DISCUSS] [PROPOSAL] Singa for Apache Incubator
> >> From:    "Henry Saputra" <henry.saputra@gmail.com>
> >> Date:    Thu, February 5, 2015 2:57 pm
> >> To:      "general@incubator.apache.org" <general@incubator.apache.org>
> >> Cc:      ooibc@comp.nus.edu.sg
> >>
> >> --------------------------------------------------------------------------
> >>
> >> Several comments:
> >> -) How many users are already using this project? I would recommend
> >> dropping the request for the singa-user list at the beginning.
> >> -) All the initial committers come from a university, and it seems like
> >> some of them are already ready to leave the university. I am not too sure
> >> this project will survive if all of the initial committers are from
> >> the university as students.
> >> -) Need to solicit more mentors if this project ever gets to the Apache
> >> incubator.
> >>
> >> - Henry
> >>
> >> On Tue, Feb 3, 2015 at 3:58 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
> >>> The "Relationship with Other Apache Products" section has been
> >>> updated. The reference to H2O in that section has been removed, and
> >>> other projects have been added.
> >>>  Thanks for the feedback!
> >>>
> >>>
> >>> On Wed, Jan 28, 2015 at 10:27 AM, Thejas Nair <thejas.nair@gmail.com> wrote:
> >>>> Thanks for pointing that out Henry! Yes, looks like H2O is not an
> >>>> Apache project, I should have verified that.
> >>>> I will edit that, and revisit that section along with the folks in
> >>>> Singa community.
> >>>>
> >>>>
> >>>> On Tue, Jan 27, 2015 at 6:55 PM, Henry Saputra <henry.saputra@gmail.com> wrote:
> >>>>> Quick immediate comment that "Apache H2O" is not really an Apache
> >>>>> project.
> >>>>>
> >>>>> I assume you are referring to https://github.com/h2oai/h2o (or
> >>>>> https://github.com/h2oai/h2o-dev) ?
> >>>>>
> >>>>> - Henry
> >>>>>
> >>>>> On Tue, Jan 27, 2015 at 5:29 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
> >>>>>> Hello everyone,
> >>>>>>
> >>>>>> I would like to propose the inclusion of Singa as an Apache Incubator
> >>>>>> project.
> >>>>>>
> >>>>>> Here is the proposal -
> >>>>>> https://wiki.apache.org/incubator/SingaProposal
> >>>>>>
> >>>>>> Please review the proposal and give feedback. I am planning to start
> >>>>>> a vote after 7 days if the proposal looks good.
> >>>>>> We are also seeking additional Apache mentors for the project.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Thejas
> >>>>>> ==========================================================
> >>>>>> Singa Incubator Proposal
> >>>>>>
> >>>>>> Abstract
> >>>>>>
> >>>>>> SINGA is a distributed deep learning platform.
> >>>>>>
> >>>>>> Proposal
> >>>>>>
> >>>>>> SINGA is an efficient, scalable and easy-to-use distributed platform
> >>>>>> for training deep learning models, e.g., Deep Convolutional Neural
> >>>>>> Network and Deep Belief Network. It parallelizes the computation
> >>>>>> (i.e., training) onto a cluster of nodes by distributing the training
> >>>>>> data and model automatically to speed up the training. Built-in
> >>>>>> training algorithms like Back-Propagation and Contrastive Divergence
> >>>>>> are implemented based on common abstractions of deep learning models.
> >>>>>> Users can train their own deep learning models by simply customizing
> >>>>>> these abstractions like implementing the Mapper and Reducer in
> >>>>>> Hadoop.
> >>>>>>
> >>>>>> Background
> >>>>>>
> >>>>>> Deep learning refers to a set of feature (or representation) learning
> >>>>>> models that consist of multiple (non-linear) layers, where different
> >>>>>> layers learn different levels of abstractions (representations) of
> >>>>>> the raw input data. Larger (in terms of model parameters) and deeper
> >>>>>> (in terms of number of layers) models have shown better performance,
> >>>>>> e.g., lower image classification error in Large Scale Visual
> >>>>>> Recognition Challenge. However, a larger model requires more memory
> >>>>>> and larger training data to reduce over-fitting. Complex numeric
> >>>>>> operations make the training computation intensive. In practice,
> >>>>>> training large deep learning models takes weeks or months on a single
> >>>>>> node (even with GPU).
> >>>>>>
> >>>>>> Rationale
> >>>>>>
> >>>>>> Deep learning has gained a lot of traction in both academia and
> >>>>>> industry due to its success in a wide range of areas such as computer
> >>>>>> vision and speech recognition. However, training of such models is
> >>>>>> computationally expensive, especially for large and deep models
> >>>>>> (e.g., with billions of parameters and more than 10 layers). Both
> >>>>>> Google and Microsoft have developed distributed deep learning systems
> >>>>>> to make the training more efficient by distributing the computations
> >>>>>> within a cluster of nodes. However, these systems are closed source
> >>>>>> software. Our goal is to leverage the community of open source
> >>>>>> developers to make SINGA efficient, scalable and easy to use. SINGA is
> >>>>>> a full-fledged distributed platform that could benefit the community
> >>>>>> and also benefit from the community's involvement in contributing to
> >>>>>> further work in this area. We believe the nature of SINGA and our
> >>>>>> vision for the system fit naturally with Apache's philosophy and
> >>>>>> development framework.
> >>>>>>
> >>>>>> Initial Goals
> >>>>>>
> >>>>>> We have developed a system for SINGA running on a commodity computer
> >>>>>> cluster. The initial goals include:
> >>>>>> * improving the system in terms of scalability and efficiency, e.g.,
> >>>>>>   using Infiniband for network communication and multi-threading for
> >>>>>>   one node computation. We would consider extending SINGA to GPU
> >>>>>>   clusters later.
> >>>>>> * benchmarking with larger datasets (hundreds of millions of training
> >>>>>>   instances) and models (billions of parameters).
> >>>>>> * adding more built-in deep learning models. Users can train the
> >>>>>>   built-in models on their datasets directly.
> >>>>>>
> >>>>>> Current Status
> >>>>>>
> >>>>>> Meritocracy
> >>>>>>
> >>>>>> We would like to follow ASF meritocratic principles to encourage more
> >>>>>> developers to contribute to this project. We know that only active
> >>>>>> and excellent developers can make SINGA a successful project. The
> >>>>>> committer list and PMC will be updated based on developers'
> >>>>>> performance and commitment. We are also improving the documentation
> >>>>>> and code to help new developers get started quickly.
> >>>>>>
> >>>>>> Community
> >>>>>>
> >>>>>> SINGA is currently being developed in the Database System Research
> >>>>>> Lab at the National University of Singapore (NUS) in collaboration
> >>>>>> with Zhejiang University in China. Our lab has extensive experience in
> >>>>>> building database related systems, including distributed systems. Six
> >>>>>> PhD students and research assistants (Jinyang Gao, Kaiping Zheng,
> >>>>>> Sheng Wang, Wei Wang, Zhaojing Luo and Zhongle Xie), a research
> >>>>>> fellow (Anh Dinh) and three professors (Beng Chin Ooi, Gang Chen,
> >>>>>> Kian Lee Tan) have been working for a year on this project. We are
> >>>>>> open to recruiting more developers from diverse backgrounds.
> >>>>>>
> >>>>>> Core Developers
> >>>>>>
> >>>>>> Beng Chin Ooi, Gang Chen and Kian Lee Tan are professors who have
> >>>>>> worked on distributed systems for more than 20 years. They have
> >>>>>> collaborated with the industry and have built various large scale
> >>>>>> systems. Anh Dinh's research is also on distributed systems, albeit
> >>>>>> with more focus on security aspects. Wei Wang's research is on deep
> >>>>>> learning problems including deep learning applications and large
> >>>>>> scale training. Sheng Wang and Jinyang are working on efficient
> >>>>>> indexing, querying of large scale data and machine learning. Kaiping,
> >>>>>> Zhaojing and Zhongle are new PhD students who joined SINGA recently.
> >>>>>> They will work on this project for a longer time (next 4-5 years).
> >>>>>> While we share common research interests, each member also brings
> >>>>>> diverse expertise to the team.
> >>>>>>
> >>>>>> Alignment
> >>>>>>
> >>>>>> ASF is already the home of many distributed platforms, e.g., Hadoop,
> >>>>>> Spark and Mahout, each of which targets a different application
> >>>>>> domain. SINGA, being a distributed platform for large-scale deep
> >>>>>> learning, focuses on another important domain that still lacks a
> >>>>>> robust and scalable open-source platform. The recent success of deep
> >>>>>> learning models, especially for vision and speech recognition tasks,
> >>>>>> has generated interest both in applying existing deep learning models
> >>>>>> and in developing new ones. Thus, an open-source platform for deep
> >>>>>> learning will be able to attract a large community of users and
> >>>>>> developers. SINGA is a complex system needing many iterations of
> >>>>>> design, implementation and testing. Apache's collaboration framework,
> >>>>>> which encourages active contribution from developers, will inevitably
> >>>>>> help improve the quality of the system, as shown in the success of
> >>>>>> Hadoop, Spark, etc. Equally important is the community of users, which
> >>>>>> helps identify real-life applications of deep learning and helps to
> >>>>>> evaluate the system's performance and ease-of-use. We hope to leverage
> >>>>>> ASF for coordinating and promoting both communities, and in return
> >>>>>> benefit the communities with another useful tool.
> >>>>>>
> >>>>>> Known Risks
> >>>>>>
> >>>>>> Orphaned products
> >>>>>>
> >>>>>> Four core developers (Anh, Wei Wang, Jinyang and Sheng Wang) may
> >>>>>> leave the lab in two to four years' time. It is possible that some of
> >>>>>> them may not have enough time to focus on this project after that.
> >>>>>> But SINGA is part of our other bigger research projects on building an
> >>>>>> infrastructure for data intensive applications, which include
> >>>>>> health-care analytics and brain-inspired computing. Beng Chin and
> >>>>>> Kian Lee would continue working on it and getting more people
> >>>>>> involved. For example, three new developers (Kaiping, Zhaojing and
> >>>>>> Zhongle) joined us recently. Individual developers are welcome to
> >>>>>> make SINGA a diverse community that is robust and independent from
> >>>>>> any single developer.
> >>>>>>
> >>>>>> Inexperience with Open Source
> >>>>>>
> >>>>>> All the developers are active users and followers of open source
> >>>>>> projects. Our research lab has a strong commitment to open source,
> >>>>>> and has released the source code of several systems under open source
> >>>>>> license as a way of contributing back to the open source community.
> >>>>>> But we do not have much real experience in open source projects with
> >>>>>> large and well organized communities like those in Apache. This is
> >>>>>> one reason we choose Apache which is experienced in open source
> >>>>>> project incubation. We hope to get the help from Apache (e.g.,
> >>>>>> champion and mentors) to establish a healthy path for SINGA.
> >>>>>>
> >>>>>> Homogenous Developers
> >>>>>>
> >>>>>> Although the current developers are researchers in the universities,
> >>>>>> they have different research interests and project experiences, as
> >>>>>> mentioned in the section that introduces the core developers. We know
> >>>>>> that a diverse community is helpful. Hence we are open to the idea of
> >>>>>> recruiting developers from other regions and organizations.
> >>>>>>
> >>>>>> Reliance on Salaried Developers
> >>>>>>
> >>>>>> As a research project in the university, SINGA's current developing
> >>>>>> community consists of professors, PhD students, research assistants
> >>>>>> and postdoctoral fellows. They are driven by their interests to work
> >>>>>> on this project and have contributed actively since the start of the
> >>>>>> project. The research assistants and fellows are expected to leave
> >>>>>> when their contracts expire. However, they are keen to continue to
> >>>>>> work on the project voluntarily. Moreover, as a long term research
> >>>>>> project, new research assistants and fellows are likely to join the
> >>>>>> project.
> >>>>>>
> >>>>>> An Excessive Fascination with the Apache Brand
> >>>>>>
> >>>>>> We choose Apache not for publicity. We have two purposes. First, we
> >>>>>> want to leverage Apache's reputation to recruit more developers to
> >>>>>> make a diverse community. Second, we hope that Apache can help us to
> >>>>>> establish a healthy path in developing SINGA. Beng Chin and Kian-Lee
> >>>>>> are established database and distributed system researchers, and
> >>>>>> together with the other contributors, they sincerely believe that
> >>>>>> there is a need for a widely accepted open source distributed deep
> >>>>>> learning platform. The field of deep learning is still in its
> >>>>>> infancy, and an open source platform will fuel the research in the
> >>>>>> area. Moreover, such a platform will enable researchers to develop new
> >>>>>> models and algorithms, rather than spending time implementing a deep
> >>>>>> learning system from scratch. Furthermore, the need for scalability
> >>>>>> for such a platform is obvious.
> >>>>>>
> >>>>>> Relationship with Other Apache Products
> >>>>>>
> >>>>>> Apache H2O implemented two simple deep learning models, namely the
> >>>>>> Multi-Layer Perceptron and Deep Auto-encoders. There are two
> >>>>>> significant differences between H2O and SINGA. First, H2O adopts the
> >>>>>> Map-Reduce framework which runs a set of computing nodes in parallel
> >>>>>> against the training set. Model parameters trained by all
> >>>>>> computing nodes are averaged as the final model parameters. This
> >>>>>> training algorithm is different from the distributed training
> >>>>>> algorithm used by DistBelief, Adam and SINGA, which frequently
> >>>>>> synchronizes the parameters trained from different nodes. SINGA
> >>>>>> adopts the parameter server framework to support a wide range of
> >>>>>> distributed training algorithms and parallelization methods (e.g.,
> >>>>>> data parallelism, model parallelism and hybrid parallelism), whereas
> >>>>>> H2O only supports data parallelism. Second, in H2O, users are
> >>>>>> restricted to using the two built-in models. In SINGA, we provide a
> >>>>>> simple programming model to let users implement their own deep
> >>>>>> learning models. A new deep learning model can be implemented by
> >>>>>> customizing the base Layer class for each layer involved in the
> >>>>>> model. It is similar to writing Hadoop programs where users only need
> >>>>>> to override the base Mapper and Reducer. We also provide built-in
> >>>>>> models for users to use directly.
> >>>>>>
> >>>>>> Documentation
> >>>>>>
> >>>>>> The project is hosted at
> >>>>>> http://www.comp.nus.edu.sg/~dbsystem/project/singa.html.
> >>>>>> Documentation can be found at the Github Wiki Page:
> >>>>>> https://github.com/nusinga/singa/wiki. We continue to refine and
> >>>>>> improve the documentation.
> >>>>>>
> >>>>>> Initial Source
> >>>>>>
> >>>>>> We use Github to maintain our source code,
> >>>>>> https://github.com/nusinga/singa
> >>>>>>
> >>>>>> Source and Intellectual Property Submission Plan
> >>>>>>
> >>>>>> We plan to place our code base under the Apache License, Version 2.0.
> >>>>>>
> >>>>>> External Dependencies
> >>>>>>
> >>>>>> required by the core code base: glog, gflags, google protobuf,
> >>>>>> open-blas, mpich, armci-mpi.
> >>>>>> required by data preparation and preprocessing: opencv, hdfs, python.
> >>>>>>
> >>>>>> Cryptography
> >>>>>>
> >>>>>> Not Applicable
> >>>>>>
> >>>>>> Required Resources
> >>>>>>
> >>>>>> Mailing Lists
> >>>>>>
> >>>>>> Currently, we use google group for internal discussion. The mailing
> >>>>>> address is nusinga@googlegroup.com. We will migrate the content to
> >>>>>> the apache mailing lists in the future.
> >>>>>>
> >>>>>> singa-dev
> >>>>>> singa-user
> >>>>>> singa-commits
> >>>>>> singa-private (for private discussion within PMC)
> >>>>>>
> >>>>>> Git Repository
> >>>>>>
> >>>>>> We want to continue using git for version control. Hence, a git repo
> >>>>>> is required.
> >>>>>>
> >>>>>> Issue Tracking
> >>>>>>
> >>>>>> JIRA Singa (SINGA)
> >>>>>>
> >>>>>> Initial Committers
> >>>>>>
> >>>>>> Beng Chin Ooi (ooibc @comp.nus.edu.sg)
> >>>>>> Kian Lee Tan (tankl @comp.nus.edu.sg)
> >>>>>> Gang Chen (cg @zju.edu.cn)
> >>>>>> Wei Wang (wangwei @comp.nus.edu.sg)
> >>>>>> Dinh Tien Tuan Anh (dinhtta @comp.nus.edu.sg)
> >>>>>> Jinyang Gao (jinyang.gao @comp.nus.edu.sg)
> >>>>>> Sheng Wang (wangsh @comp.nus.edu.sg)
> >>>>>> Kaiping Zheng (kaiping @comp.nus.edu.sg)
> >>>>>> Zhaojing Luo (zhaojing @comp.nus.edu.sg)
> >>>>>> Zhongle Xie (zhongle @comp.nus.edu.sg)
> >>>>>>
> >>>>>> Affiliations
> >>>>>>
> >>>>>> Beng Chin Ooi, National University of Singapore
> >>>>>> Kian Lee Tan, National University of Singapore
> >>>>>> Gang Chen, Zhejiang University
> >>>>>> Wei Wang, National University of Singapore
> >>>>>> Dinh Tien Tuan Anh, National University of Singapore
> >>>>>> Jinyang Gao, National University of Singapore
> >>>>>> Sheng Wang, National University of Singapore
> >>>>>> Kaiping Zheng, National University of Singapore
> >>>>>> Zhaojing Luo, National University of Singapore
> >>>>>> Zhongle Xie, National University of Singapore
> >>>>>>
> >>>>>> Sponsors
> >>>>>>
> >>>>>> Champion
> >>>>>>
> >>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
> >>>>>>
> >>>>>> Nominated Mentors
> >>>>>>
> >>>>>> Thejas Nair (thejas at apache.org) - Hortonworks
> >>>>>> Alan Gates (gates at apache dot org) - Hortonworks
> >>>>>> (Seeking more volunteers!)
> >>>>>>
> >>>>>> Sponsoring Entity
> >>>>>>
> >>>>>> We are requesting the Incubator to sponsor this project.
> >>>>>>
> >>>>>>
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >>>>>> For additional commands, e-mail: general-help@incubator.apache.org
> >>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >>
> >>
> >>
> >
>
>
>
