hama-dev mailing list archives

From Sachin Ghai <sachin.g...@impetus.co.in>
Subject Proposal for an Apache Hama sub-project
Date Mon, 27 Feb 2017 08:16:11 GMT
Hama Community,

I would like to propose a sub-project for Apache Hama and initiate discussion around the proposal.
The proposed sub-project named 'Scalar' is a scalable orchestration, training and serving
system for machine learning and deep learning. Scalar would leverage Apache Hama to automate
the distributed training, model deployment and prediction serving.

Further details of the proposal are given below, following the Apache project proposal template:
Abstract
Scalar is a general-purpose framework that simplifies large-scale big data analytics and deep
learning modelling, deployment and high-performance serving.
Proposal
Scalar aims to provide an abstraction framework that allows users to easily scale model training,
model deployment and prediction serving on top of an underlying machine learning or deep learning
framework. Its execution framework orchestrates heterogeneous workload graphs using Apache Hama,
Apache Hadoop, Apache Spark and TensorFlow resources.
Background
The initial Scalar code was developed in 2016 and has been successfully beta tested for one
of the largest insurance organizations in a client-specific proof of concept (PoC). The motivation
behind this work is to build a framework that provides an abstraction over heterogeneous data
science frameworks and helps users leverage them in the most performant way.
Rationale
The industry has recently seen a deluge of machine learning and deep learning frameworks. For an
application developer, it is hard to switch from one framework to another without rewriting the
application. In addition, extra plumbing is needed to retrieve the prediction results for each
model in each framework. We aim to provide an abstraction framework that can be used to seamlessly
train and deploy models at scale on multiple frameworks such as TensorFlow, Apache Horn or Caffe.
The abstraction also provides a unified layer for serving predictions in the most performant,
scalable and efficient way for a multi-tenant deployment. The key performance metrics will be
reduced training time, lower error rates and lower serving latency.
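To illustrate the kind of abstraction described above, here is a minimal Python sketch, assuming a hypothetical `ModelBackend` adapter interface and a hypothetical `ModelRegistry` unified layer. These names, and the toy `MeanBackend` standing in for a real framework such as TensorFlow, Apache Horn or Caffe, are illustrative only and are not Scalar's actual API. The application trains, deploys and queries models by name through one interface, so switching frameworks means registering a different backend rather than rewriting the application.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Type


class ModelBackend(ABC):
    """Hypothetical framework adapter; one subclass per ML/DL framework."""

    @abstractmethod
    def train(self, features: Sequence[Sequence[float]], labels: Sequence[float]) -> None: ...

    @abstractmethod
    def predict(self, features: Sequence[Sequence[float]]) -> List[float]: ...


class MeanBackend(ModelBackend):
    """Toy stand-in for a real framework: always predicts the mean training label."""

    def __init__(self) -> None:
        self.mean = 0.0

    def train(self, features, labels):
        self.mean = sum(labels) / len(labels)

    def predict(self, features):
        return [self.mean for _ in features]


class ModelRegistry:
    """Hypothetical unified layer: applications address models by name, not by framework."""

    def __init__(self) -> None:
        self._backends: Dict[str, Type[ModelBackend]] = {}
        self._deployed: Dict[str, ModelBackend] = {}

    def register_backend(self, framework: str, backend_cls: Type[ModelBackend]) -> None:
        self._backends[framework] = backend_cls

    def train_and_deploy(self, name: str, framework: str, features, labels) -> None:
        model = self._backends[framework]()  # instantiate the chosen framework adapter
        model.train(features, labels)
        self._deployed[name] = model         # "deploy": make the model servable by name

    def predict(self, name: str, features) -> List[float]:
        return self._deployed[name].predict(features)


registry = ModelRegistry()
registry.register_backend("toy", MeanBackend)
registry.train_and_deploy("claims-model", "toy", [[1.0], [2.0]], [10.0, 20.0])
print(registry.predict("claims-model", [[3.0]]))  # prints [15.0]
```

The client code only ever calls `train_and_deploy` and `predict`; replacing `"toy"` with another registered backend would leave it unchanged.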
Scalar consists of a core engine that can be used to create flows described in terms of state,
sequences and algorithms. The engine invokes the Apache Hama execution context to train and
deploy models on the target framework. Apache Hama is used for a variety of functions, including
parameter tuning and scheduling computations on a distributed cluster. A data object layer
provides access to data from heterogeneous sources such as HDFS, the local file system and S3.
A REST API layer serves the prediction functions to client applications. A caching layer in
the middle reduces latency for various functions.
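Two of the layers described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Scalar's design: `resolve_source` is a hypothetical data-object-layer dispatch that chooses a reader by URI scheme (returning a label instead of a real reader), and `PredictionCache` is a hypothetical caching layer implemented as a least-recently-used (LRU) cache in front of a prediction function.

```python
from collections import OrderedDict
from typing import Callable, Tuple
from urllib.parse import urlparse


def resolve_source(uri: str) -> str:
    """Hypothetical data object layer: pick a reader by URI scheme."""
    scheme = urlparse(uri).scheme or "file"  # bare paths are treated as local files
    readers = {"hdfs": "HDFS reader", "s3": "S3 reader", "file": "local reader"}
    if scheme not in readers:
        raise ValueError(f"unsupported data source scheme: {scheme}")
    return readers[scheme]


class PredictionCache:
    """Hypothetical caching layer: an LRU cache in front of a slower predict function."""

    def __init__(self, predict_fn: Callable[[Tuple[float, ...]], float], capacity: int = 1024):
        self._predict_fn = predict_fn
        self._capacity = capacity
        self._entries: "OrderedDict[Tuple[float, ...], float]" = OrderedDict()
        self.misses = 0

    def predict(self, features: Tuple[float, ...]) -> float:
        if features in self._entries:
            self._entries.move_to_end(features)  # mark as most recently used
            return self._entries[features]
        self.misses += 1
        result = self._predict_fn(features)       # cache miss: call the model
        self._entries[features] = result
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)     # evict the least recently used entry
        return result


print(resolve_source("s3://bucket/train.csv"))  # prints S3 reader
cache = PredictionCache(lambda f: sum(f), capacity=2)
cache.predict((1.0, 2.0))
cache.predict((1.0, 2.0))  # served from the cache
print(cache.misses)        # prints 1
```

Feature vectors are keyed as tuples so they are hashable; a real serving layer would also need cache invalidation when a model is redeployed.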
Initial Goals
Some current goals include:

  *   Build community.
  *   Provide a general-purpose API for machine learning and deep learning training, deployment
and serving.
  *   Serve the predictions with low latency.
  *   Run massive workloads via Apache Hama on TensorFlow, Apache Spark and Caffe.
  *   Provide CPU and GPU support, on-premise or in the cloud, for running the algorithms.
Current Status
Meritocracy
The core developers understand what it means to have a process based on meritocracy. We will
make continuous efforts to build an environment that supports this and encourages community
members to contribute.
Community
A small community has formed within the Apache Hama project and at companies including an
enterprise services and product company and an artificial intelligence startup. There is a lot
of interest in data science serving systems and artificial intelligence simplification systems.
By bringing Scalar into Apache, we believe that the community will grow even bigger.
Core Developers
Edward J. Yoon, Sachin Ghai, Ishwardeep Singh, Rachna Gogia, Abhishek Soni, Nikunj Limbaseeya,
Mayur Choubey
Known Risks
Orphaned Products
Apache Hama is already a core open source component used at Samsung Electronics, and Scalar
is already being adopted by major enterprise organizations. There is therefore no direct risk
of the Scalar project being orphaned.
Inexperience with Open Source
All contributors have experience using and/or working on Apache open source projects.
Homogeneous Developers
The initial committers are from different organizations such as Impetus, Chalk Digital, and
Samsung Electronics.
Reliance on Salaried Developers
A few developers will work on the project full-time as open source developers. Other developers
will also work on the project in their spare time.
Relationships with Other Apache Products

  *   Scalar is being built on top of Apache Hama.
  *   Apache Spark is being used for machine learning.
  *   Apache Horn is being used for deep learning.
  *   The framework will run natively on Apache Hadoop and Apache Mesos.
An Excessive Fascination with the Apache Brand
We hope Scalar will benefit from Apache, both by attracting a community and establishing a
solid group of developers, and through its relationship with Apache Hadoop, Spark and Hama.
These are the main reasons for sending this proposal.
Documentation
The initial design of Scalar can be found at <https://drive.google.com/file/d/0B7mbLUemi6LFVHlFSzhONmZ4aU0/view?usp=sharing>.
Initial Source
Impetus Technologies (Impetus) will contribute the initial orchestration code base to create
this project. Impetus plans to contribute the Scalar code base, test cases, build files, and
documentation to the ASF under the terms specified in the ASF Corporate Contributor License
Agreement and to develop it further with the wider community. Once at Apache, the project will
be licensed under the Apache License, Version 2.0.
Cryptography
Not applicable.
Required Resources
Mailing Lists

  *   scalar-dev
  *   scalar-pmc
Subversion Directory

  *   Git is the preferred source control system: git://git.apache.org/scalar
Issue Tracking

  *   a JIRA issue tracker, SCALAR
Initial Committers

  *   Sachin Ghai (sachin.ghai AT impetus DOT co DOT in)
  *   Edward J. Yoon (edwardyoon AT apache DOT org)
  *   Abhishek Soni (abhishek.soni AT impetus DOT co DOT in)
  *   Ishwardeep Singh (ishwardeep AT chalkdigital DOT com)
  *   Nikunj Limbaseeya (nikunj.limbaseeya AT impetus DOT co DOT in)
  *   Rachna Gogia (rachna AT hadoopsphere DOT org)
  *   Mayur Choubey (mayur.choubey AT impetus DOT co DOT in)
Affiliations

  *   Sachin Ghai (Impetus)
  *   Edward J. Yoon (Samsung Electronics)
  *   Abhishek Soni (Impetus)
  *   Ishwardeep Singh (Chalk Digital)
  *   Nikunj Limbaseeya (Impetus)
  *   Rachna Gogia (HadoopSphere)
  *   Mayur Choubey (Impetus)
Sponsors
<proposed>
Champion

  *   Edward J. Yoon <ASF member, Samsung Electronics>
Nominated Mentors

  *   Edward J. Yoon <ASF member, Samsung Electronics>
Sponsoring Entity
The Apache Hama project

-- End of proposal --

Thanks,
Sachin Ghai