incubator-general mailing list archives

From Niall Pemberton <niall.pember...@gmail.com>
Subject Re: [PROPOSAL] Ivory - Hadoop data management and processing platform
Date Sat, 16 Mar 2013 00:33:16 GMT
+1

Niall

On Wed, Mar 13, 2013 at 5:00 PM, Srikanth Sundarrajan
<srikanth.sundarrajan@inmobi.com> wrote:
> = Ivory Proposal =
>
> == Abstract ==
> Ivory is a data processing and management solution for Hadoop designed for
> data motion, coordination of data pipelines, lifecycle management, and
> data discovery. Ivory enables end consumers to quickly onboard their data
> and its associated processing and management tasks on Hadoop clusters.
>
> == Proposal ==
> Ivory will enable easy data management for Hadoop via a declarative
> mechanism. Users of the Ivory platform simply define infrastructure
> endpoints, data sets and processing rules declaratively. These
> configurations are expressed in such a way that the dependencies between
> these entities are explicitly described. This information about
> inter-dependencies between the various entities allows Ivory to
> orchestrate and manage various data management functions.
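The Proposal section explains that explicitly declared dependencies between entities are what let Ivory orchestrate data management. As an illustration only, here is a minimal Python sketch of that idea. It assumes three hypothetical entity kinds (cluster, feed, process) and invented entity names, and it is not Ivory's configuration format or API. It uses the standard-library `graphlib` module (Python 3.9+) to derive an order in which entities can be set up, and walks the same graph to find which entities are affected when an input feed changes.

```python
from graphlib import TopologicalSorter

# Hypothetical entity declarations (names and kinds are illustrative only).
# Each entity maps to the set of entities it depends on.
ENTITIES = {
    "cluster:primary": set(),
    "feed:raw-clicks": {"cluster:primary"},
    "process:aggregate-clicks": {"cluster:primary", "feed:raw-clicks"},
    "feed:hourly-summary": {"cluster:primary", "process:aggregate-clicks"},
}


def schedule_order(entities):
    """Return entity names so that each entity follows everything it depends on.

    graphlib raises CycleError (a ValueError subclass) if the declarations
    contain a dependency cycle.
    """
    return list(TopologicalSorter(entities).static_order())


def dependents_of(entities, name):
    """Return every entity that depends on `name`, directly or transitively."""
    affected = set()
    frontier = [name]
    while frontier:
        current = frontier.pop()
        for entity, deps in entities.items():
            if current in deps and entity not in affected:
                affected.add(entity)
                frontier.append(entity)
    return affected


if __name__ == "__main__":
    print(schedule_order(ENTITIES))
    print(sorted(dependents_of(ENTITIES, "feed:raw-clicks")))
```

Because each entity lists what it depends on, both the setup order and the set of downstream entities touched by a changed input can be computed from the declarations rather than wired up by hand.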
>
> The key use cases that Ivory addresses are:
>  * Data Motion
>  * Process orchestration and scheduling
>  * Policy-based Lifecycle Management
>  * Data Discovery
>  * Operability/Usability
>
> With these features, users can onboard their data sets with a
> comprehensive and holistic understanding of how, when and where their
> data is managed across its lifecycle. Complex functions such as retrying
> failures, identifying possible SLA breaches or automatically handling
> changes to input data become simple directives. All administrative and
> user-level functions are available via RESTful APIs; the CLI is simply a
> wrapper over these APIs.
>
> == Background ==
> Hadoop and its ecosystem of products have made storing and processing
> massive amounts of data commonplace. This has enabled numerous
> organizations to gain valuable insights that they never could have
> achieved in the past. While it is easy to leverage Hadoop for crunching
> large volumes of data, organizing data, managing the lifecycle of data
> and processing data are fairly involved. These problems are solved
> adequately in a classic data platform built on data warehouses and
> standard ETL (extract-transform-load) tools, but remain largely unsolved
> on Hadoop today. In addition to data processing complexities, Hadoop
> presents new sets of challenges and opportunities relating to the
> management of data.
>
> Data management on Hadoop encompasses data motion, process orchestration,
> lifecycle management and data discovery, among other concerns that go
> beyond ETL. Ivory is a new data processing and management platform for
> Hadoop that solves this problem and creates additional opportunities by
> building on existing components within the Hadoop ecosystem (e.g. Apache
> Oozie and Apache Hadoop DistCp) without reinventing the wheel. Ivory is
> now in its second year of production at InMobi, where it manages
> hundreds of feeds and processes.
>
> Ivory is being developed by engineers employed with InMobi, Hortonworks
> and Yahoo!. This platform addition will increase the adoption of Apache
> Hadoop by making data management tractable for end users. We are
> therefore proposing to make Ivory an Apache open source project.
>
> == Rationale ==
> The Ivory project aims to improve the usability of Apache Hadoop. As a
> result, Apache Hadoop will grow its community of users by increasing the
> places where Hadoop can be utilized and the use cases it can solve. By
> developing Ivory in Apache we hope to gather a diverse community of
> contributors, helping to ensure that Ivory is deployable for a broad
> range of scenarios. Members of the Hadoop development community will be
> able to influence Ivory’s roadmap and contribute to it. We believe having
> Ivory as part of the Apache Hadoop ecosystem will be a great benefit to
> all of Hadoop's users.
>
> == Current Status ==
> Ivory is widely deployed in production within InMobi and is now in its
> second year of use. A version with a valuable set of features has been
> developed by the initial committers and is hosted on GitHub.
>
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Ivory following the Apache meritocracy model.
> We have wanted to make the project open source and encourage contributors
> from multiple organizations from the start. We plan to provide plenty of
> support to new developers and to quickly recruit those who make solid
> contributions to committer status.
>
> === Community ===
> We are happy to report that the initial team already represents multiple
> organizations. We hope to extend the user and developer base further in the
> future and build a solid open source community around Ivory.
>
> === Core Developers ===
> Ivory is currently being developed by three engineers from InMobi
> (Srikanth Sundarrajan, Shwetha GS and Shaik Idris) and two Hortonworks
> employees (Sanjay Radia and Venkatesh Seetharam). In addition, two
> Yahoo! employees, Rohini Palaniswamy and Thiruvel Thirumoolan, are also
> involved. Srikanth, Shwetha and Shaik are the original developers. All
> the engineers have built two generations of data management on Hadoop,
> have deep expertise in Hadoop and are quite familiar with the Hadoop
> ecosystem.
>
> === Alignment ===
> The ASF is a natural host for Ivory given that it is already the home of
> Hadoop, Pig, Knox, HCatalog, and other emerging “big data” software
> projects. Ivory has been designed to solve the data management challenges
> and opportunities of the Hadoop ecosystem family of products. Ivory fills
> a gap in the Hadoop ecosystem in the areas of data processing and data
> lifecycle management.
>
> == Known Risks ==
>
> === Orphaned products & Reliance on Salaried Developers ===
> The core developers plan to work full time on the project. There is very
> little risk of Ivory getting orphaned: Ivory is in use at the companies we
> work for, so those companies have an interest in its continued vitality.
>
> === Inexperience with Open Source ===
> All of the core developers are active users and followers of open source.
> Srikanth Sundarrajan has been contributing patches to Apache Hadoop and
> Apache Oozie, and Shwetha GS has been contributing patches to Apache
> Oozie. Venkatesh Seetharam is a committer on Apache Knox. Rohini
> Palaniswamy is a committer on Apache Pig. Sharad Agarwal, Amareshwari SR
> (also an Apache Hive PMC member) and Sanjay Radia are PMC members on
> Apache Hadoop.
>
> === Homogeneous Developers ===
> The current core developers are from a diverse set of organizations:
> InMobi, Hortonworks and Yahoo!. We expect to quickly establish a
> developer community that includes contributors from several corporations
> post-incubation.
>
> === Reliance on Salaried Developers ===
> Currently, most developers are paid to work on Ivory, but a few
> contribute in their spare time. However, once the project has a community
> built around it post-incubation, we expect to get committers and
> developers from outside the current core developers.
>
> === Relationships with Other Apache Products ===
> Ivory builds on existing Apache components such as Apache Oozie and
> Apache Hadoop DistCp, and is intended for users of Hadoop and the Hadoop
> ecosystem in general.
>
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubt
> that it will attract contributors and users, our interest is primarily to
> give Ivory a solid home as an open source project following an
> established development model. Further reasons are given in the
> Rationale and Alignment sections.
>
> == Documentation ==
> Documentation is available in the GitHub repository at:
> https://github.com/sriksun/Ivory
>
> == Initial Source ==
> The source is currently hosted in the GitHub repository at:
> https://github.com/sriksun/Ivory
>
> == Source and Intellectual Property Submission Plan ==
> The complete Ivory code is under the Apache License, Version 2.0.
>
> == External Dependencies ==
> All dependencies have Apache-compatible licenses, including BSD- and
> MIT-licensed dependencies.
>
> == Cryptography ==
> None
>
> == Required Resources ==
>
> === Mailing lists ===
>
>  * ivory-dev AT incubator DOT apache DOT org
>  * ivory-commits AT incubator DOT apache DOT org
>  * ivory-user AT incubator DOT apache DOT org
>  * ivory-private AT incubator DOT apache DOT org
>
> === Subversion Directory ===
> https://svn.apache.org/repos/asf/incubator/ivory
>
> === Issue Tracking ===
> JIRA IVORY
>
> == Initial Committers ==
>  * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
>  * Shwetha GS (shwetha.gs AT inmobi DOT com)
>  * Shaik Idris (shaik.idris AT inmobi DOT com)
>  * Venkatesh Seetharam (Venkatesh AT apache DOT org)
>  * Rohini Palaniswamy (rohinip AT yahoo-inc DOT com)
>  * Thiruvel Thirumoolan (thiruvel AT yahoo-inc DOT com)
>  * Sanjay Radia (sanjay AT apache DOT org)
>  * Sharad Agarwal (sharad AT apache DOT org)
>  * Amareshwari SR (amareshwari AT apache DOT org)
>
> == Affiliations ==
>  * Srikanth Sundarrajan (InMobi)
>  * Shwetha GS (InMobi)
>  * Shaik Idris (InMobi)
>  * Venkatesh Seetharam (Hortonworks Inc)
>  * Rohini Palaniswamy (Yahoo! Inc)
>  * Thiruvel Thirumoolan (Yahoo! Inc)
>  * Sanjay Radia (Hortonworks Inc)
>  * Sharad Agarwal (InMobi)
>  * Amareshwari SR (InMobi)
>
> == Sponsors ==
>
> === Champion ===
>  * Arun C Murthy (acmurthy at apache dot org)
>
> === Nominated Mentors ===
>  * Alan Gates (gates AT apache DOT org)
>  * Chris Douglas (cdouglas AT apache DOT org)
>  * Devaraj Das (ddas AT apache DOT org)
>  * Owen O’Malley (omalley AT apache DOT org)
>
> === Sponsoring Entity ===
> Incubator PMC
>


