incubator-general mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: [PROPOSAL] Ivory - Hadoop data management and processing platform
Date Thu, 14 Mar 2013 00:56:22 GMT

+1, this will be a great addition to the Hadoop eco-system!

The proposal looks fine overall. I quickly searched around for the name Ivory; it looks to
be a safe one, but someone still needs to do the due diligence.

And I think you can choose to have git as the version control if you feel like it.

Thanks,
+Vinod Kumar Vavilapalli

On Mar 13, 2013, at 10:00 AM, Srikanth Sundarrajan wrote:

> = Ivory Proposal =
> 
> == Abstract ==
> Ivory is a data processing and management solution for Hadoop designed for
> data motion, coordination of data pipelines, lifecycle management, and
> data discovery. Ivory enables end consumers to quickly onboard their data
> and its associated processing and management tasks on Hadoop clusters.
> 
> == Proposal ==
> Ivory will enable easy data management via a declarative mechanism for
> Hadoop. Users of the Ivory platform simply define infrastructure endpoints,
> data sets and processing rules declaratively. These configurations
> are expressed in such a way that the dependencies between
> these entities are explicitly described. This information about
> inter-dependencies between various entities allows Ivory to orchestrate and
> manage various data management functions.
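
To make the idea of explicitly declared dependencies concrete, here is a minimal sketch. It is not Ivory's actual configuration format or API: the entity names, the `depends_on` field, and both helper functions are hypothetical and chosen only for illustration. It shows how declared dependencies between endpoints, data sets and processing rules could be turned into a safe submission order using a topological sort.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical declarative entities. The names and the "depends_on" field
# are assumptions for illustration, not Ivory's real schema.
ENTITIES = {
    "primary-cluster": {"type": "cluster", "depends_on": []},
    "raw-clicks": {"type": "feed", "depends_on": ["primary-cluster"]},
    "hourly-summary": {"type": "feed", "depends_on": ["primary-cluster"]},
    "summarize-clicks": {
        "type": "process",
        "depends_on": ["primary-cluster", "raw-clicks", "hourly-summary"],
    },
}


def submission_order(entities):
    """Return entity names ordered so that dependencies come first."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(spec["depends_on"]) for name, spec in entities.items()}
    # static_order() raises graphlib.CycleError if the declarations are cyclic.
    return list(TopologicalSorter(graph).static_order())


def dependents_of(entities, target):
    """Return the entities that directly declare a dependency on target."""
    return sorted(
        name for name, spec in entities.items() if target in spec["depends_on"]
    )


if __name__ == "__main__":
    print(submission_order(ENTITIES))
    print(dependents_of(ENTITIES, "raw-clicks"))
```

With the dependencies explicit, a platform can also answer questions such as "which processes are affected if this feed changes?", which is what `dependents_of` illustrates.
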
> 
> The key use cases that Ivory addresses are:
> * Data Motion
> * Process orchestration and scheduling
> * Policy-based Lifecycle Management
> * Data Discovery
> * Operability/Usability
> 
> With these features it is possible for users to onboard their data sets
> with a comprehensive and holistic understanding of how, when and where
> their data is managed across its lifecycle. Complex functions such as
> retrying failures, identifying possible SLA breaches or automated handling
> of input data changes are now simple directives. All the administrative
> functions and user level functions are available via RESTful APIs. The CLI
> is simply a wrapper over the RESTful APIs.
> 
> == Background ==
> Hadoop and its ecosystem of products have made storing and processing
> massive amounts of data commonplace. This has enabled numerous
> organizations to gain valuable insights that they never could have achieved
> in the past. While it is easy to leverage Hadoop for crunching large
> volumes of data, organizing data, managing the life cycle of data and
> processing data are fairly involved. These problems are solved adequately
> well in a classic data platform involving data warehouses and standard ETL
> (extract-transform-load) tools, but remain largely unsolved on Hadoop
> today. In addition to data processing complexities, Hadoop presents new
> sets of challenges and opportunities relating to management of data.
> 
> Data Management on Hadoop encompasses data motion, process orchestration,
> lifecycle management, data discovery and other concerns that are beyond
> ETL. Ivory is a new data processing and management platform for Hadoop that
> solves this problem and creates additional opportunities by building on
> existing components within the Hadoop ecosystem (e.g. Apache Oozie and
> Apache Hadoop DistCp) without reinventing the wheel. Ivory has been in
> production at InMobi, is going into its second year there, and has been
> managing hundreds of feeds and processes.
> 
> Ivory is being developed by engineers employed with InMobi, Hortonworks and
> Yahoo!. This platform addition will increase the adoption of Apache Hadoop
> by making data management tractable for end users. We are therefore
> proposing to make Ivory an Apache open source project.
> 
> == Rationale ==
> The Ivory project aims to improve the usability of Apache Hadoop. As a
> result Apache Hadoop will grow its community of users by increasing the
> places Hadoop can be utilized and the use cases it will solve. By
> developing Ivory in Apache we hope to gather a diverse community of
> contributors, helping to ensure that Ivory is deployable for a broad range
> of scenarios. Members of the Hadoop development community will be able to
> influence Ivory’s roadmap, and contribute to it. We believe having Ivory as
> part of the Apache Hadoop ecosystem will be a great benefit to all of
> Hadoop's users.
> 
> == Current Status ==
> Ivory is widely deployed in production within InMobi and is moving into its
> second year. A version with a valuable set of features has been developed
> by the initial committers and is hosted on GitHub.
> 
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Ivory following the Apache meritocracy model. We
> have wanted to make the project open source and encourage contributors from
> multiple organizations from the start. We plan to provide plenty of support
> to new developers and to quickly recruit those who make solid contributions
> to committer status.
> 
> === Community ===
> We are happy to report that the initial team already represents multiple
> organizations. We hope to extend the user and developer base further in the
> future and build a solid open source community around Ivory.
> 
> === Core Developers ===
> Ivory is currently being developed by three engineers from InMobi
> (Srikanth Sundarrajan, Shwetha GS, and Shaik Idris) and two Hortonworks
> employees (Sanjay Radia and Venkatesh Seetharam). In addition, two Yahoo!
> employees, Rohini Palaniswamy and Thiruvel Thirumoolan, are also involved.
> Srikanth, Shwetha and Shaik are the original developers. All the engineers
> have built two generations of Data Management on Hadoop, have deep
> expertise in Hadoop, and are quite familiar with the Hadoop ecosystem.
> 
> === Alignment ===
> The ASF is a natural host for Ivory given that it is already the home of
> Hadoop, Pig, Knox, HCatalog, and other emerging “big data” software
> projects. Ivory has been designed to solve the data management challenges
> and opportunities of the Hadoop ecosystem family of products. Ivory fills a
> gap in the Hadoop ecosystem in the areas of data processing and data
> lifecycle management.
> 
> == Known Risks ==
> 
> === Orphaned Products ===
> The core developers plan to work full time on the project. There is very
> little risk of Ivory getting orphaned. Ivory is in use by the companies we
> work for, so the companies have an interest in its continued vitality.
> 
> === Inexperience with Open Source ===
> All of the core developers are active users and followers of open source.
> Srikanth Sundarrajan has been contributing patches to Apache Hadoop and
> Apache Oozie, and Shwetha GS has been contributing patches to Apache Oozie.
> Venkatesh Seetharam is a committer on Apache Knox. Rohini Palaniswamy is a
> committer on Apache Pig. Sharad Agarwal, Amareshwari SR (also an Apache
> Hive PMC member) and Sanjay Radia are PMC members on Apache Hadoop.
> 
> === Homogeneous Developers ===
> The current core developers are from a diverse set of organizations such as
> InMobi, Hortonworks, and Yahoo!. We expect to quickly establish a developer
> community that includes contributors from several corporations post
> incubation.
> 
> === Reliance on Salaried Developers ===
> Currently, most developers are paid to work on Ivory, but a few are
> contributing in their spare time. However, once the project has a community
> built around it post incubation, we expect to get committers and developers
> from outside the current core developers.
> 
> === Relationships with Other Apache Products ===
> Ivory is going to be used by the users of Hadoop and the Hadoop ecosystem
> in general.
> 
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Ivory a solid home as an open source project following an established
> development model. We have also given reasons in the Rationale and
> Alignment sections.
> 
> == Documentation ==
> There is documentation in the GitHub repository at:
> https://github.com/sriksun/Ivory
> 
> == Initial Source ==
> The source is currently in the GitHub repository at:
> https://github.com/sriksun/Ivory
> 
> == Source and Intellectual Property Submission Plan ==
> The complete Ivory code is under the Apache License, Version 2.0.
> 
> == External Dependencies ==
> The dependencies all have Apache-compatible licenses. These include BSD-
> and MIT-licensed dependencies.
> 
> == Cryptography ==
> None
> 
> == Required Resources ==
> 
> === Mailing lists ===
> 
> * ivory-dev AT incubator DOT apache DOT org
> * ivory-commits AT incubator DOT apache DOT org
> * ivory-user AT incubator DOT apache DOT org
> * ivory-private AT incubator DOT apache DOT org
> 
> === Subversion Directory ===
> https://svn.apache.org/repos/asf/incubator/ivory
> 
> === Issue Tracking ===
> JIRA IVORY
> 
> == Initial Committers ==
> * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
> * Shwetha GS (shwetha.gs AT inmobi DOT com)
> * Shaik Idris (shaik.idris AT inmobi DOT com)
> * Venkatesh Seetharam (Venkatesh AT apache DOT com)
> * Rohini Palaniswamy (rohinip AT yahoo-inc DOT com)
> * Thiruvel Thirumoolan (thiruvel AT yahoo-inc DOT com)
> * Sanjay Radia (sanjay AT apache DOT org)
> * Sharad Agarwal (sharad AT apache DOT org)
> * Amareshwari SR (amareshwari AT apache DOT org)
> 
> == Affiliations ==
> * Srikanth Sundarrajan (InMobi)
> * Shwetha GS (InMobi)
> * Shaik Idris (InMobi)
> * Venkatesh Seetharam (Hortonworks Inc)
> * Rohini Palaniswamy (Yahoo! Inc)
> * Thiruvel Thirumoolan (Yahoo! Inc)
> * Sanjay Radia (Hortonworks Inc)
> * Sharad Agarwal (InMobi)
> * Amareshwari SR (InMobi)
> 
> == Sponsors ==
> 
> === Champion ===
> * Arun C Murthy (acmurthy AT apache DOT org)
> 
> === Nominated Mentors ===
> * Alan Gates (gates AT apache DOT org)
> * Chris Douglas (cdouglas AT apache DOT org)
> * Devaraj Das (ddas AT apache DOT org)
> * Owen O’Malley (omalley AT apache DOT org)
> 
> === Sponsoring Entity ===
> Incubator PMC
> 

