incubator-general mailing list archives

From Srikanth Sundarrajan <srikanth.sundarra...@inmobi.com>
Subject Re: [PROPOSAL] Ivory - Hadoop data management and processing platform
Date Fri, 15 Mar 2013 01:40:56 GMT
Thanks. Yes, Git seems an attractive option for version control.

Regards
Srikanth Sundarrajan

On Thu, Mar 14, 2013 at 6:26 AM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

>
> +1, this will be a great addition to the Hadoop eco-system!
>
> The proposal looks fine overall. I quickly searched around for the name
> ivory, it looks to be a safe one, but someone needs to do due diligence?
>
> And I think you can choose to have Git as the version control if you feel
> like it.
>
> Thanks,
> +Vinod Kumar Vavilapalli
>
> On Mar 13, 2013, at 10:00 AM, Srikanth Sundarrajan wrote:
>
> > = Ivory Proposal =
> >
> > == Abstract ==
> > Ivory is a data processing and management solution for Hadoop designed
> > for data motion, coordination of data pipelines, lifecycle management,
> > and data discovery. Ivory enables end consumers to quickly onboard their
> > data and its associated processing and management tasks on Hadoop
> > clusters.
> >
> > == Proposal ==
> > Ivory will enable easy data management via a declarative mechanism for
> > Hadoop. Users of the Ivory platform simply define infrastructure
> > endpoints, data sets and processing rules declaratively. These
> > configurations are expressed in such a way that the dependencies between
> > these entities are explicitly described. This information about
> > inter-dependencies between various entities allows Ivory to orchestrate
> > and manage various data management functions.
> >
> > The key use cases that Ivory addresses are:
> > * Data Motion
> > * Process orchestration and scheduling
> > * Policy-based Lifecycle Management
> > * Data Discovery
> > * Operability/Usability
> >
> > With these features it is possible for users to onboard their data sets
> > with a comprehensive and holistic understanding of how, when and where
> > their data is managed across its lifecycle. Complex functions such as
> > retrying failures, identifying possible SLA breaches or automated
> > handling of input data changes are now simple directives. All the
> > administrative functions and user level functions are available via
> > RESTful APIs. The CLI is simply a wrapper over the RESTful APIs.
> >
> > == Background ==
> > Hadoop and its ecosystem of products have made storing and processing
> > massive amounts of data commonplace. This has enabled numerous
> > organizations to gain valuable insights that they never could have
> > achieved in the past. While it is easy to leverage Hadoop for crunching
> > large volumes of data, organizing data, managing the life cycle of data
> > and processing data are fairly involved. This is solved adequately well
> > in a classic data platform involving data warehouses and standard ETL
> > (extract-transform-load) tools, but remains largely unsolved on Hadoop
> > today. In addition to data processing complexities, Hadoop presents new
> > sets of challenges and opportunities relating to management of data.
> >
> > Data Management on Hadoop encompasses data motion, process
> > orchestration, lifecycle management and data discovery, among other
> > concerns that are beyond ETL. Ivory is a new data processing and
> > management platform for Hadoop that solves this problem and creates
> > additional opportunities by building on existing components within the
> > Hadoop ecosystem (e.g. Apache Oozie, Apache Hadoop DistCp) without
> > reinventing the wheel. Ivory has been in production at InMobi, going on
> > its second year, and has been managing hundreds of feeds and processes.
> >
> > Ivory is being developed by engineers employed by InMobi, Hortonworks
> > and Yahoo!. This platform addition will increase the adoption of Apache
> > Hadoop by making data management tractable for end users. We are
> > therefore proposing to make Ivory an Apache open source project.
> >
> > == Rationale ==
> > The Ivory project aims to improve the usability of Apache Hadoop. As a
> > result Apache Hadoop will grow its community of users by increasing the
> > places Hadoop can be utilized and the use cases it will solve. By
> > developing Ivory in Apache we hope to gather a diverse community of
> > contributors, helping to ensure that Ivory is deployable for a broad
> > range of scenarios. Members of the Hadoop development community will be
> > able to influence Ivory’s roadmap, and contribute to it. We believe
> > having Ivory as part of the Apache Hadoop ecosystem will be a great
> > benefit to all of Hadoop's users.
> >
> > == Current Status ==
> > Ivory is widely deployed in production within InMobi and moving on to
> > its second year. A version with a valuable set of features has been
> > developed by the initial committers and is hosted on GitHub.
> >
> > === Meritocracy ===
> > Our intent with this incubator proposal is to start building a diverse
> > developer community around Ivory following the Apache meritocracy model.
> > We have wanted to make the project open source and encourage
> > contributors from multiple organizations from the start. We plan to
> > provide plenty of support to new developers and to quickly recruit those
> > who make solid contributions to committer status.
> >
> > === Community ===
> > We are happy to report that the initial team already represents multiple
> > organizations. We hope to extend the user and developer base further in
> > the future and build a solid open source community around Ivory.
> >
> > === Core Developers ===
> > Ivory is currently being developed by three engineers from InMobi
> > (Srikanth Sundarrajan, Shwetha G S, and Shaik Idris) and two Hortonworks
> > employees (Sanjay Radia and Venkatesh Seetharam). In addition, two
> > Yahoo! employees, Rohini Palaniswamy and Thiruvel Thirumoolan, are also
> > involved. Srikanth, Shwetha and Shaik are the original developers. All
> > the engineers have built two generations of Data Management on Hadoop,
> > have deep expertise in Hadoop, and are quite familiar with the Hadoop
> > ecosystem.
> >
> > === Alignment ===
> > The ASF is a natural host for Ivory given that it is already the home of
> > Hadoop, Pig, Knox, HCatalog, and other emerging “big data” software
> > projects. Ivory has been designed to solve the data management
> > challenges and opportunities of the Hadoop ecosystem family of products.
> > Ivory fills a gap in the Hadoop ecosystem in the areas of data
> > processing and data lifecycle management.
> >
> > == Known Risks ==
> >
> > === Orphaned products & Reliance on Salaried Developers ===
> > The core developers plan to work full time on the project. There is very
> > little risk of Ivory getting orphaned. Ivory is in use by the companies
> > we work for, so the companies have an interest in its continued
> > vitality.
> >
> > === Inexperience with Open Source ===
> > All of the core developers are active users and followers of open
> > source. Srikanth Sundarrajan has been contributing patches to Apache
> > Hadoop and Apache Oozie, and Shwetha GS has been contributing patches to
> > Apache Oozie. Seetharam Venkatesh is a committer on Apache Knox. Rohini
> > Palaniswamy is a committer on Apache Pig. Sharad Agarwal, Amareshwari SR
> > (also an Apache Hive PMC member) and Sanjay Radia are PMC members on
> > Apache Hadoop.
> >
> > === Homogeneous Developers ===
> > The current core developers are from a diverse set of organizations such
> > as InMobi, Hortonworks, and Yahoo!. We expect to quickly establish a
> > developer community that includes contributors from several corporations
> > post incubation.
> >
> > === Reliance on Salaried Developers ===
> > Currently, most developers are paid to work on Ivory, but a few are
> > contributing in their spare time. However, once the project has a
> > community built around it post incubation, we expect to get committers
> > and developers from outside the current core developers.
> >
> > === Relationships with Other Apache Products ===
> > Ivory is going to be used by the users of Hadoop and the Hadoop
> > ecosystem in general.
> >
> > === An Excessive Fascination with the Apache Brand ===
> > While we respect the reputation of the Apache brand and have no doubts
> > that it will attract contributors and users, our interest is primarily
> > to give Ivory a solid home as an open source project following an
> > established development model. We have also given reasons in the
> > Rationale and Alignment sections.
> >
> > == Documentation ==
> > Documentation is in the GitHub repository at:
> > https://github.com/sriksun/Ivory
> >
> > == Initial Source ==
> > The source is currently in the GitHub repository at:
> > https://github.com/sriksun/Ivory
> >
> > == Source and Intellectual Property Submission Plan ==
> > The complete Ivory code is under the Apache License, Version 2.0.
> >
> > == External Dependencies ==
> > The dependencies all have Apache-compatible licenses. These include
> > BSD- and MIT-licensed dependencies.
> >
> > == Cryptography ==
> > None
> >
> > == Required Resources ==
> >
> > === Mailing lists ===
> >
> > * ivory-dev AT incubator DOT apache DOT org
> > * ivory-commits AT incubator DOT apache DOT org
> > * ivory-user AT incubator DOT apache DOT org
> > * ivory-private AT incubator DOT apache DOT org
> >
> > === Subversion Directory ===
> > https://svn.apache.org/repos/asf/incubator/ivory
> >
> > === Issue Tracking ===
> > JIRA IVORY
> >
> > == Initial Committers ==
> > * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
> > * Shwetha GS (shwetha.gs AT inmobi DOT com)
> > * Shaik Idris (shaik.idris AT inmobi DOT com)
> > * Venkatesh Seetharam (Venkatesh AT apache DOT com)
> > * Rohini Palaniswamy (rohinip AT yahoo-inc DOT com)
> > * Thiruvel Thirumoolan (thiruvel AT yahoo-inc DOT com)
> > * Sanjay Radia (sanjay AT apache DOT org)
> > * Sharad Agarwal (sharad AT apache DOT org)
> > * Amareshwari SR (amareshwari AT apache DOT org)
> >
> > == Affiliations ==
> > * Srikanth Sundarrajan (InMobi)
> > * Shwetha GS (InMobi)
> > * Shaik Idris (InMobi)
> > * Venkatesh Seetharam (Hortonworks Inc)
> > * Rohini Palaniswamy (Yahoo! Inc)
> > * Thiruvel Thirumoolan (Yahoo! Inc)
> > * Sanjay Radia (Hortonworks Inc)
> > * Sharad Agarwal (InMobi)
> > * Amareshwari SR (InMobi)
> >
> > == Sponsors ==
> >
> > === Champion ===
> > * Arun C Murthy (acmurthy AT apache DOT org)
> >
> > === Nominated Mentors ===
> > * Alan Gates (gates AT apache DOT org)
> > * Chris Douglas (cdouglas AT apache DOT org)
> > * Devaraj Das (ddas AT apache DOT org)
> > * Owen O’Malley (omalley AT apache DOT org)
> >
> > === Sponsoring Entity ===
> > Incubator PMC
> >
> > --
> > _____________________________________________________________
> > The information contained in this communication is intended solely for
> > the use of the individual or entity to whom it is addressed and others
> > authorized to receive it. It may contain confidential or legally
> > privileged information. If you are not the intended recipient you are
> > hereby notified that any disclosure, copying, distribution or taking any
> > action in reliance on the contents of this information is strictly
> > prohibited and may be unlawful. If you have received this communication
> > in error, please notify us immediately by responding to this email and
> > then delete it from your system. The firm is neither liable for the
> > proper and complete transmission of the information contained in this
> > communication nor for any delay in its receipt.
>
>

