From: Srikanth Sundarrajan
To: general@incubator.apache.org
Reply-To: general@incubator.apache.org
Subject: Re: [PROPOSAL] Ivory - Hadoop data management and processing platform
Date: Fri, 15 Mar 2013 07:10:56 +0530
In-Reply-To: <52039F13-8184-4056-A51A-EE38D744E8ED@apache.org>
References: <52039F13-8184-4056-A51A-EE38D744E8ED@apache.org>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=f46d04083877fe7b9704d7ecba5d

--f46d04083877fe7b9704d7ecba5d
Content-Type: text/plain; charset=windows-1252

Thanks. Yes, Git seems an attractive option for version control.

Regards
Srikanth Sundarrajan

On Thu, Mar 14, 2013 at 6:26 AM, Vinod Kumar Vavilapalli <vinodkv@hortonworks.com> wrote:

>
> +1, this will be a great addition to the Hadoop eco-system!
>
> The proposal looks fine overall. I quickly searched around for the name
> ivory, it looks to be a safe one, but someone needs to do due diligence?
>
> And I think you can choose to have git as the version control if you feel
> like it.
>
> Thanks,
> +Vinod Kumar Vavilapalli
>
> On Mar 13, 2013, at 10:00 AM, Srikanth Sundarrajan wrote:
>
> > = Ivory Proposal =
> >
> > == Abstract ==
> > Ivory is a data processing and management solution for Hadoop designed for
> > data motion, coordination of data pipelines, lifecycle management, and
> > data discovery.
> > Ivory enables end consumers to quickly onboard their data
> > and its associated processing and management tasks on Hadoop clusters.
> >
> > == Proposal ==
> > Ivory will enable easy data management via a declarative mechanism for
> > Hadoop. Users of the Ivory platform simply define infrastructure endpoints,
> > data sets and processing rules declaratively. These configurations
> > are expressed in such a way that the dependencies between
> > these entities are explicitly described. This information about
> > inter-dependencies between various entities allows Ivory to orchestrate
> > and manage various data management functions.
> >
> > The key use cases that Ivory addresses are:
> > * Data Motion
> > * Process orchestration and scheduling
> > * Policy-based Lifecycle Management
> > * Data Discovery
> > * Operability/Usability
> >
> > With these features it is possible for users to onboard their data sets
> > with a comprehensive and holistic understanding of how, when and where
> > their data is managed across its lifecycle. Complex functions such as
> > retrying failures, identifying possible SLA breaches or automated handling
> > of input data changes are now simple directives. All the administrative
> > functions and user level functions are available via RESTful APIs. The CLI
> > is simply a wrapper over the RESTful APIs.
> >
> > == Background ==
> > Hadoop and its ecosystem of products have made storing and processing
> > massive amounts of data commonplace. This has enabled numerous
> > organizations to gain valuable insights that they never could have
> > achieved in the past. While it is easy to leverage Hadoop for crunching
> > large volumes of data, organizing data, managing the life cycle of data
> > and processing data is fairly involved.
> > This is solved adequately well in a classic data platform involving data
> > warehouses and standard ETL (extract-transform-load) tools, but remains
> > largely unsolved on Hadoop today. In addition to data processing
> > complexities, Hadoop presents new sets of challenges and opportunities
> > relating to management of data.
> >
> > Data Management on Hadoop encompasses data motion, process orchestration,
> > lifecycle management, data discovery and other concerns that are beyond
> > ETL. Ivory is a new data processing and management platform for Hadoop
> > that solves this problem and creates additional opportunities by building
> > on existing components within the Hadoop ecosystem (e.g. Apache Oozie,
> > Apache Hadoop DistCp) without reinventing the wheel. Ivory has been in
> > production at InMobi, going on its second year, and has been managing
> > hundreds of feeds and processes.
> >
> > Ivory is being developed by engineers employed with InMobi, Hortonworks
> > and Yahoo!. This platform addition will increase the adoption of Apache
> > Hadoop by making data management tractable for end users. We are therefore
> > proposing to make Ivory an Apache open source project.
> >
> > == Rationale ==
> > The Ivory project aims to improve the usability of Apache Hadoop. As a
> > result, Apache Hadoop will grow its community of users by increasing the
> > places Hadoop can be utilized and the use cases it will solve. By
> > developing Ivory in Apache we hope to gather a diverse community of
> > contributors, helping to ensure that Ivory is deployable for a broad range
> > of scenarios. Members of the Hadoop development community will be able to
> > influence Ivory's roadmap, and contribute to it. We believe having Ivory
> > as part of the Apache Hadoop ecosystem will be a great benefit to all of
> > Hadoop's users.
> >
> > == Current Status ==
> > Ivory is widely deployed in production within InMobi and moving on to its
> > second year. A version with a valuable set of features is developed by the
> > list of initial committers and is hosted on GitHub.
> >
> > === Meritocracy ===
> > Our intent with this incubator proposal is to start building a diverse
> > developer community around Ivory following the Apache meritocracy model.
> > We have wanted to make the project open source and encourage contributors
> > from multiple organizations from the start. We plan to provide plenty of
> > support to new developers and to quickly recruit those who make solid
> > contributions to committer status.
> >
> > === Community ===
> > We are happy to report that the initial team already represents multiple
> > organizations. We hope to extend the user and developer base further in
> > the future and build a solid open source community around Ivory.
> >
> > === Core Developers ===
> > Ivory is currently being developed by three engineers from InMobi
> > (Srikanth Sundarrajan, Shwetha G S, and Shaik Idris) and two Hortonworks
> > employees (Sanjay Radia and Venkatesh Seetharam). In addition, two Yahoo!
> > employees, Rohini Palaniswamy and Thiruvel Thirumoolan, are also involved.
> > Srikanth, Shwetha and Shaik are the original developers. All the engineers
> > have built two generations of Data Management on Hadoop, have deep
> > expertise in Hadoop and are quite familiar with the Hadoop ecosystem.
> >
> > === Alignment ===
> > The ASF is a natural host for Ivory given that it is already the home of
> > Hadoop, Pig, Knox, HCatalog, and other emerging "big data" software
> > projects. Ivory has been designed to solve the data management challenges
> > and opportunities of the Hadoop ecosystem family of products.
> > Ivory fills a gap in the Hadoop ecosystem in the areas of data processing
> > and data lifecycle management.
> >
> > == Known Risks ==
> >
> > === Orphaned products & Reliance on Salaried Developers ===
> > The core developers plan to work full time on the project. There is very
> > little risk of Ivory getting orphaned. Ivory is in use by the companies we
> > work for, so the companies have an interest in its continued vitality.
> >
> > === Inexperience with Open Source ===
> > All of the core developers are active users and followers of open source.
> > Srikanth Sundarrajan has been contributing patches to Apache Hadoop and
> > Apache Oozie, and Shwetha GS has been contributing patches to Apache Oozie.
> > Venkatesh Seetharam is a committer on Apache Knox. Rohini Palaniswamy is a
> > committer on Apache Pig. Sharad Agarwal, Amareshwari SR (also an Apache
> > Hive PMC member) and Sanjay Radia are PMC members on Apache Hadoop.
> >
> > === Homogeneous Developers ===
> > The current core developers are from a diverse set of organizations such
> > as InMobi, Hortonworks, and Yahoo!. We expect to quickly establish a
> > developer community that includes contributors from several corporations
> > post incubation.
> >
> > === Reliance on Salaried Developers ===
> > Currently, most developers are paid to do work on Ivory but a few are
> > contributing in their spare time. However, once the project has a
> > community built around it post incubation, we expect to get committers and
> > developers from outside the current core developers.
> >
> > === Relationships with Other Apache Products ===
> > Ivory is going to be used by the users of Hadoop and the Hadoop ecosystem
> > in general.
> >
> > === An Excessive Fascination with the Apache Brand ===
> > While we respect the reputation of the Apache brand and have no doubts
> > that it will attract contributors and users, our interest is primarily to
> > give Ivory a solid home as an open source project following an established
> > development model. We have also given reasons in the Rationale and
> > Alignment sections.
> >
> > == Documentation ==
> > There is documentation in the GitHub repository at:
> > https://github.com/sriksun/Ivory
> >
> > == Initial Source ==
> > The source is currently in the GitHub repository at:
> > https://github.com/sriksun/Ivory
> >
> > == Source and Intellectual Property Submission Plan ==
> > The complete Ivory code is under Apache Software License 2.
> >
> > == External Dependencies ==
> > The dependencies all have Apache compatible licenses. These include BSD
> > and MIT licensed dependencies.
> >
> > == Cryptography ==
> > None
> >
> > == Required Resources ==
> >
> > === Mailing lists ===
> >
> > * ivory-dev AT incubator DOT apache DOT org
> > * ivory-commits AT incubator DOT apache DOT org
> > * ivory-user AT incubator DOT apache DOT org
> > * ivory-private AT incubator DOT apache DOT org
> >
> > === Subversion Directory ===
> > https://svn.apache.org/repos/asf/incubator/ivory
> >
> > === Issue Tracking ===
> > JIRA IVORY
> >
> > == Initial Committers ==
> > * Srikanth Sundarrajan (Srikanth.Sundarrajan AT inmobi DOT com)
> > * Shwetha GS (shwetha.gs AT inmobi DOT com)
> > * Shaik Idris (shaik.idris AT inmobi DOT com)
> > * Venkatesh Seetharam (Venkatesh AT apache DOT com)
> > * Rohini Palaniswamy (rohinip AT yahoo-inc DOT com)
> > * Thiruvel Thirumoolan (thiruvel AT yahoo-inc DOT com)
> > * Sanjay Radia (sanjay AT apache DOT org)
> > * Sharad Agarwal (sharad AT apache DOT org)
> > * Amareshwari SR (amareshwari AT apache DOT org)
> >
> > == Affiliations ==
> > * Srikanth Sundarrajan (InMobi)
> > * Shwetha GS (InMobi)
> > * Shaik Idris (InMobi)
> > * Venkatesh Seetharam (Hortonworks Inc)
> > * Rohini Palaniswamy (Yahoo! Inc)
> > * Thiruvel Thirumoolan (Yahoo! Inc)
> > * Sanjay Radia (Hortonworks Inc)
> > * Sharad Agarwal (InMobi)
> > * Amareshwari SR (InMobi)
> >
> > == Sponsors ==
> >
> > === Champion ===
> > * Arun C Murthy (acmurthy AT apache DOT org)
> >
> > === Nominated Mentors ===
> > * Alan Gates (gates AT apache DOT org)
> > * Chris Douglas (cdouglas AT apache DOT org)
> > * Devaraj Das (ddas AT apache DOT org)
> > * Owen O'Malley (omalley AT apache DOT org)
> >
> > === Sponsoring Entity ===
> > Incubator PMC
> >
> > --
> > _____________________________________________________________
> > The information contained in this communication is intended solely for the
> > use of the individual or entity to whom it is addressed and others
> > authorized to receive it. It may contain confidential or legally
> > privileged information. If you are not the intended recipient you are
> > hereby notified that any disclosure, copying, distribution or taking any
> > action in reliance on the contents of this information is strictly
> > prohibited and may be unlawful. If you have received this communication in
> > error, please notify us immediately by responding to this email and then
> > delete it from your system. The firm is neither liable for the proper and
> > complete transmission of the information contained in this communication
> > nor for any delay in its receipt.
> >

--f46d04083877fe7b9704d7ecba5d--