incubator-general mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: [PROPOSAL] Grill as new Incubator project
Date Fri, 19 Sep 2014 04:04:31 GMT
This sounds super cool!

How does this relate to SciDB? Is it trying to do a similar thing?

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Sharad Agarwal <sharad@apache.org>
Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>,
"sharad@apache.org" <sharad@apache.org>
Date: Thursday, September 18, 2014 8:54 PM
To: "general@incubator.apache.org" <general@incubator.apache.org>
Subject: [PROPOSAL] Grill as new Incubator project

>Grill Proposal
>==========
>
># Abstract
>
>Grill is a platform that enables multi-dimensional queries in a unified
>way over datasets stored in multiple warehouses. Grill integrates Apache
>Hive with other data warehouses by tiering them together to form logical
>data cubes.
>
>
># Proposal
>
>Grill provides a unified Cube abstraction for data stored in different
>stores. Grill tiers multiple data warehouses for unified representation
>and efficient access. It provides a SQL-like Cube query language to query
>and describe data sets organized in data cubes. It enables users to run
>queries against Facts and Dimensions that can span multiple physical
>tables stored in different stores.
>
>The primary use cases that Grill aims to solve:
>- Facilitate analytical queries by providing an OLAP-like Cube abstraction
>- Data discovery by providing a single metadata layer for data stored in
>different stores
>- Unified access to data by integrating Hive with other traditional data
>warehouses
>
>
># Background
>
>Apache Hive is a data warehouse that facilitates querying and managing
>large datasets stored in distributed storage systems like HDFS. It
>provides a SQL-like language called HiveQL (also known as HQL). Apache
>Hive is widely used in various organizations for ad hoc analytical
>queries.
>
>In a typical data warehouse scenario, the data is multi-dimensional and
>organized into Facts and Dimensions to form Data Cubes. Grill provides
>this logical layer to enable querying and managing data as Cubes.
>
>The Grill project is actively being developed at InMobi to provide a
>higher level of analytical abstraction for seamlessly querying data
>stored in different storage systems, including Hive and beyond.
>
>
># Rationale
>
>The Grill project aims to ease analytical querying and cut data silos by
>providing a single view of data across multiple data stores.
>
>Conceiving data as a cube with hierarchical dimensions leads to
>conceptually straightforward operations to facilitate analysis.
>Integrating Apache Hive with other traditional warehouses provides the
>opportunity to optimize query execution cost by tiering the data across
>multiple warehouses. Grill provides:
>- Access to data Cubes via a Cube query language similar to HiveQL.
>- A driver-based architecture that allows plugging in systems like Hive
>and other warehouses such as columnar RDBMSs.
>- Cost-based engine selection that makes optimal use of resources by
>selecting the best execution engine for a given query.
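
An inline aside on the cost-based engine selection in the bullets above: the
idea can be sketched in a few lines of Python. This is not Grill's API. The
`Driver` class, the `select_driver` function, the driver names and the cost
numbers are all hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Driver:
    """Hypothetical stand-in for a pluggable execution driver."""
    name: str
    can_run: Callable[[str], bool]          # does this driver support the query?
    estimate_cost: Callable[[str], float]   # lower is cheaper; units are arbitrary


def select_driver(query: str, drivers: Sequence[Driver]) -> Optional[Driver]:
    """Return the cheapest driver able to run the query, or None if none can."""
    candidates = [d for d in drivers if d.can_run(query)]
    if not candidates:
        return None
    return min(candidates, key=lambda d: d.estimate_cost(query))


# Illustrative drivers with made-up rules (assumptions, not real behaviour).
hive = Driver("hive", can_run=lambda q: True, estimate_cost=lambda q: 100.0)
columnar = Driver(
    "columnar",
    can_run=lambda q: "raw_events" not in q,  # pretend raw data lives only in Hive
    estimate_cost=lambda q: 10.0,
)
```

With these two drivers, a query the columnar store can serve would be routed
there because its estimated cost is lower, while a query touching
`raw_events` would fall back to Hive. How a real system estimates cost is the
hard part and is not addressed by this sketch.
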
>
>In a typical Data warehouse, data is organized in Cubes with multiple
>dimensions and measures. This facilitates the analysis by conceiving the
>data in terms of Facts and Dimensions instead of physical tables. Grill
>aims to provide this logical Cube abstraction on Data warehouses like Hive
>and other traditional warehouses.
>
>
># Initial Goals
>
>- Donate the Grill source code and documentation to the Apache Software
>Foundation
>- Build a user and developer community
>- Support Hive and other Columnar data warehouses
>- Support full query life cycle management
>- Add authentication for querying cubes
>- Provide detailed query statistics
>
>
># Long Term Goals
>
>Here are some longer-term capabilities that would be added to Grill:
>- Add authorization for managing and querying Cubes
>- Provide REST and CLI for full Admin controls
>- Capability to schedule queries
>- Query caching
>- Integrate with Apache Spark, creating Spark RDDs from Grill queries
>- Integrate with Apache Optiq
>
>
># Current Status
>
>The project is actively developed at InMobi. The first version was
>deployed at InMobi four months ago. This version allows querying dimension
>and fact data stored in Hive over the CLI. The source code and
>documentation are hosted at GitHub.
>
>## Meritocracy
>
>We intend to build a diverse developer and user community for the project
>following the Apache meritocracy model. We want to encourage contributors
>from multiple organizations, provide plenty of support to new developers
>and welcome them to be committers.
>
>## Community
>
>Currently the project is being developed at InMobi. We hope to extend our
>contributor and user base significantly in the future and build a solid
>open source community around Grill.
>
>### Core Developers
>
>Grill is currently being developed by Amareshwari Sriramadasu, Sharad
>Agarwal and Jaideep Dhok from InMobi, and Sreekanth Ramakrishnan who is
>currently employed by SoftwareAG. Raghavendra Singh from InMobi has built
>the QA automation for Grill.
>
>## Alignment
>
>The ASF is a natural home for Grill, as it is for Apache Hadoop, Apache
>Hive, Apache Spark and other emerging projects in the Big Data space.
>
>We believe that in any enterprise, multiple data warehouses will co-exist,
>as not all workloads are cost-effective to run on a single one. Apache
>Hive is one of the crucial data warehouses, along with upcoming projects
>like Apache Spark, in the Hadoop ecosystem. Grill will benefit from
>working in close proximity with these projects.
>
>Traditional columnar data warehouses complement Apache Hive, as certain
>workloads continue to be cost-effective to run in them. Having multiple
>data warehouses leads to data silos, which Grill aims to cut within the
>enterprise by providing holistic, unified access to data.
>
>
># Known Risks
>
>## Orphaned products & Reliance on Salaried Developers
>
>There is little risk of Grill getting orphaned, as Grill is a key part of
>the Data Platform stack at InMobi. The core Grill developers plan to work
>on it full-time. We think Grill will bring value in the Big Data space and
>we plan to grow the community of users and contributors.
>
>## Inexperience with Open Source
>
>All the core developers have long and significant experience in Apache
>projects and the Hadoop ecosystem. Amareshwari Sriramadasu has
>long-standing contributions to Apache Hadoop MapReduce and Apache Hive;
>she is a PMC member of Hadoop and a committer of Hive. Sharad Agarwal is a
>PMC member of Hadoop and has contributed to Hadoop YARN and Hadoop
>MapReduce. Srikanth Sundarrajan is a PMC member of Apache Falcon.
>Sreekanth Ramakrishnan is a committer of Apache Hadoop. Jaideep Dhok has
>contributed patches to Apache Hive. Gunther Hagleitner is a PMC member of
>Apache Hive. Vikram Dixit is a committer of Apache Hive.
>
>## Homogeneous Developers
>
>The initial developers are employed by Hortonworks, InMobi and SoftwareAG.
>We are committed to recruiting additional committers from other companies
>based on their contribution to the project.
>
>## Reliance on Salaried Developers
>
>The majority of initial committers are paid by their employer to
>contribute to the project, and a few are contributing in their spare time.
>Once the project has built a community, we are committed to recruiting
>committers and developers from outside the current core developers.
>
>## Relationships with Other Apache Products
>
>Grill is deeply integrated with other Apache projects. Grill uses and
>extends Apache Hive HCatalog to store and manage the Data cubes. It uses
>HDFS and Hive session management libraries. Grill has a driver-based
>architecture that allows for adding multiple execution drivers. Apart from
>integrating Apache Hive, it can be integrated with Apache Spark over Spark
>SQL or Shark, Apache Drill, Apache Tajo and Apache Phoenix.
>
>In the future we want to use Apache Optiq in Grill for query optimization
>and cost-based driver selection.
>
>## An Excessive Fascination with the Apache Brand
>
>The project was conceived from the beginning to be in line with the Apache
>philosophy. As the core developers have good experience with Apache, the
>source code organization, build, review and commit processes are highly
>influenced by Apache. We believe that Apache will be a solid home for
>Grill to grow and build its open source community. We have also described
>the reasons in the Rationale and Alignment sections.
>
>
># Documentation
>
>http://inmobi.github.io/grill/
>
>
># Initial Source
>
>The source is currently in a GitHub repository at:
>https://github.com/inmobi/grill
>
>
># Source and Intellectual Property Submission Plan
>
>The complete Grill code is already under the Apache License, Version 2.0.
>
>
># External Dependencies
>
>The dependencies all have Apache compatible licenses. These include Apache
>2.0, BSD, MIT, EPL and CDDL licensed dependencies.
>
>
># Cryptography
>
>None
>
>
># Required Resources
>
>## Mailing lists
>
>grill-dev AT incubator DOT apache DOT org
>grill-commits AT incubator DOT apache DOT org
>grill-private AT incubator DOT apache DOT org
>
>## Subversion Directory
>
>Git is the preferred source control system:
>git://git.apache.org/incubator-grill
>
>## Issue Tracking
>
>JIRA Grill (GRILL)
>
>
># Initial Committers
>
>Amareshwari Sriramadasu (amareshwari AT apache DOT org)
>Gunther Hagleitner (gunther AT apache DOT org)
>Jaideep Dhok (jaideep.dhok AT Inmobi DOT com)
>Raghavendra Singh (raghavendra.singh AT Inmobi DOT com)
>Sharad Agarwal (sharad AT apache DOT org)
>Sreekanth Ramakrishnan (sreekanth AT apache DOT org)
>Srikanth Sundarrajan (sriksun AT apache DOT org)
>Suma Shivaprasad (suma.shivaprasad AT Inmobi DOT com)
>Vikram Dixit (vikram AT apache DOT org)
>
>
># Affiliations
>
>Amareshwari SR (InMobi)
>Gunther Hagleitner (Hortonworks)
>Jaideep Dhok (InMobi)
>Raghavendra Singh (InMobi)
>Sharad Agarwal (InMobi)
>Sreekanth Ramakrishnan (SoftwareAG)
>Srikanth Sundarrajan (InMobi)
>Suma Shivaprasad (InMobi)
>Vikram Dixit (Hortonworks)
>
>
># Sponsors
>
>## Champion
>
>Vinod K <vinodkv AT apache DOT org> (Apache Member)
>
>## Nominated Mentors
>
>Chris Douglas (Microsoft)
>Jacob Homan (Microsoft)
>Jean Baptiste Onofre (Talend)
>Vinod K (Hortonworks)
>
>## Sponsoring Entity
>
>Incubator PMC


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

