incubator-general mailing list archives

From Sharad Agarwal <sha...@apache.org>
Subject Re: [PROPOSAL] Grill as new Incubator project
Date Fri, 19 Sep 2014 11:06:46 GMT
Chris, Thanks for your comments.

The differences that I see are:
- SciDB exposes an array data model and the Array Query Language (AQL). Grill's
data model is based on OLAP facts and dimensions. Grill exposes an SQL-like
language (a subset of HiveQL) that works on *logical* entities (facts,
dimensions).

- The goal of Grill is not to build a new query execution database, but to
unify existing ones by having a central metadata catalog and providing a Cube
abstraction layer on top of it.
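To make the "central metadata catalog" idea concrete, here is a minimal Python sketch. It assumes the catalog is simply a registry from logical entity names (facts, dimensions) to the physical tables that back them in each store. `MetadataCatalog`, `LogicalEntity`, `PhysicalTable` and all table names are hypothetical illustrations, not Grill's actual API or schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class PhysicalTable:
    store: str  # hypothetical store name, e.g. "hive" or "columnar_db"
    name: str   # table name inside that store


@dataclass
class LogicalEntity:
    name: str   # logical name users query, e.g. "sales"
    kind: str   # "fact" or "dimension"
    tables: List[PhysicalTable] = field(default_factory=list)


class MetadataCatalog:
    """One registry mapping logical facts/dimensions to physical tables in many stores."""

    def __init__(self) -> None:
        self._entities: Dict[str, LogicalEntity] = {}

    def register(self, entity: LogicalEntity) -> None:
        self._entities[entity.name] = entity

    def stores_for(self, name: str) -> List[str]:
        """Stores holding at least one physical table for the logical entity."""
        return sorted({t.store for t in self._entities[name].tables})

    def resolve(self, name: str, store: str) -> str:
        """Physical table that backs a logical entity in a given store."""
        for t in self._entities[name].tables:
            if t.store == store:
                return t.name
        raise KeyError(f"{name!r} has no physical table in {store!r}")


catalog = MetadataCatalog()
catalog.register(LogicalEntity("sales", "fact", [
    PhysicalTable("hive", "warehouse.sales_daily"),
    PhysicalTable("columnar_db", "sales_recent"),
]))
print(catalog.stores_for("sales"))       # ['columnar_db', 'hive']
print(catalog.resolve("sales", "hive"))  # warehouse.sales_daily
```

Users only ever name `sales`; which physical table answers the query is a catalog lookup, which is what lets one logical layer sit over several stores.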

Thanks,
Sharad

On Fri, Sep 19, 2014 at 9:34 AM, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> This sounds super cool!
>
> How does this relate to SciDB? Is it trying to do a similar thing?
>
> Cheers,
> Chris
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Sharad Agarwal <sharad@apache.org>
> Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>,
> "sharad@apache.org" <sharad@apache.org>
> Date: Thursday, September 18, 2014 8:54 PM
> To: "general@incubator.apache.org" <general@incubator.apache.org>
> Subject: [PROPOSAL] Grill as new Incubator project
>
> >Grill Proposal
> >==========
> >
> ># Abstract
> >
> >Grill is a platform that enables multi-dimensional queries in a unified
> >way over datasets stored in multiple warehouses. Grill integrates Apache
> >Hive with other data warehouses by tiering them together to form logical
> >data cubes.
> >
> >
> ># Proposal
> >
> >Grill provides a unified Cube abstraction for data stored in different
> >stores. Grill tiers multiple data warehouses for unified representation
> >and efficient access. It provides an SQL-like Cube query language to query
> >and describe data sets organized in data cubes. It enables users to run
> >queries against Facts and Dimensions that can span multiple physical tables
> >stored in different stores.
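One way to picture a Fact that spans physical tables in different stores is tiering by time range, for example recent data in a columnar store and full history in Hive. The Python sketch below assumes that model and splits a query's date range across storage tables listed in order of preference. `StorageTable`, `plan_ranges` and the table names are hypothetical, and walking the range day by day is a simplification chosen for clarity, not how Grill plans queries.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import groupby
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class StorageTable:
    store: str   # hypothetical store name
    table: str   # hypothetical physical table name
    start: date  # first day held (inclusive)
    end: date    # last day held (inclusive)


def plan_ranges(tables: Sequence[StorageTable], q_start: date,
                q_end: date) -> List[Tuple[str, date, date]]:
    """Split [q_start, q_end] into (table, first_day, last_day) pieces.

    `tables` is ordered by preference (e.g. the faster store first); each day
    is served by the first table that holds it.
    """
    owners = []
    day = q_start
    while day <= q_end:
        owner = next((t for t in tables if t.start <= day <= t.end), None)
        if owner is None:
            raise ValueError(f"no storage table holds {day}")
        owners.append((owner, day))
        day += timedelta(days=1)
    # Merge consecutive days served by the same table into one range.
    plan = []
    for owner, group in groupby(owners, key=lambda pair: pair[0]):
        days = [d for _, d in group]
        plan.append((owner.table, days[0], days[-1]))
    return plan


tiers = [
    StorageTable("columnar_db", "sales_recent", date(2014, 9, 1), date(2014, 9, 18)),
    StorageTable("hive", "sales_history", date(2013, 1, 1), date(2014, 9, 18)),
]
for piece in plan_ranges(tiers, date(2014, 8, 25), date(2014, 9, 5)):
    print(piece)
```

With these tiers, the August days fall through to `sales_history`, while the September days are served from the preferred `sales_recent` table.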
> >
> >The primary use cases that Grill aims to address are:
> >- Facilitate analytical queries by providing an OLAP-like Cube abstraction
> >- Data discovery by providing a single metadata layer for data stored in
> >  different stores
> >- Unified access to data by integrating Hive with other traditional data
> >  warehouses
> >
> >
> ># Background
> >
> >Apache Hive is a data warehouse that facilitates querying and managing
> >large datasets stored in distributed storage systems like HDFS. It provides
> >an SQL-like language called HiveQL (aka HQL). Apache Hive is widely used in
> >various organizations for ad hoc analytical queries.
> >In a typical data warehouse scenario, the data is multi-dimensional and
> >organized into Facts and Dimensions to form Data Cubes. Grill provides
> >this logical layer to enable querying and managing data as Cubes.
> >The Grill project is actively being developed at InMobi to provide a
> >higher level of analytical abstraction for seamlessly querying data stored
> >in different stores, including Hive and beyond.
> >
> >
> ># Rationale
> >
> >The Grill project aims to ease analytical querying and to break down data
> >silos by providing a single view of data across multiple data stores.
> >Conceiving data as a cube with hierarchical dimensions leads to
> >conceptually straightforward operations that facilitate analysis.
> >Integrating Apache Hive with other traditional warehouses provides the
> >opportunity to optimize query execution cost by tiering the data across
> >multiple warehouses. Grill provides:
> >- Access to data Cubes via a Cube query language similar to HiveQL.
> >- A driver-based architecture that allows plugging in systems like Hive
> >  and other warehouses, such as columnar RDBMSs.
> >- Cost-based engine selection that makes optimal use of resources by
> >  selecting the best execution engine for a given query.
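A driver-based architecture with cost-based engine selection could look roughly like the Python sketch below. It assumes each driver reports which physical tables it can read and offers a cost estimate for a planned query. The `Driver` interface, both cost models and all numbers are hypothetical placeholders, not Grill's driver API or real engine costs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class PlannedQuery:
    tables: FrozenSet[str]  # physical tables the rewritten query reads
    est_rows: int           # estimated number of rows scanned


class Driver(ABC):
    """Hypothetical pluggable execution-driver interface."""

    name = "abstract"

    def __init__(self, tables: Iterable[str]) -> None:
        self.tables = frozenset(tables)  # tables this engine can read

    def can_answer(self, q: PlannedQuery) -> bool:
        return q.tables <= self.tables

    @abstractmethod
    def estimate_cost(self, q: PlannedQuery) -> float:
        """Return an estimated cost in arbitrary units."""


class HiveLikeDriver(Driver):
    name = "hive"

    def estimate_cost(self, q: PlannedQuery) -> float:
        # Illustrative model: high fixed job-startup cost, cheap per row.
        return 30.0 + q.est_rows * 1e-7


class ColumnarDriver(Driver):
    name = "columnar_db"

    def estimate_cost(self, q: PlannedQuery) -> float:
        # Illustrative model: negligible startup cost, dearer per row.
        return 0.5 + q.est_rows * 1e-6


def select_driver(drivers: List[Driver], q: PlannedQuery) -> Driver:
    """Pick the cheapest driver among those able to answer the query."""
    candidates = [d for d in drivers if d.can_answer(q)]
    if not candidates:
        raise LookupError("no registered driver can answer this query")
    return min(candidates, key=lambda d: d.estimate_cost(q))


drivers = [
    HiveLikeDriver({"sales_recent", "sales_history"}),
    ColumnarDriver({"sales_recent"}),
]
small = PlannedQuery(frozenset({"sales_recent"}), 1_000_000)
big = PlannedQuery(frozenset({"sales_recent"}), 1_000_000_000)
history = PlannedQuery(frozenset({"sales_history"}), 1_000_000)
print(select_driver(drivers, small).name)    # columnar_db
print(select_driver(drivers, big).name)      # hive
print(select_driver(drivers, history).name)  # hive (only driver holding it)
```

In this sketch, adding an engine means adding one more `Driver` subclass; the selection logic itself does not change.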
> >
> >In a typical data warehouse, data is organized in Cubes with multiple
> >dimensions and measures. This facilitates analysis by letting users think
> >of the data in terms of Facts and Dimensions instead of physical tables.
> >Grill aims to provide this logical Cube abstraction on data warehouses like
> >Hive and other traditional warehouses.
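As a toy illustration of thinking in Facts and Dimensions rather than physical tables, the Python sketch below rolls a revenue measure up a geographic hierarchy (city, then country, then region). It is plain Python, not Grill's Cube query language, and the data and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical dimension table: each city's attributes along a geographic hierarchy.
city_dim: Dict[str, Dict[str, str]] = {
    "Bangalore": {"country": "India", "region": "APAC"},
    "Mumbai": {"country": "India", "region": "APAC"},
    "Pasadena": {"country": "USA", "region": "Americas"},
}

# Hypothetical fact rows: (city key, revenue measure).
fact_rows: List[Tuple[str, float]] = [
    ("Bangalore", 120.0),
    ("Mumbai", 80.0),
    ("Pasadena", 50.0),
    ("Bangalore", 30.0),
]


def roll_up(facts, dim, level):
    """Sum the measure grouped by one level of the dimension hierarchy."""
    totals: Dict[str, float] = defaultdict(float)
    for city, revenue in facts:
        totals[dim[city][level]] += revenue
    return dict(totals)


print(roll_up(fact_rows, city_dim, "country"))  # {'India': 230.0, 'USA': 50.0}
print(roll_up(fact_rows, city_dim, "region"))   # {'APAC': 230.0, 'Americas': 50.0}
```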
> >
> >
> ># Initial Goals
> >
> >- Donate the Grill source code and documentation to the Apache Software
> >  Foundation
> >- Build a user and developer community
> >- Support Hive and other columnar data warehouses
> >- Support full query life cycle management
> >- Add authentication for querying cubes
> >- Provide detailed query statistics
> >
> >
> ># Long Term Goals
> >
> >Here are some longer-term capabilities that would be added to Grill:
> >- Add authorization for managing and querying Cubes
> >- Provide REST and CLI interfaces for full admin control
> >- Capability to schedule queries
> >- Query caching
> >- Integrate with Apache Spark by creating Spark RDDs from Grill queries
> >- Integrate with Apache Optiq
> >
> >
> ># Current Status
> >
> >The project is actively developed at InMobi. The first version was deployed
> >at InMobi four months ago. This version allows querying dimension and fact
> >data stored in Hive over a CLI. The source code and documentation are
> >hosted on GitHub.
> >
> >## Meritocracy
> >
> >We intend to build a diverse developer and user community for the project
> >following the Apache meritocracy model. We want to encourage contributors
> >from multiple organizations, provide plenty of support to new developers
> >and welcome them to be committers.
> >
> >## Community
> >
> >Currently the project is being developed at InMobi. We hope to extend our
> >contributor and user base significantly in the future and build a solid
> >open source community around Grill.
> >
> >## Core Developers
> >
> >Grill is currently being developed by Amareshwari Sriramadasu, Sharad
> >Agarwal and Jaideep Dhok from InMobi, and Sreekanth Ramakrishnan, who is
> >currently employed by SoftwareAG. Raghavendra Singh from InMobi has built
> >the QA automation for Grill.
> >
> >## Alignment
> >
> >The ASF is a natural home for Grill, as it is for Apache Hadoop, Apache
> >Hive, Apache Spark and other emerging projects in the Big Data space.
> >We believe that in any enterprise multiple data warehouses will coexist,
> >as not all workloads are cost-effective to run on a single one. Apache
> >Hive is one of the crucial data warehouses in the Hadoop ecosystem, along
> >with upcoming projects like Apache Spark. Grill will benefit from working
> >in close proximity with these projects.
> >Traditional columnar data warehouses complement Apache Hive, as certain
> >workloads continue to be cost-effective to run in them. Having multiple
> >data warehouses leads to data silos within the enterprise; Grill aims to
> >break down these silos and provide holistic, unified access to data.
> >
> >
> ># Known Risks
> >
> >## Orphaned products & Reliance on Salaried Developers
> >
> >There is little risk of Grill getting orphaned, as Grill is a key part of
> >the Data Platform stack at InMobi. The core Grill developers plan to work
> >on it full-time. We think Grill will bring value to the Big Data space,
> >and we plan to grow the community of users and contributors.
> >
> >## Inexperience with Open Source
> >
> >All the core developers have long and significant experience in Apache
> >projects and the Hadoop ecosystem. Amareshwari Sriramadasu has
> >long-standing contributions to Apache Hadoop MapReduce and Apache Hive;
> >she is a PMC member of Hadoop and a committer on Hive. Sharad Agarwal is a
> >PMC member of Hadoop and has contributed to Hadoop YARN and Hadoop
> >MapReduce. Srikanth Sundarrajan is a PMC member of Apache Falcon.
> >Sreekanth Ramakrishnan is a committer of Apache Hadoop. Jaideep Dhok has
> >contributed patches to Apache Hive. Gunther Hagleitner is a PMC member of
> >Apache Hive. Vikram Dixit is a committer of Apache Hive.
> >
> >## Homogeneous Developers
> >
> >The initial developers are employed by Hortonworks, InMobi and SoftwareAG.
> >We are committed to recruiting additional committers from other companies
> >based on their contribution to the project.
> >
> >## Reliance on Salaried Developers
> >
> >The majority of initial committers are paid by their employers to
> >contribute to the project, and a few are contributing in their spare time.
> >Once the project has a community built, we are committed to recruiting
> >committers and developers from outside the current core developers.
> >
> >## Relationships with Other Apache Products
> >
> >Grill is deeply integrated with other Apache projects. Grill uses and
> >extends Apache Hive HCatalog to store and manage the Data cubes. It uses
> >HDFS and Hive session management libraries. Grill has a driver-based
> >architecture that allows for adding multiple execution drivers. Apart from
> >Apache Hive, it can be integrated with Apache Spark (via Spark SQL or
> >Shark), Apache Drill, Apache Tajo and Apache Phoenix.
> >In the future we want to use Apache Optiq in Grill for query optimization
> >and cost-based driver selection.
> >
> >## An Excessive Fascination with the Apache Brand
> >
> >The project was conceived from the beginning to be in line with the Apache
> >philosophy. As the core developers have good experience with Apache, the
> >source code organization, build, review and commit processes are highly
> >influenced by Apache. We believe that Apache will be a solid home for
> >Grill to grow and build the open source community. We have also described
> >the reasons in the Rationale and Alignment sections.
> >
> >
> ># Documentation
> >
> >http://inmobi.github.io/grill/
> >
> >
> ># Initial Source
> >
> >The source is currently in github repository at:
> >https://github.com/inmobi/grill
> >
> >
> ># Source and Intellectual Property Submission Plan
> >
> >The complete Grill code is already under the Apache License, Version 2.0.
> >
> >
> ># External Dependencies
> >
> >The dependencies all have Apache compatible licenses. These include Apache
> >2.0, BSD, MIT, EPL and CDDL licensed dependencies.
> >
> >
> ># Cryptography
> >
> >None
> >
> >
> ># Required Resources
> >
> >## Mailing lists
> >
> >grill-dev AT incubator DOT apache DOT org
> >grill-commits AT incubator DOT apache DOT org
> >grill-private AT incubator DOT apache DOT org
> >
> >## Subversion Directory
> >
> >Git is the preferred source control system:
> >git://git.apache.org/incubator-grill
> >
> >## Issue Tracking
> >
> >JIRA Grill (GRILL)
> >
> >
> ># Initial Committers
> >
> >Amareshwari Sriramadasu (amareshwari AT apache DOT org)
> >Gunther Hagleitner (gunther AT apache DOT org)
> >Jaideep Dhok (jaideep.dhok AT Inmobi DOT com)
> >Raghavendra Singh (raghavendra.singh AT Inmobi DOT com)
> >Sharad Agarwal (sharad AT apache DOT org)
> >Sreekanth Ramakrishnan (sreekanth AT apache DOT org)
> >Srikanth Sundarrajan (sriksun AT apache DOT org)
> >Suma Shivaprasad (suma.shivaprasad AT Inmobi DOT com)
> >Vikram Dixit (vikram AT apache DOT org)
> >
> >
> ># Affiliations
> >
> >Amareshwari Sriramadasu (InMobi)
> >Gunther Hagleitner (Hortonworks)
> >Jaideep Dhok (InMobi)
> >Raghavendra Singh (InMobi)
> >Sharad Agarwal (InMobi)
> >Sreekanth Ramakrishnan (SoftwareAG)
> >Srikanth Sundarrajan (InMobi)
> >Suma Shivaprasad (InMobi)
> >Vikram Dixit (Hortonworks)
> >
> >
> ># Sponsors
> >
> >## Champion
> >
> >Vinod K <vinodkv AT apache DOT org> (Apache Member)
> >
> >## Nominated Mentors
> >
> >Chris Douglas (Microsoft)
> >Jacob Homan (Microsoft)
> >Jean Baptiste Onofre (Talend)
> >Vinod K (Hortonworks)
> >
> >## Sponsoring Entity
> >
> >Incubator PMC
>
>
