incubator-general mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: [PROPOSAL] Grill as new Incubator project
Date Fri, 19 Sep 2014 14:24:45 GMT
Thank you, Sharad. So could I use this system for remote sensing
data, such as three-dimensional (time, space, and measurement) cubes?
Does it support numerical data well?

Sorry for so many questions, I'm just excited :)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Sharad Agarwal <sharad@apache.org>
Reply-To: "sharad@apache.org" <sharad@apache.org>
Date: Friday, September 19, 2014 4:06 AM
To: Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>
Cc: "general@incubator.apache.org" <general@incubator.apache.org>
Subject: Re: [PROPOSAL] Grill as new Incubator project

>Chris, Thanks for your comments.
>
>
>The differences that I see are:
>- SciDB exposes an array data model and Array Query Language (AQL).
>Grill's data model is based on OLAP facts and dimensions. Grill exposes
>a SQL-like language (a subset of HiveQL) that works on *logical*
>entities (facts, dimensions).
>
>- The goal of Grill is not to build a new query execution database, but
>to unify existing ones by having a central metadata catalog, and to
>provide a Cube abstraction layer on top of it.
>
>
>
>Thanks,
>Sharad
>
>
>On Fri, Sep 19, 2014 at 9:34 AM, Mattmann, Chris A (3980)
><chris.a.mattmann@jpl.nasa.gov> wrote:
>
>This sounds super cool!
>
>How does this relate to SciDB? Is it trying to do a similar thing?
>
>Cheers,
>Chris
>
>
>-----Original Message-----
>From: Sharad Agarwal <sharad@apache.org>
>Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>,
>"sharad@apache.org" <sharad@apache.org>
>Date: Thursday, September 18, 2014 8:54 PM
>To: "general@incubator.apache.org" <general@incubator.apache.org>
>Subject: [PROPOSAL] Grill as new Incubator project
>
>>Grill Proposal
>>==========
>>
>># Abstract
>>
>>Grill is a platform that enables multi-dimensional queries in a unified
>>way
>>over datasets stored in multiple warehouses. Grill integrates Apache Hive
>>with other data warehouses by tiering them together to form logical data
>>cubes.
>>
>>
>># Proposal
>>
>>Grill provides a unified Cube abstraction for data stored in different
>>stores. Grill tiers multiple data warehouses for unified representation
>>and efficient access. It provides a SQL-like Cube query language to
>>query and describe data sets organized in data cubes. It enables users
>>to run queries against Facts and Dimensions that can span multiple
>>physical tables stored in different stores.
>>
>>The primary use cases that Grill aims to solve:
>>- Facilitate analytical queries by providing an OLAP-like Cube
>>abstraction
>>- Data discovery by providing a single metadata layer for data stored in
>>different stores
>>- Unified access to data by integrating Hive with other traditional data
>>warehouses
>>
>>
>># Background
>>
>>Apache Hive is a data warehouse that facilitates querying and managing
>>large datasets stored in distributed storage systems like HDFS. It
>>provides a SQL-like language called HiveQL (also known as HQL). Apache
>>Hive is widely used in various organizations for ad hoc analytical
>>queries.
>>In a typical data warehouse scenario, the data is multi-dimensional and
>>organized into Facts and Dimensions to form Data Cubes. Grill provides
>>this logical layer to enable querying and managing data as Cubes.
>>The Grill project is actively being developed at InMobi to provide a
>>higher level of analytical abstraction for seamlessly querying data
>>stored in Hive and other stores.
>>
>>
>># Rationale
>>
>>The Grill project aims to ease analytical querying and break down data
>>silos by providing a single view of data across multiple data stores.
>>Conceiving data as a cube with hierarchical dimensions leads to
>>conceptually straightforward operations to facilitate analysis.
>>Integrating Apache Hive with other traditional warehouses provides the
>>opportunity to optimize query execution cost by tiering the data across
>>multiple warehouses. Grill provides:
>>- Access to data Cubes via a Cube query language similar to HiveQL.
>>- A driver-based architecture that allows plugging in systems like Hive
>>and other warehouses such as columnar RDBMSs.
>>- Cost-based engine selection that makes optimal use of resources by
>>selecting the best execution engine for a given query.
>>
>>In a typical Data warehouse, data is organized in Cubes with multiple
>>dimensions and measures. This facilitates the analysis by conceiving the
>>data in terms of Facts and Dimensions instead of physical tables. Grill
>>aims to provide this logical Cube abstraction on Data warehouses like
>>Hive
>>and other traditional warehouses.
>>
>>
>># Initial Goals
>>
>>- Donate the Grill source code and documentation to the Apache Software
>>Foundation
>>- Build a user and developer community
>>- Support Hive and other columnar data warehouses
>>- Support full query life cycle management
>>- Add authentication for querying cubes
>>- Provide detailed query statistics
>>
>>
>># Long Term Goals
>>
>>Here are some longer-term capabilities that would be added to Grill
>>- Add authorization for managing and querying Cubes
>>- Provide REST and CLI for full Admin controls
>>- Capability to schedule queries
>>- Query caching
>>- Integrate with Apache Spark, creating Spark RDDs from Grill queries
>>- Integrate with Apache Optiq
>>
>>
>># Current Status
>>
>>The project is actively developed at InMobi. The first version was
>>deployed at InMobi four months ago. This version allows querying
>>dimension and fact data stored in Hive over the CLI. The source code and
>>documentation are hosted on GitHub.
>>
>>## Meritocracy
>>
>>We intend to build a diverse developer and user community for the project
>>following the Apache meritocracy model. We want to encourage contributors
>>from multiple organizations, provide plenty of support to new developers
>>and welcome them to be committers.
>>
>>## Community
>>
>>Currently the project is being developed at InMobi. We hope to extend our
>>contributor and user base significantly in the future and build a solid
>>open source community around Grill.
>>
>>## Core Developers
>>
>>Grill is currently being developed by Amareshwari Sriramadasu, Sharad
>>Agarwal and Jaideep Dhok from InMobi, and Sreekanth Ramakrishnan who is
>>currently employed by SoftwareAG. Raghavendra Singh from InMobi has built
>>the QA automation for Grill.
>>
>>## Alignment
>>
>>The ASF is a natural home for Grill, as it is for Apache Hadoop, Apache
>>Hive, Apache Spark and other emerging projects in the Big Data space.
>>We believe that in any enterprise, multiple data warehouses will
>>co-exist, as not all workloads are cost effective to run on a single
>>one. Apache Hive is one of the crucial data warehouses, along with
>>upcoming projects like Apache Spark, in the Hadoop ecosystem. Grill will
>>benefit from working in close proximity with these projects.
>>Traditional columnar data warehouses complement Apache Hive, as certain
>>workloads continue to be cost effective to run in them. Having multiple
>>data warehouses leads to data silos, which Grill aims to break down
>>within the enterprise to provide holistic, unified access to data.
>>
>>
>># Known Risks
>>
>>## Orphaned products & Reliance on Salaried Developers
>>
>>There is little risk of Grill getting orphaned, as Grill is a key part of
>>the Data Platform stack at InMobi. The core Grill developers plan to work
>>on it full-time. We think Grill will bring value to the Big Data space,
>>and we plan to grow the community of users and contributors.
>>
>>## Inexperience with Open Source
>>
>>All the core developers have long and significant experience in Apache
>>projects and the Hadoop ecosystem. Amareshwari Sriramadasu has
>>long-standing contributions to Apache Hadoop MapReduce and Apache Hive;
>>she is a PMC member of Hadoop and a committer on Hive. Sharad Agarwal is
>>a PMC member of Hadoop and has contributed to Hadoop YARN and Hadoop
>>MapReduce. Srikanth Sundarrajan is a PMC member of Apache Falcon.
>>Sreekanth Ramakrishnan is a committer on Apache Hadoop. Jaideep Dhok has
>>contributed patches to Apache Hive. Gunther Hagleitner is a PMC member
>>of Apache Hive. Vikram Dixit is a committer on Apache Hive.
>>
>>## Homogeneous Developers
>>
>>The initial developers are employed by Hortonworks, InMobi and
>>SoftwareAG.
>>We are committed to recruiting additional committers from other companies
>>based on their contribution to the project.
>>
>>## Reliance on Salaried Developers
>>
>>The majority of initial committers are paid by their employer to
>>contribute to the project, and a few are contributing in their spare
>>time. Once the project has built a community, we are committed to
>>recruiting committers and developers from outside the current core
>>developers.
>>
>>## Relationships with Other Apache Products
>>
>>Grill is deeply integrated with other Apache projects. Grill uses and
>>extends Apache Hive HCatalog to store and manage the Data cubes. It uses
>>HDFS and Hive session management libraries. Grill has a driver-based
>>architecture that allows for adding multiple execution drivers. Apart
>>from integrating with Apache Hive, it can be integrated with Apache Spark
>>(over Spark SQL or Shark), Apache Drill, Apache Tajo and Apache Phoenix.
>>In the future, we want to use Apache Optiq in Grill for query
>>optimization and cost-based driver selection.
>>
>>## An Excessive Fascination with the Apache Brand
>>
>>The project was conceived from the beginning to be in line with the
>>Apache philosophy. As the core developers have good experience with
>>Apache, the source code organization, build, review and commit process
>>are highly influenced by Apache. We believe that Apache will be a solid
>>home for Grill to grow and build the open source community. We have also
>>described the reasons in the Rationale and Alignment sections.
>>
>>
>># Documentation
>>
>>http://inmobi.github.io/grill/
>>
>>
>># Initial Source
>>
>>The source is currently in a GitHub repository at:
>>https://github.com/inmobi/grill
>>
>>
>># Source and Intellectual Property Submission Plan
>>
>>The complete Grill code is already under the Apache License, Version 2.0.
>>
>>
>># External Dependencies
>>
>>The dependencies all have Apache compatible licenses. These include
>>Apache
>>2.0, BSD, MIT, EPL and CDDL licensed dependencies.
>>
>>
>># Cryptography
>>
>>None
>>
>>
>># Required Resources
>>
>>## Mailing lists
>>
>>grill-dev AT incubator DOT apache DOT org
>>grill-commits AT incubator DOT apache DOT org
>>grill-private AT incubator DOT apache DOT org
>>
>>## Subversion Directory
>>
>>Git is the preferred source control system:
>>git://git.apache.org/incubator-grill
>>
>>## Issue Tracking
>>
>>JIRA Grill (GRILL)
>>
>>
>># Initial Committers
>>
>>Amareshwari Sriramadasu (amareshwari AT apache DOT org)
>>Gunther Hagleitner (gunther AT apache DOT org)
>>Jaideep Dhok (jaideep.dhok AT Inmobi DOT com)
>>Raghavendra Singh (raghavendra.singh AT Inmobi DOT com)
>>Sharad Agarwal (sharad AT apache DOT org)
>>Sreekanth Ramakrishnan (sreekanth AT apache DOT org)
>>Srikanth Sundarrajan (sriksun AT apache DOT org)
>>Suma Shivaprasad (suma.shivaprasad AT Inmobi DOT com)
>>Vikram Dixit (vikram AT apache DOT org)
>>
>>
>># Affiliations
>>
>>Amareshwari Sriramadasu (InMobi)
>>Gunther Hagleitner (Hortonworks)
>>Jaideep Dhok (InMobi)
>>Raghavendra Singh (InMobi)
>>Sharad Agarwal (InMobi)
>>Sreekanth Ramakrishnan (SoftwareAG)
>>Srikanth Sundarrajan (InMobi)
>>Suma Shivaprasad (InMobi)
>>Vikram Dixit (Hortonworks)
>>
>>
>># Sponsors
>>
>>## Champion
>>
>>Vinod K <vinodkv AT apache DOT org> (Apache Member)
>>
>>## Nominated Mentors
>>
>>Chris Douglas (Microsoft)
>>Jacob Homan (Microsoft)
>>Jean Baptiste Onofre (Talend)
>>Vinod K (Hortonworks)
>>
>>## Sponsoring Entity
>>
>>Incubator PMC
>



