cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ZhaoYang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-11194) materialized views - support explode() on collections
Date Sun, 23 Jul 2017 15:20:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-11194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

ZhaoYang updated CASSANDRA-11194:
---------------------------------
    Component/s: Materialized Views

> materialized views - support explode() on collections
> -----------------------------------------------------
>
>                 Key: CASSANDRA-11194
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11194
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Materialized Views
>            Reporter: Jon Haddad
>
> I'm working on a database design to model a product catalog.  Products can belong to
categories.  Categories can belong to multiple sub categories (think about Amazon's complex
taxonomies).
> My category table would look like this, giving me individual categories & their parents:
> {code}
> CREATE TABLE category (
>     category_id uuid primary key,
>     name text,
>     parents set<uuid>
> );
> {code}
> To get a list of all the children of a particular category, I need a table that looks
like the following:
> {code}
> CREATE TABLE categories_by_parent (
>     parent_id uuid,
>     category_id uuid,
>     name text,
>     primary key (parent_id, category_id)
> );
> {code}
> The important thing to note here is that a single category can have multiple parents.
> I'd like to propose support for collections in materialized views via an explode() function
that would create 1 row per item in the collection.  For instance, I'll insert the following
3 rows (2 parents, 1 child) into the category table:
> {code}
> insert into category (category_id, name, parents) values (009fe0e1-5b09-4efc-a92d-c03720324a4f,
'Parent', null);
> insert into category (category_id, name, parents) values (1f2914de-0adf-4afc-b7ad-ddd8dc876ab1,
'Parent2', null);
> insert into category (category_id, name, parents) values (1f93bc07-9874-42a5-a7d1-b741dc9c509c,
'Child', {009fe0e1-5b09-4efc-a92d-c03720324a4f, 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 });
> cqlsh:test> select * from category;
>  category_id                          | name    | parents
> --------------------------------------+---------+------------------------------------------------------------------------------
>  009fe0e1-5b09-4efc-a92d-c03720324a4f |  Parent |                                   
                                     null
>  1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 | Parent2 |                                   
                                     null
>  1f93bc07-9874-42a5-a7d1-b741dc9c509c |   Child | {009fe0e1-5b09-4efc-a92d-c03720324a4f,
1f2914de-0adf-4afc-b7ad-ddd8dc876ab1}
> (3 rows)
> {code}
> Given the following CQL to select the child category, utilizing an explode function,
I would expect to get back 2 rows, 1 for each parent:
> {code}
> select category_id, name, explode(parents) as parent_id from category where category_id
= 1f93bc07-9874-42a5-a7d1-b741dc9c509c;
> category_id                          | name  | parent_id
> --------------------------------------+-------+--------------------------------------
> 1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | 009fe0e1-5b09-4efc-a92d-c03720324a4f
> 1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1
> (2 rows)
> {code}
> This functionality would ideally apply to materialized views, since the ability to control
partitioning here would allow us to efficiently query our MV for all categories belonging
to a parent in a complex taxonomy.
> {code}
> CREATE MATERIALIZED VIEW categories_by_parent as
> SELECT explode(parents) as parent_id,
>         category_id, name FROM category WHERE parents IS NOT NULL
> {code}
> The explode() function is available in Spark Dataframes and my proposed function has
the same behavior: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.explode



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message