cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jon Haddad (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-11194) materialized views - support explode() on collections
Date Fri, 19 Feb 2016 19:28:18 GMT
Jon Haddad created CASSANDRA-11194:

             Summary: materialized views - support explode() on collections
                 Key: CASSANDRA-11194
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Jon Haddad

I'm working on a database design to model a product catalog.  Products can belong to categories.
 Categories can belong to multiple sub categories (think about Amazon's complex taxonomies).

My category table would look like this, giving me individual categories & their parents:

CREATE TABLE category (
    category_id uuid primary key,
    name text,
    parents set<uuid>

To get a list of all the children of a particular category, I need a table that looks like
the following:

CREATE TABLE categories_by_parent (
    parent_id uuid,
    category_id uuid,
    name text,
    primary key (parent_id, category_id)

The important thing to note here is that a single category can have multiple parents.

I'd like to propose support for collections in materialized views via an explode() function
that would create 1 row per item in the collection.  For instance, I'll insert the following
3 rows (2 parents, 1 child) into the category table:

insert into category (category_id, name, parents) values (009fe0e1-5b09-4efc-a92d-c03720324a4f,
'Parent', null);

insert into category (category_id, name, parents) values (1f2914de-0adf-4afc-b7ad-ddd8dc876ab1,
'Parent2', null);

insert into category (category_id, name, parents) values (1f93bc07-9874-42a5-a7d1-b741dc9c509c,
'Child', {009fe0e1-5b09-4efc-a92d-c03720324a4f, 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 });

cqlsh:test> select * from category;

 category_id                          | name    | parents
 009fe0e1-5b09-4efc-a92d-c03720324a4f |  Parent |                                        
 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 | Parent2 |                                        
 1f93bc07-9874-42a5-a7d1-b741dc9c509c |   Child | {009fe0e1-5b09-4efc-a92d-c03720324a4f, 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1}

(3 rows)


Given the following CQL to select the child category, utilizing an explode function, I would
expect to get back 2 rows, 1 for each parent:

select category_id, name, explode(parents) as parent_id from category where category_id =

category_id                          | name  | parent_id
1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | 009fe0e1-5b09-4efc-a92d-c03720324a4f
1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1

(2 rows)

This functionality would ideally apply to materialized views, since the ability to control
partitioning here would allow us to efficiently query our MV for all categories belonging
to a parent in a complex taxonomy.

CREATE MATERIALIZED VIEW categories_by_parent as
SELECT explode(parents) as parent_id,
        category_id, name FROM category WHERE parents IS NOT NULL

The explode() function is available in Spark Dataframes and my proposed function has the same

This message was sent by Atlassian JIRA

View raw message