cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Constance Eustace (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-7062) Extension of static columns for compound cluster keys
Date Thu, 24 Apr 2014 20:49:21 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Constance Eustace updated CASSANDRA-7062:
-----------------------------------------

    Description: 
CASSANDRA-6561 implemented static columns for a given partition key.

What this is proposing for a compound cluster key is a static column that is static at intermediate
parts of a compound cluster key. This example shows a table modelling a moderately complex
EAV pattern  :

{code}
CREATE TABLE t (
   entityID text,
   propertyName text,
   valueIndex text,
   entityName text static (entityID),
   propertyType text static (entityID, propertyName),
   propertyRelations List<text> static (entityID, propertyName),
   data text,
   PRIMARY KEY (entityID, (propertyName,valueIndex))
)
{code}
So in this example has the following static columns:
- the entityName column behaves exactly as CASSANDRA-6561 details, so all cluster rows have
the same value
- the propertyType and propertyRelations columns are static with respect to the remaining
parts of the cluster key (that is, across all valueIndex values for a given propertyName),
so an update to those values for an entityID and a propertyName will be shared/constant by
all the value rows...

Is this a relatively simple extension of the same mechanism in -6561, or is this a "whoa,
you have no idea what you are proposing"?

Sample data:

Mary and Jane aren't married...
{code}
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0001','MARY MATALIN','married','SingleValue','0','false');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0002','JANE JOHNSON','married','SingleValue','0','false');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0001','MARY
MATALIN','kids','NOVALUE','');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0002','JANE
JOHNSON','kids','NOVALUE','');
{code}
{code}
SELECT * FROM t:

0001 MARY MATALIN  married   SingleValue   0   false
0001 MARY MATALIN  kids         NOVALUE          null
0002 JANE JOHNSON  married   SingleValue   0   false
0002 JANE JOHNSON  kids         NOVALUE          null
{code}
Then mary and jane get married (so the entityName column that is static on the partition key
is updated just like CASSANDRA-6561 )
{code}
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0001','MARY SMITH','married','SingleValue','0','TRUE');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0002','JANE JONES','married','SingleValue','0','TRUE');
{code}
{code}
SELECT * FROM t:

0001 MARY SMITH  married   SingleValue   0   TRUE
0001 MARY SMITH  kids         NOVALUE          null
0002 JANE JONES   married   SingleValue   0   TRUE
0002 JANE JONES   kids         NOVALUE          null
{code}
Then mary and jane have a kid, so we add another value to the kids attribute:
{code}
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','SingleValue','0','JIM-BOB');
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0002','kids','SingleValue','0','JENNY');
{code}
{code}
SELECT * FROM t:

0001 MARY SMITH  married   SingleValue   0   TRUE
0001 MARY SMITH  kids         SingleValue        null
0001 MARY SMITH  kids         SingleValue   0   JIM-BOB
0002 JANE JONES   married   SingleValue   0   TRUE
0002 JANE JONES   kids         SingleValue        null
0002 JANE JONES   kids         SingleValue   0   JENNY
{code}
Then Mary has ANOTHER kid, which demonstrates the partially static column relative to the
cluster key, as ALL value rows for the property 'kids' get updated to the new value:
{code}
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','MultiValue','1','HARRY');
{code}
{code}
SELECT * FROM t:

0001 MARY SMITH  married   SingleValue  0   TRUE
0001 MARY SMITH  kids      MultiValue          null
0001 MARY SMITH  kids      MultiValue     0   JIM-BOB
0001 MARY SMITH  kids      MultiValue     1   HARRY
0002 JANE JONES  married   SingleValue   0   TRUE
0002 JANE JONES  kids      SingleValue        null
0002 JANE JONES  kids      SingleValue   0   JENNY
{code}

... ok, hopefully that example isn't TOO complicated. Yes, there's a stupid hack bug in there
with the null/empty row for the kids attribute, but please bear with me on that 

Generally speaking, this will aid in flattening / denormalization of relational constructs
into cassandra-friendly schemas. In the above example we are flattening a relational schema
of three tables: entity, property, and value tables into a single sparse flattened denormalized
compound table.


  was:
CASSANDRA-6561 implemented static columns for a given partition key.

What this is proposing for a compound cluster key is a static column that is static at intermediate
parts of a compound cluster key. This example shows a table modelling a moderately complex
EAV pattern  :

{code}
CREATE TABLE t (
   entityID text,
   propertyName text,
   valueIndex text,
   entityName text static (entityID),
   propertyType text static (entityID, propertyName),
   propertyRelations List<text> static (entityID, propertyName),
   data text,
   PRIMARY KEY (entityID, (propertyName,valueIndex))
)
{code}
So in this example has the following static columns:
- the entityName column behaves exactly as CASSANDRA-6561 details, so all cluster rows have
the same value
- the propertyType and propertyRelations columns are static with respect to the remaining
parts of the cluster key (that is, across all valueIndex values for a given propertyName),
so an update to those values for an entityID and a propertyName will be shared/constant by
all the value rows...

Is this a relatively simple extension of the same mechanism in -6561, or is this a "whoa,
you have no idea what you are proposing"?

Sample data:

Mary and Jane aren't married...
{code}
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0001','MARY MATALIN','married','SingleValue','0','false');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0002','JANE JOHNSON','married','SingleValue','0','false');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0001','MARY
MATALIN','kids','NOVALUE','');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0002','JANE
JOHNSON','kids','NOVALUE','');
{code}
{code}
SELECT * FROM t:

0001 MARY MATALIN  married   SingleValue   0   false
0001 MARY MATALIN  kids         NOVALUE          null
0002 JANE JOHNSON  married   SingleValue   0   false
0002 JANE JOHNSON  kids         NOVALUE          null
{code}
Then mary and jane get married (so the entityName column that is static on the partition key
is updated just like CASSANDRA-6561 )
{code}
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0001','MARY SMITH','married','SingleValue','0','TRUE');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0002','JANE JONES','married','SingleValue','0','TRUE');
{code}
{code}
SELECT * FROM t:

0001 MARY SMITH  married   SingleValue   0   TRUE
0001 MARY SMITH  kids         NOVALUE          null
0002 JANE JONES   married   SingleValue   0   TRUE
0002 JANE JONES   kids         NOVALUE          null
{code}
Then mary and jane have a kid, so we add another value to the kids attribute:
{code}
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','SingleValue','0','JIM-BOB');
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0002','kids','SingleValue','0','JENNY');
{code}
{code}
SELECT * FROM t:

0001 MARY SMITH  married   SingleValue   0   TRUE
0001 MARY SMITH  kids         SingleValue        null
0001 MARY SMITH  kids         SingleValue   0   JIM-BOB
0002 JANE JONES   married   SingleValue   0   TRUE
0002 JANE JONES   kids         SingleValue        null
0002 JANE JONES   kids         SingleValue   0   JENNY
{code}
Then Mary has ANOTHER kid, which demonstrates the partially static column relative to the
cluster key, as ALL value rows for the property 'kids' get updated to the new value:
{code}
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','MultiValue','1','HARRY');
{code}
{code}
SELECT * FROM t:

0001 MARY SMITH  married   SingleValue  0   TRUE
0001 MARY SMITH  kids         MultiValue          null
0001 MARY SMITH  kids         MultiValue     0   JIM-BOB
0001 MARY SMITH  kids         MultiValue     1   HARRY
0002 JANE JONES  married   SingleValue   0   TRUE
0002 JANE JONES  kids         SingleValue        null
0002 JANE JONES  kids         SingleValue   0   JENNY
{code}

... ok, hopefully that example isn't TOO complicated. Yes, there's a stupid hack bug in there
with the null/empty row for the kids attribute, but please bear with me on that 

Generally speaking, this will aid in flattening / denormalization of relational constructs
into cassandra-friendly schemas. In the above example we are flattening a relational schema
of three tables: entity, property, and value tables into a single sparse flattened denormalized
compound table.



> Extension of static columns for compound cluster keys
> -----------------------------------------------------
>
>                 Key: CASSANDRA-7062
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7062
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Constance Eustace
>            Priority: Minor
>
> CASSANDRA-6561 implemented static columns for a given partition key.
> What this is proposing for a compound cluster key is a static column that is static at
intermediate parts of a compound cluster key. This example shows a table modelling a moderately
complex EAV pattern  :
> {code}
> CREATE TABLE t (
>    entityID text,
>    propertyName text,
>    valueIndex text,
>    entityName text static (entityID),
>    propertyType text static (entityID, propertyName),
>    propertyRelations List<text> static (entityID, propertyName),
>    data text,
>    PRIMARY KEY (entityID, (propertyName,valueIndex))
> )
> {code}
> So in this example has the following static columns:
> - the entityName column behaves exactly as CASSANDRA-6561 details, so all cluster rows
have the same value
> - the propertyType and propertyRelations columns are static with respect to the remaining
parts of the cluster key (that is, across all valueIndex values for a given propertyName),
so an update to those values for an entityID and a propertyName will be shared/constant by
all the value rows...
> Is this a relatively simple extension of the same mechanism in -6561, or is this a "whoa,
you have no idea what you are proposing"?
> Sample data:
> Mary and Jane aren't married...
> {code}
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0001','MARY MATALIN','married','SingleValue','0','false');
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0002','JANE JOHNSON','married','SingleValue','0','false');
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0001','MARY
MATALIN','kids','NOVALUE','');
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0002','JANE
JOHNSON','kids','NOVALUE','');
> {code}
> {code}
> SELECT * FROM t:
> 0001 MARY MATALIN  married   SingleValue   0   false
> 0001 MARY MATALIN  kids         NOVALUE          null
> 0002 JANE JOHNSON  married   SingleValue   0   false
> 0002 JANE JOHNSON  kids         NOVALUE          null
> {code}
> Then mary and jane get married (so the entityName column that is static on the partition
key is updated just like CASSANDRA-6561 )
> {code}
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0001','MARY SMITH','married','SingleValue','0','TRUE');
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0002','JANE JONES','married','SingleValue','0','TRUE');
> {code}
> {code}
> SELECT * FROM t:
> 0001 MARY SMITH  married   SingleValue   0   TRUE
> 0001 MARY SMITH  kids         NOVALUE          null
> 0002 JANE JONES   married   SingleValue   0   TRUE
> 0002 JANE JONES   kids         NOVALUE          null
> {code}
> Then mary and jane have a kid, so we add another value to the kids attribute:
> {code}
> INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','SingleValue','0','JIM-BOB');
> INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0002','kids','SingleValue','0','JENNY');
> {code}
> {code}
> SELECT * FROM t:
> 0001 MARY SMITH  married   SingleValue   0   TRUE
> 0001 MARY SMITH  kids         SingleValue        null
> 0001 MARY SMITH  kids         SingleValue   0   JIM-BOB
> 0002 JANE JONES   married   SingleValue   0   TRUE
> 0002 JANE JONES   kids         SingleValue        null
> 0002 JANE JONES   kids         SingleValue   0   JENNY
> {code}
> Then Mary has ANOTHER kid, which demonstrates the partially static column relative to
the cluster key, as ALL value rows for the property 'kids' get updated to the new value:
> {code}
> INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','MultiValue','1','HARRY');
> {code}
> {code}
> SELECT * FROM t:
> 0001 MARY SMITH  married   SingleValue  0   TRUE
> 0001 MARY SMITH  kids      MultiValue          null
> 0001 MARY SMITH  kids      MultiValue     0   JIM-BOB
> 0001 MARY SMITH  kids      MultiValue     1   HARRY
> 0002 JANE JONES  married   SingleValue   0   TRUE
> 0002 JANE JONES  kids      SingleValue        null
> 0002 JANE JONES  kids      SingleValue   0   JENNY
> {code}
> ... ok, hopefully that example isn't TOO complicated. Yes, there's a stupid hack bug
in there with the null/empty row for the kids attribute, but please bear with me on that 
> Generally speaking, this will aid in flattening / denormalization of relational constructs
into cassandra-friendly schemas. In the above example we are flattening a relational schema
of three tables: entity, property, and value tables into a single sparse flattened denormalized
compound table.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message