cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Constance Eustace (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-7062) Extension of static columns for compound cluster keys
Date Mon, 21 Apr 2014 16:29:14 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Constance Eustace updated CASSANDRA-7062:
-----------------------------------------

    Description: 
CASSANDRA-6561 implemented static columns for a given partition key.

What this is proposing for a compound cluster key is a static column that is static at intermediate
parts of a compound cluster key. This example shows a table modelling a moderately complex
EAV pattern  :

CREATE TABLE t (
   entityID text,
   propertyName text,
   valueIndex text,
   entityName text static (entityID),
   propertyType text static (entityID, propertyName),
   propertyRelations List<text> static (entityID, propertyName),
   data text,
   PRIMARY KEY (entityID, (propertyName,valueIndex))
)

So in this example has the following static columns:
- the entityName column behaves exactly as CASSANDRA-6561 details, so all cluster rows have
the same value
- the propertyType and propertyRelations columns are static with respect to the remaining
parts of the cluster key (that is, across all valueIndex values for a given propertyName),
so an update to those values for an entityID and a propertyName will be shared/constant by
all the value rows...

Is this a relatively simple extension of the same mechanism in -6561, or is this a "whoa,
you have no idea what you are proposing"?

Sample data:

Mary and Jane aren't married...

INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0001','MARY MATALIN','married','SingleValue','0','false');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0002','JANE JOHNSON','married','SingleValue','0','false');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0001','MARY
MATALIN','kids','NOVALUE','');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0002','JANE
JOHNSON','kids','NOVALUE','');

SELECT * FROM t:

0001 MARY MATALIN  married   SingleValue   0   false
0001 MARY MATALIN  kids         NOVALUE          null
0002 JANE JOHNSON  married   SingleValue   0   false
0002 JANE JOHNSON  kids         NOVALUE          null

Then mary and jane get married (so the entityName column that is static on the partition key
is updated just like CASSANDRA-6561 )

INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0001','MARY SMITH','married','SingleValue','0','TRUE');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0002','JANE JONES','married','SingleValue','0','TRUE');

SELECT * FROM t:

0001 MARY SMITH  married   SingleValue   0   TRUE
0001 MARY SMITH  kids         NOVALUE          null
0002 JANE JONES   married   SingleValue   0   TRUE
0002 JANE JONES   kids         NOVALUE          null

Then mary and jane have a kid, so we add another value to the kids attribute:

INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','SingleValue','0','JIM-BOB');
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0002','kids','SingleValue','0','JENNY');

SELECT * FROM t:

0001 MARY SMITH  married   SingleValue   0   TRUE
0001 MARY SMITH  kids         SingleValue        null
0001 MARY SMITH  kids         SingleValue   0   JIM-BOB
0002 JANE JONES   married   SingleValue   0   TRUE
0002 JANE JONES   kids         SingleValue        null
0002 JANE JONES   kids         SingleValue   0   JENNY

Then Mary has ANOTHER kid, which demonstrates the partially static column relative to the
cluster key, as ALL value rows for the property 'kids' get updated to the new value:

INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','MultiValue','1','HARRY');

SELECT * FROM t:

0001 MARY SMITH  married   SingleValue  0   TRUE
0001 MARY SMITH  kids         MultiValue          null
0001 MARY SMITH  kids         MultiValue     0   JIM-BOB
0001 MARY SMITH  kids         MultiValue     1   HARRY
0002 JANE JONES   married   SingleValue   0   TRUE
0002 JANE JONES   kids         SingleValue        null
0002 JANE JONES   kids         SingleValue   0   JENNY


... ok, hopefully that example isn't TOO complicated. Yes, there's a stupid hack bug in there
with the null/empty row for the kids attribute, but please bear with me on that 

Generally speaking, this will aid in flattening / denormalization of relational constructs
into cassandra-friendly schemas.

  was:
CASSANDRA-6561 implemented static columns for a given partition key.

What this is proposing for a compound cluster key is a static column that is static at intermediate
parts of a compound cluster key. This example shows a table modelling a moderately complex
EAV pattern  :

CREATE TABLE t (
   entityID text,
   propertyName text,
   valueIndex text,
   entityName text static (entityID),
   propertyType text static (entityID, propertyName),
   propertyRelations List<text> static (entityID, propertyName),
   data text,
   PRIMARY KEY (entityID, (propertyName,valueIndex))
)

So in this example has the following static columns:
- the entityName column behaves exactly as CASSANDRA-6561 details, so all cluster rows have
the same value
- the propertyName column is static with respect to the remaining parts of the cluster key
(that is, across all valueIndex values for a given propertyName)

Is this a relatively simple extension of the same mechanism in -6561, or is this a "whoa,
you have no idea what you are proposing"?

Sample data:

Mary and Jane aren't married...

INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0001','MARY MATALIN','married','SingleValue','0','false');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0002','JANE JOHNSON','married','SingleValue','0','false');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0001','MARY
MATALIN','kids','NOVALUE','');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0002','JANE
JOHNSON','kids','NOVALUE','');

SELECT * FROM t:

0001 MARY MATALIN  married   SingleValue   0   false
0001 MARY MATALIN  kids         NOVALUE          null
0002 JANE JOHNSON  married   SingleValue   0   false
0002 JANE JOHNSON  kids         NOVALUE          null

Then mary and jane get married (so the entityName column that is static on the partition key
is updated just like CASSANDRA-6561 )

INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0001','MARY SMITH','married','SingleValue','0','TRUE');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0002','JANE JONES','married','SingleValue','0','TRUE');

SELECT * FROM t:

0001 MARY SMITH  married   SingleValue   0   TRUE
0001 MARY SMITH  kids         NOVALUE          null
0002 JANE JONES   married   SingleValue   0   TRUE
0002 JANE JONES   kids         NOVALUE          null

Then mary and jane have a kid, so we add another value to the kids attribute:

INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','SingleValue','0','JIM-BOB');
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0002','kids','SingleValue','0','JENNY');

SELECT * FROM t:

0001 MARY SMITH  married   SingleValue   0   TRUE
0001 MARY SMITH  kids         SingleValue        null
0001 MARY SMITH  kids         SingleValue   0   JIM-BOB
0002 JANE JONES   married   SingleValue   0   TRUE
0002 JANE JONES   kids         SingleValue        null
0002 JANE JONES   kids         SingleValue   0   JENNY

Then Mary has ANOTHER kid, which demonstrates the partially static column relative to the
cluster key, as ALL value rows for the property 'kids' get updated to the new value:

INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','MultiValue','1','HARRY');

SELECT * FROM t:

0001 MARY SMITH  married   SingleValue  0   TRUE
0001 MARY SMITH  kids         MultiValue          null
0001 MARY SMITH  kids         MultiValue     0   JIM-BOB
0001 MARY SMITH  kids         MultiValue     1   HARRY
0002 JANE JONES   married   SingleValue   0   TRUE
0002 JANE JONES   kids         SingleValue        null
0002 JANE JONES   kids         SingleValue   0   JENNY


... ok, hopefully that example isn't TOO complicated. Yes, there's a stupid hack bug in there
with the null/empty row for the kids attribute, but please bear with me on that 

Generally speaking, this will aid in flattening / denormalization of relational constructs
into cassandra-friendly schemas.


> Extension of static columns for compound cluster keys
> -----------------------------------------------------
>
>                 Key: CASSANDRA-7062
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7062
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Constance Eustace
>            Priority: Minor
>
> CASSANDRA-6561 implemented static columns for a given partition key.
> What this is proposing for a compound cluster key is a static column that is static at
intermediate parts of a compound cluster key. This example shows a table modelling a moderately
complex EAV pattern  :
> CREATE TABLE t (
>    entityID text,
>    propertyName text,
>    valueIndex text,
>    entityName text static (entityID),
>    propertyType text static (entityID, propertyName),
>    propertyRelations List<text> static (entityID, propertyName),
>    data text,
>    PRIMARY KEY (entityID, (propertyName,valueIndex))
> )
> So in this example has the following static columns:
> - the entityName column behaves exactly as CASSANDRA-6561 details, so all cluster rows
have the same value
> - the propertyType and propertyRelations columns are static with respect to the remaining
parts of the cluster key (that is, across all valueIndex values for a given propertyName),
so an update to those values for an entityID and a propertyName will be shared/constant by
all the value rows...
> Is this a relatively simple extension of the same mechanism in -6561, or is this a "whoa,
you have no idea what you are proposing"?
> Sample data:
> Mary and Jane aren't married...
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0001','MARY MATALIN','married','SingleValue','0','false');
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0002','JANE JOHNSON','married','SingleValue','0','false');
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0001','MARY
MATALIN','kids','NOVALUE','');
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0002','JANE
JOHNSON','kids','NOVALUE','');
> SELECT * FROM t:
> 0001 MARY MATALIN  married   SingleValue   0   false
> 0001 MARY MATALIN  kids         NOVALUE          null
> 0002 JANE JOHNSON  married   SingleValue   0   false
> 0002 JANE JOHNSON  kids         NOVALUE          null
> Then mary and jane get married (so the entityName column that is static on the partition
key is updated just like CASSANDRA-6561 )
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0001','MARY SMITH','married','SingleValue','0','TRUE');
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES
('0002','JANE JONES','married','SingleValue','0','TRUE');
> SELECT * FROM t:
> 0001 MARY SMITH  married   SingleValue   0   TRUE
> 0001 MARY SMITH  kids         NOVALUE          null
> 0002 JANE JONES   married   SingleValue   0   TRUE
> 0002 JANE JONES   kids         NOVALUE          null
> Then mary and jane have a kid, so we add another value to the kids attribute:
> INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','SingleValue','0','JIM-BOB');
> INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0002','kids','SingleValue','0','JENNY');
> SELECT * FROM t:
> 0001 MARY SMITH  married   SingleValue   0   TRUE
> 0001 MARY SMITH  kids         SingleValue        null
> 0001 MARY SMITH  kids         SingleValue   0   JIM-BOB
> 0002 JANE JONES   married   SingleValue   0   TRUE
> 0002 JANE JONES   kids         SingleValue        null
> 0002 JANE JONES   kids         SingleValue   0   JENNY
> Then Mary has ANOTHER kid, which demonstrates the partially static column relative to
the cluster key, as ALL value rows for the property 'kids' get updated to the new value:
> INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','MultiValue','1','HARRY');
> SELECT * FROM t:
> 0001 MARY SMITH  married   SingleValue  0   TRUE
> 0001 MARY SMITH  kids         MultiValue          null
> 0001 MARY SMITH  kids         MultiValue     0   JIM-BOB
> 0001 MARY SMITH  kids         MultiValue     1   HARRY
> 0002 JANE JONES   married   SingleValue   0   TRUE
> 0002 JANE JONES   kids         SingleValue        null
> 0002 JANE JONES   kids         SingleValue   0   JENNY
> ... ok, hopefully that example isn't TOO complicated. Yes, there's a stupid hack bug
in there with the null/empty row for the kids attribute, but please bear with me on that 
> Generally speaking, this will aid in flattening / denormalization of relational constructs
into cassandra-friendly schemas.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message