Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 9A49311DDD for ; Thu, 24 Apr 2014 20:49:31 +0000 (UTC) Received: (qmail 34079 invoked by uid 500); 24 Apr 2014 20:49:26 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 33934 invoked by uid 500); 24 Apr 2014 20:49:24 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 33698 invoked by uid 99); 24 Apr 2014 20:49:21 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 24 Apr 2014 20:49:21 +0000 Date: Thu, 24 Apr 2014 20:49:21 +0000 (UTC) From: "Constance Eustace (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (CASSANDRA-7062) Extension of static columns for compound cluster keys MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Constance Eustace updated CASSANDRA-7062: ----------------------------------------- Description: CASSANDRA-6561 implemented static columns for a given partition key. What this is proposing for a compound cluster key is a static column that is static at intermediate parts of a compound cluster key. This example shows a table modelling a moderately complex EAV pattern : {code} CREATE TABLE t ( entityID text, propertyName text, valueIndex text, entityName text static (entityID), propertyType text static (entityID, propertyName), propertyRelations List static (entityID, propertyName), data text, PRIMARY KEY (entityID, (propertyName,valueIndex)) ) {code} So in this example has the following static columns: - the entityName column behaves exactly as CASSANDRA-6561 details, so all cluster rows have the same value - the propertyType and propertyRelations columns are static with respect to the remaining parts of the cluster key (that is, across all valueIndex values for a given propertyName), so an update to those values for an entityID and a propertyName will be shared/constant by all the value rows... Is this a relatively simple extension of the same mechanism in -6561, or is this a "whoa, you have no idea what you are proposing"? Sample data: Mary and Jane aren't married... {code} INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES ('0001','MARY MATALIN','married','SingleValue','0','false'); INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES ('0002','JANE JOHNSON','married','SingleValue','0','false'); INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0001','MARY MATALIN','kids','NOVALUE',''); INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0002','JANE JOHNSON','kids','NOVALUE',''); {code} {code} SELECT * FROM t: 0001 MARY MATALIN married SingleValue 0 false 0001 MARY MATALIN kids NOVALUE null 0002 JANE JOHNSON married SingleValue 0 false 0002 JANE JOHNSON kids NOVALUE null {code} Then mary and jane get married (so the entityName column that is static on the partition key is updated just like CASSANDRA-6561 ) {code} INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES ('0001','MARY SMITH','married','SingleValue','0','TRUE'); INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES ('0002','JANE JONES','married','SingleValue','0','TRUE'); {code} {code} SELECT * FROM t: 0001 MARY SMITH married SingleValue 0 TRUE 0001 MARY SMITH kids NOVALUE null 0002 JANE JONES married SingleValue 0 TRUE 0002 JANE JONES kids NOVALUE null {code} Then mary and jane have a kid, so we add another value to the kids attribute: {code} INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','SingleValue','0','JIM-BOB'); INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0002','kids','SingleValue','0','JENNY'); {code} {code} SELECT * FROM t: 0001 MARY SMITH married SingleValue 0 TRUE 0001 MARY SMITH kids SingleValue null 0001 MARY SMITH kids SingleValue 0 JIM-BOB 0002 JANE JONES married SingleValue 0 TRUE 0002 JANE JONES kids SingleValue null 0002 JANE JONES kids SingleValue 0 JENNY {code} Then Mary has ANOTHER kid, which demonstrates the partially static column relative to the cluster key, as ALL value rows for the property 'kids' get updated to the new value: {code} INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','MultiValue','1','HARRY'); {code} {code} SELECT * FROM t: 0001 MARY SMITH married SingleValue 0 TRUE 0001 MARY SMITH kids MultiValue null 0001 MARY SMITH kids MultiValue 0 JIM-BOB 0001 MARY SMITH kids MultiValue 1 HARRY 0002 JANE JONES married SingleValue 0 TRUE 0002 JANE JONES kids SingleValue null 0002 JANE JONES kids SingleValue 0 JENNY {code} ... ok, hopefully that example isn't TOO complicated. Yes, there's a stupid hack bug in there with the null/empty row for the kids attribute, but please bear with me on that Generally speaking, this will aid in flattening / denormalization of relational constructs into cassandra-friendly schemas. In the above example we are flattening a relational schema of three tables: entity, property, and value tables into a single sparse flattened denormalized compound table. was: CASSANDRA-6561 implemented static columns for a given partition key. What this is proposing for a compound cluster key is a static column that is static at intermediate parts of a compound cluster key. This example shows a table modelling a moderately complex EAV pattern : {code} CREATE TABLE t ( entityID text, propertyName text, valueIndex text, entityName text static (entityID), propertyType text static (entityID, propertyName), propertyRelations List static (entityID, propertyName), data text, PRIMARY KEY (entityID, (propertyName,valueIndex)) ) {code} So in this example has the following static columns: - the entityName column behaves exactly as CASSANDRA-6561 details, so all cluster rows have the same value - the propertyType and propertyRelations columns are static with respect to the remaining parts of the cluster key (that is, across all valueIndex values for a given propertyName), so an update to those values for an entityID and a propertyName will be shared/constant by all the value rows... Is this a relatively simple extension of the same mechanism in -6561, or is this a "whoa, you have no idea what you are proposing"? Sample data: Mary and Jane aren't married... {code} INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES ('0001','MARY MATALIN','married','SingleValue','0','false'); INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES ('0002','JANE JOHNSON','married','SingleValue','0','false'); INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0001','MARY MATALIN','kids','NOVALUE',''); INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0002','JANE JOHNSON','kids','NOVALUE',''); {code} {code} SELECT * FROM t: 0001 MARY MATALIN married SingleValue 0 false 0001 MARY MATALIN kids NOVALUE null 0002 JANE JOHNSON married SingleValue 0 false 0002 JANE JOHNSON kids NOVALUE null {code} Then mary and jane get married (so the entityName column that is static on the partition key is updated just like CASSANDRA-6561 ) {code} INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES ('0001','MARY SMITH','married','SingleValue','0','TRUE'); INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES ('0002','JANE JONES','married','SingleValue','0','TRUE'); {code} {code} SELECT * FROM t: 0001 MARY SMITH married SingleValue 0 TRUE 0001 MARY SMITH kids NOVALUE null 0002 JANE JONES married SingleValue 0 TRUE 0002 JANE JONES kids NOVALUE null {code} Then mary and jane have a kid, so we add another value to the kids attribute: {code} INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','SingleValue','0','JIM-BOB'); INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0002','kids','SingleValue','0','JENNY'); {code} {code} SELECT * FROM t: 0001 MARY SMITH married SingleValue 0 TRUE 0001 MARY SMITH kids SingleValue null 0001 MARY SMITH kids SingleValue 0 JIM-BOB 0002 JANE JONES married SingleValue 0 TRUE 0002 JANE JONES kids SingleValue null 0002 JANE JONES kids SingleValue 0 JENNY {code} Then Mary has ANOTHER kid, which demonstrates the partially static column relative to the cluster key, as ALL value rows for the property 'kids' get updated to the new value: {code} INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','MultiValue','1','HARRY'); {code} {code} SELECT * FROM t: 0001 MARY SMITH married SingleValue 0 TRUE 0001 MARY SMITH kids MultiValue null 0001 MARY SMITH kids MultiValue 0 JIM-BOB 0001 MARY SMITH kids MultiValue 1 HARRY 0002 JANE JONES married SingleValue 0 TRUE 0002 JANE JONES kids SingleValue null 0002 JANE JONES kids SingleValue 0 JENNY {code} ... ok, hopefully that example isn't TOO complicated. Yes, there's a stupid hack bug in there with the null/empty row for the kids attribute, but please bear with me on that Generally speaking, this will aid in flattening / denormalization of relational constructs into cassandra-friendly schemas. In the above example we are flattening a relational schema of three tables: entity, property, and value tables into a single sparse flattened denormalized compound table. > Extension of static columns for compound cluster keys > ----------------------------------------------------- > > Key: CASSANDRA-7062 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7062 > Project: Cassandra > Issue Type: New Feature > Reporter: Constance Eustace > Priority: Minor > > CASSANDRA-6561 implemented static columns for a given partition key. > What this is proposing for a compound cluster key is a static column that is static at intermediate parts of a compound cluster key. This example shows a table modelling a moderately complex EAV pattern : > {code} > CREATE TABLE t ( > entityID text, > propertyName text, > valueIndex text, > entityName text static (entityID), > propertyType text static (entityID, propertyName), > propertyRelations List static (entityID, propertyName), > data text, > PRIMARY KEY (entityID, (propertyName,valueIndex)) > ) > {code} > So in this example has the following static columns: > - the entityName column behaves exactly as CASSANDRA-6561 details, so all cluster rows have the same value > - the propertyType and propertyRelations columns are static with respect to the remaining parts of the cluster key (that is, across all valueIndex values for a given propertyName), so an update to those values for an entityID and a propertyName will be shared/constant by all the value rows... > Is this a relatively simple extension of the same mechanism in -6561, or is this a "whoa, you have no idea what you are proposing"? > Sample data: > Mary and Jane aren't married... > {code} > INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES ('0001','MARY MATALIN','married','SingleValue','0','false'); > INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES ('0002','JANE JOHNSON','married','SingleValue','0','false'); > INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0001','MARY MATALIN','kids','NOVALUE',''); > INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) VALUES ('0002','JANE JOHNSON','kids','NOVALUE',''); > {code} > {code} > SELECT * FROM t: > 0001 MARY MATALIN married SingleValue 0 false > 0001 MARY MATALIN kids NOVALUE null > 0002 JANE JOHNSON married SingleValue 0 false > 0002 JANE JOHNSON kids NOVALUE null > {code} > Then mary and jane get married (so the entityName column that is static on the partition key is updated just like CASSANDRA-6561 ) > {code} > INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES ('0001','MARY SMITH','married','SingleValue','0','TRUE'); > INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, data) VALUES ('0002','JANE JONES','married','SingleValue','0','TRUE'); > {code} > {code} > SELECT * FROM t: > 0001 MARY SMITH married SingleValue 0 TRUE > 0001 MARY SMITH kids NOVALUE null > 0002 JANE JONES married SingleValue 0 TRUE > 0002 JANE JONES kids NOVALUE null > {code} > Then mary and jane have a kid, so we add another value to the kids attribute: > {code} > INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','SingleValue','0','JIM-BOB'); > INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0002','kids','SingleValue','0','JENNY'); > {code} > {code} > SELECT * FROM t: > 0001 MARY SMITH married SingleValue 0 TRUE > 0001 MARY SMITH kids SingleValue null > 0001 MARY SMITH kids SingleValue 0 JIM-BOB > 0002 JANE JONES married SingleValue 0 TRUE > 0002 JANE JONES kids SingleValue null > 0002 JANE JONES kids SingleValue 0 JENNY > {code} > Then Mary has ANOTHER kid, which demonstrates the partially static column relative to the cluster key, as ALL value rows for the property 'kids' get updated to the new value: > {code} > INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES ('0001','kids','MultiValue','1','HARRY'); > {code} > {code} > SELECT * FROM t: > 0001 MARY SMITH married SingleValue 0 TRUE > 0001 MARY SMITH kids MultiValue null > 0001 MARY SMITH kids MultiValue 0 JIM-BOB > 0001 MARY SMITH kids MultiValue 1 HARRY > 0002 JANE JONES married SingleValue 0 TRUE > 0002 JANE JONES kids SingleValue null > 0002 JANE JONES kids SingleValue 0 JENNY > {code} > ... ok, hopefully that example isn't TOO complicated. Yes, there's a stupid hack bug in there with the null/empty row for the kids attribute, but please bear with me on that > Generally speaking, this will aid in flattening / denormalization of relational constructs into cassandra-friendly schemas. In the above example we are flattening a relational schema of three tables: entity, property, and value tables into a single sparse flattened denormalized compound table. -- This message was sent by Atlassian JIRA (v6.2#6252)