cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "ThomasBoose/EERD model components to Cassandra Column family's" by ThomasBoose
Date Sun, 12 Dec 2010 17:16:52 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "ThomasBoose/EERD model components to Cassandra Column family's" page has been changed
by ThomasBoose.
http://wiki.apache.org/cassandra/ThomasBoose/EERD%20model%20components%20to%20Cassandra%20Column%20family%27s?action=diff&rev1=15&rev2=16

--------------------------------------------------

  
  ''I'm not sure to what detaillevel security rules can apply in a Cassandra database. At
least I know that one can create logins per cluster.''
  
- If it is necessary to use different keys for both collections, sometimes it is not up to
one designer to select both keys, although the number of element are equal and they are related
one on one, in a relational model the designer gets to select either key to insert into the
other collection with an unique and foreign key constraint.
+ If it is necessary to use different keys for both collections, sometimes it is not up to
one designer to select both keys, although the number of elements are equal and they are related
one on one, in a relational model the designer gets to select either key to insert into the
other collection with a unique and foreign key constraint.
  
  ''' {{http://boose.nl/images/oneononeequal.jpeg}} '''
  
- In Cassandra modeling you are forced to either croslink both keys, So you'd design both
keys foreign in both ColumnFamily's. Or you create a third ColumnFamily in which you store
both keys preceded by a token to which columfamily you are refering. Lets focus on the first
option. Say we hand out phones to our employees and we agree that every employee will always
have one phone. and phones that are not used are not stored in our database. The phone has
a phonenumber as key where the employee has a social security number. In order to know which
number to dial when looking for employee X and who is calling giving a specific phonenumber
we need to store both keys foreign in both ColumnFamily's.
+ In Cassandra modelling you are forced to either crosslink both keys, so you'd design both
keys foreign in both ColumnFamily's. Or you create a third ColumnFamily in which you store
both keys preceded by a token to which ColumnFamily you are referring to. Let's focus on the
first option. Say we hand out phones to our employees and we agree that every employee will
always have one phone. And phones that are not used are not stored in our database. The phone
has a phonenumber as key where the employee has a social security number. In order to know
which number to dial when looking for employee X and who is calling giving a specific phonenumber,
we need to store both keys foreign in both ColumnFamily's.
  ||||||||<style="text-align: center;">CF_Employee ''' ''' ||
  ||<style="text-align: center;" |2>123-12-1234 ||name ||phone ||salary ||
  ||John ||0555-123456 ||10.000 ||
@@ -61, +61 @@

  
  
  
- Using a static columnname and requiring input in the foreign key fields, checking the existence
of the key in the other ColumnFamily and processing updates and deletes are all subject to
programming in the DBMS layer. Cassandra itself does not, and probably will not, provide foreign
key logic.  One could imagine an process that makes sure the cross references stay consistend:
+ Using a static columnname and requiring input in the foreign key fields, checking the existence
of the key in the other ColumnFamily and processing updates and deletes are all subject to
programming in the DBMS layer. Cassandra itself does not, and probably will not, provide foreign
key logic.  One could imagine a process that makes sure the cross references stay consistent:
  
  {{{
  cf_Employee.insert('321-21-4321', {'name':'Jane', 'phone':'0555-654321'})
@@ -72, +72 @@

        raise error or delete specified employee
  }}}
  ==== Subset elements ====
- One on one relationships with one collection being smaller, in fact being a subset of the
other collections in relational systems are solved by adding the key of the larger collection
as foreign key to the smaler one. Preferably one uses the same key values as decribed above.
This way we prevent null values that are not strictly indicating an unknown value. Null value's
should only meen "We know there is a value but the value is unknown" as we've all learned
in school.
+ One on one relationships with one collection being smaller, in fact being a subset of the
other collections in relational systems are solved by adding the key of the larger collection
as foreign key to the smaller one. Preferably one uses the same key values as decribed above.
This way we prevent null values that are not strictly indicating an unknown value. Null value's
should only mean "We know there is a value but the value is unknown" as we've all learned
in school.
  
  {{http://boose.nl/images/specialisation.jpeg}}
  
- As stated we prefer the foreign key to be the same value as the key from the superset ColumnFamily.
In every other case we'll have to introduce logic to keep the relation cosistent. In any case
you have to enforce the existance of all keys in the subset in the superset. Logic must also
be provided when deleting elements from the superset with respect to the related element in
the subset.These kind of relationships are also found in specialisations. The given example
can be viewed as a single non total specialisation.
+ As stated we prefer the foreign key to be the same value as the key from the superset ColumnFamily.
In every other case we'll have to introduce logic to keep the relation consistent. In any
case you have to enforce the existance of all keys in the subset in the superset. Logic must
also be provided when deleting elements from the superset with respect to the related element
in the subset. These kind of relationships are also found in specialisations. The given example
can be viewed as a single non total specialisation.
  
- In order to create a disjunct specialisation one should add an column to the employee ColumnFamily
containing a reference to a single subset ColumnFamily. Logic has to be introduced to keep
your data consitent I would again suggest to implement this logic in a DBMS tier.
+ In order to create a disjunct specialisation one should add a column to the employee ColumnFamily
containing a reference to a single subset ColumnFamily. Logic has to be introduced to keep
your data consitent. I would again suggest to implement this logic in a DBMS tier.
  
  ==== Overlap ====
- The easiest one on one relation to implement is the one in which elements in both collections
do not need to be in the other but might. If at all possible create one big ColumnFamily that
collects all elements from both collections and specialise to your intended ColumnFamily's,
even if there is no corresponding attribute (column). If absolutly neccessary you can provide
keys from either ColumnFamily if the values are not the same but one on one related. See above
for contraint considerations.
+ The easiest one on one relation to implement is the one in which elements in both collections
do not need to be in the other but might. If at all possible create one big ColumnFamily that
collects all elements from both collections and specialise to your intended ColumnFamily's,
even if there is no corresponding attribute (column). If absolutly neccessary you can provide
keys from either ColumnFamily if the values are not the same but one on one related. See above
for constraint considerations.
  
  === 1 to Many ===
- In one to many relationships we add the key from the "one" side foreign to the "many" side.
So if we're moddeling students studing at only one school-unit at a time we would add the
unit's key to the student as foreign. Considering that no foreign key logic is provided you
will have to write your own code to enforce consistancy in unit's existing, when the unit
attribute of a student is set, and defining behaviour when deleting a unit. Cosiddering the
fact that this kind of relation is very common one could best create the logic for this at
a seperate DBMS tier.
+ In one to many relationships we add the key from the "one" side foreign to the "many" side.
So if we're modelling students studying at only one school-unit at a time, we would add the
unit's key to the student as foreign. Considering that no foreign key logic is provided you
will have to write your own code to enforce '''''__consistancy in units existing__''''', when
the unit attribute of a student is set, and defining behaviour when deleting a unit. Considering
the fact that this kind of relation is very common one could best create the logic for this
at a seperate DBMS tier.
  
- Every student has only one school-unit so we enforce one static name of a column that will
reference this unit. for instance this column in the cf_Student ColumnFamily is called "school-unit".
In a Cassandra database this is not sufficient to retrieve all student within this unit. One
could find answers to questions like these but it would require quite a lot of processing
power. If a ColumnFamily, the cf_School_unit family in this case, has only one of these relations,
then one could chose to add all student keys to that ColumnFamily it self. I would not count
on this situation persisting in future releases of you system and therefore sugest that you'de
provide seperate ColumnFamily's for each one to many relationship that you model.
+ Every student has only one school-unit so we enforce one static name of a column that will
reference this unit. For instance this column in the cf_Student ColumnFamily is called "school-unit".
In a Cassandra database this is not sufficient to retrieve all student within this unit. One
could find answers to questions like these but it would require quite a lot of processing
power. If a ColumnFamily, the cf_School_unit family in this case, has only one of these relations,
then one could choose to add all student keys to that ColumnFamily itself. I would not count
on this situation persisting in future releases of your system and therefore suggest that
you'd provide separate ColumnFamily's for each one to many relationship that you model.
  
- This would leed to three ColumnFamily's
+ This would lead to three ColumnFamily's
  ||||||||<tablewidth="400px"style="text-align: center;">CF_Student ''' ''' ||
  ||<style="text-align: center;" |2>123-12-1234 ||name ||unit ||city ||
  ||John ||SE ||The Hague ||
@@ -108, +108 @@

  
  
  
- No value's are actualy stored in the columns indicating de studentnumbers. These columns
only exist to indicate which students are present in this unit.
+ No values are actually stored in the columns indicating the studentnumbers. These columns
only exist to indicate which students are present in this unit.
  
- If a one to many relationship contains itself attributes, which is perfectly acceptable
in a (E)ERD model. One could be inspired to use SuperColumns. Cassandra SuperColumns are column
that can contain columns themself.
+ If a one to many relationship contains itself attributes, which is perfectly acceptable
in a (E)ERD model. One could be inspired to use SuperColumns. Cassandra SuperColumns are column
that can contain columns themselves.
  
  === Many to Many ===
  {{http://boose.nl/images/manytomany.jpeg}}
  
- Lets look at a very basic part of an (E)ERD model. The power of chen ERD models is that
a lot of implicit information can be left out. Models should be simplefied versions of their
reallife counterparts. No DBMS, relational nor NoSQL, can by itself implement many to many
relationships. In Relational systems we would create a new table that would represent the
relationship. This table would consist of both the keys (foreign keys) of the adjaccent entities,
being primairy togetter, supplemented with its own attributes.
+ Let's look at a very basic part of an (E)ERD model. The power of Chen ERD models is that
a lot of implicit information can be left out. Models should be simplified versions of their
real life counterparts. No DBMS, relational nor NoSQL, can by itself implement many to many
relationships. In Relational systems we would create a new table that would represent the
relationship. This table would consist of both the keys (foreign keys) of the adjacent entities,
being primairy together, supplemented with its own attributes.
  
- As it is perfectly valid in a relational database to have a key value composed of several
columns, in Cassandra there is only one key per ColumnFamily. I did already discuss the need
 to create seperate  ColumnFamily's for one to many relationships given the fact that you
can never tell for sure whether or not maybe in the future a new relation will popup to another
entity sharing the same keyvalues. This means that we will need 5 ColumnFamily's to implement
the model above:
+ As it is perfectly valid in a relational database to have a key value composed of several
columns, in Cassandra there is only one key per ColumnFamily. I did already discuss the need
 to create seperate  ColumnFamily's for one to many relationships given the fact that you
can never tell for sure whether or not maybe in the future a new relation will pop up to another
entity sharing the same keyvalues. This means that we will need 5 ColumnFamily's to implement
the model above:
  
- CF_Order is a straight forward  ColumnFamily, Modeled after the design
+ CF_Order is a straight forward  ColumnFamily, modeled after the design.
  ||||||||<style="text-align: center;">CF_Order ''' ''' ||
  ||<style="text-align: center;" |2>1234 ||order_date ||order_discount ||customer_id
||
  ||20100808 ||0.4 ||1234 ||
@@ -141, +141 @@

  
  
  
- As we cannot itterate throu a ColumnFamily based on any attribute except its key and we
wan't to use a solution to relations that is generic we create a seperate ColumnFamily for
each one to many relationship. As we know that a many to many relationship is typicly solved
by introducing 2 one to many relationships to a newly created ColumnFamily. So there will
be one ColumnFamily to store which orders a product can be found on.
+ As we cannot iterate through a ColumnFamily based on any attribute except its key and we
want to use a solution to relations that is generic. We create a seperate ColumnFamily for
each one to many relationship. As we know that a many to many relationship is typically solved
by introducing 2 one to many relationships to a newly created ColumnFamily. So there will
be one ColumnFamily to store which orders a product can be found on.
  ||||||||<style="text-align: center;">CF_Product_Order ''' ''' ||
  ||<style="text-align: center;" |2>DSK_SGT_5GB_7200 ||1234 ||4321 ||1354 ||
  || || || ||
@@ -151, +151 @@

  
  
  
- And we'll create one ColumnFamily that stores which products can be found on each order.
We can see that this solution tends to invite anomalies. product that have 3 order columns
but no reference to the product in the CF_Order_Product ColumnFamily. It up to the programmers
of the DBMS tier to make sure genic code is available to keep the ColumnFamily's consistent
al of the time.
+ And we'll create one ColumnFamily that stores which products can be found on each order.
We can see that this solution tends to invite anomalies. product that have 3 order columns
but no reference to the product in the CF_Order_Product ColumnFamily. It up to the programmers
of the DBMS tier to make sure genic code is available to keep the ColumnFamily's consistent
all of the time.
  ||||||||<style="text-align: center;">CF_Order_Product ||
  ||<style="text-align: center;" |2>1234 ||DSK_SGT_5GB_7200 || || ||
  || || || ||
@@ -163, +163 @@

  
  
  
- ''' '''Last but not least we'll introduce a ColumnFamily to store the attributes connected
to the relationship between Product and Order. As mentioned it is not possible to create a
key containing 2 seperate values. So we'll have to introduce a new value that concatenates
both values using a strikt format. These formats should be part of a design if implementing
a (E)ERD in Cassandra.
+ ''' '''Last but not least we'll introduce a ColumnFamily to store the attributes connected
to the relationship between Product and Order. As mentioned it is not possible to create a
key containing 2 separate values. So we'll have to introduce a new value that concatenates
both values using a strict format. These formats should be part of a design if implementing
a (E)ERD in Cassandra.
  ||||||||<style="text-align: center;">CF_order_line''' ''' ||
  ||<style="text-align: center;" |2>1234_DSK_SGT_5GB_7200 ||number ||delivery_date ||
||
  ||5 ||20101215 || ||

Mime
View raw message