cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Trivial Update of "ThomasBoose/EERD model components to Cassandra Column family's" by ThomasBoose
Date Sat, 11 Dec 2010 15:27:19 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "ThomasBoose/EERD model components to Cassandra Column family's" page has been changed
by ThomasBoose.
http://wiki.apache.org/cassandra/ThomasBoose/EERD%20model%20components%20to%20Cassandra%20Column%20family%27s?action=diff&rev1=7&rev2=8

--------------------------------------------------

  ##master-date:Unknown-Date
  #format wiki
  #language en
- = A way to implement EERD components in Cassandra =
+ = A way to implement (E)ERD components in Cassandra =
  == Intro ==
  This page describes model transformations from EERD concepts into Cassandra ColumnFamily
concepts. All input is welcome.
  
@@ -33, +33 @@

  ==== Equal elements ====
  Sometimes all the elements are part of both collections on either side of the relationship.
The reasons these collections are modelled separately are most often based on security issues
or functional differences. One solution in a Cassandra database is the same as the one you would
use for such a relation in an RDBMS: share the same key in both ColumnFamilies.
Inserting a key in one of these ColumnFamilies inserts the same key in the other, and vice
versa. Updating an existing key in either ColumnFamily does not result in any change in the
other. Deleting a key from one ColumnFamily results in deleting the same key in the other
family as well, provided this is allowed.
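
The shared-key rules above can be sketched as follows. This is only an illustration: plain Python dicts stand in for the two ColumnFamilies, no Cassandra client API is used, and the names cf_public, cf_restricted, insert_key, update_columns and delete_key are assumptions that do not appear on this page.

```python
# Plain dicts stand in for two ColumnFamilies that share the same row keys.
# All names here are hypothetical; this is not a Cassandra client API.
cf_public = {}
cf_restricted = {}


def insert_key(key):
    """Insert the key in both families; existing columns are left untouched."""
    cf_public.setdefault(key, {})
    cf_restricted.setdefault(key, {})


def update_columns(cf, key, columns):
    """Update columns in one family only; the other family is not changed."""
    if key not in cf:
        raise KeyError(key)
    cf[key].update(columns)


def delete_key(key):
    """Delete the key from both families."""
    cf_public.pop(key, None)
    cf_restricted.pop(key, None)


insert_key("123-12-1234")
update_columns(cf_public, "123-12-1234", {"name": "John"})
```

In a real deployment this logic would live in whatever tier sits between the application and Cassandra, since Cassandra itself does not keep the two families in step.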
  
- ''I'm not sure to what detaillevel ''''secu'rity rules can apply in a Cassandra database.
At least I know that one can creat logins per cluster.''
+ ''I'm not sure at what level of detail security rules can apply in a Cassandra database. At
least I know that one can create logins per cluster.''
  
  If it is necessary to use different keys for the two collections (sometimes it is not up to
one designer to select both keys), although the number of elements is equal and they are related
one to one, the designer of a relational model gets to select either key and insert it into the
other collection with a unique and foreign key constraint.
  
- {{http://boose.nl/images/oneononeequal.jpeg|http://boose.nl/images/oneononeequal.jpeg}}
+ ''' {{http://boose.nl/images/oneononeequal.jpeg}} '''
  
  In Cassandra modelling you are forced either to cross-link both keys, so that each key is
stored as a foreign key in the other ColumnFamily, or to create a third ColumnFamily in which you store
both keys, each preceded by a token indicating which ColumnFamily it refers to. Let's focus on the first
option. Say we hand out phones to our employees and we agree that every employee will always
have one phone, and that phones that are not used are not stored in our database. The phone has
a phone number as key, whereas the employee has a social security number. In order to know which
number to dial when looking for employee X, and who is calling given a specific phone number,
we need to store both keys as foreign keys in both ColumnFamilies.
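
A minimal sketch of the cross-linking option, assuming plain Python dicts in place of the two ColumnFamilies. The column names follow the CF_Employee and CF_Phone tables on this page; the helper names assign_phone, number_for and caller_for are hypothetical, and no Cassandra client API is used.

```python
# Dicts stand in for CF_Employee (keyed by social security number) and
# CF_Phone (keyed by phone number). Helper names are hypothetical.
cf_employee = {}
cf_phone = {}


def assign_phone(ssn, name, salary, number, credit):
    """Store both rows together so each one carries the other's key."""
    if ssn in cf_employee or number in cf_phone:
        raise ValueError("employee or phone already stored")
    cf_employee[ssn] = {"name": name, "phone": number, "salary": salary}
    cf_phone[number] = {"employee": ssn, "credit": credit}


def number_for(ssn):
    """Which number to dial when looking for an employee."""
    return cf_employee[ssn]["phone"]


def caller_for(number):
    """Who is calling, given a phone number."""
    ssn = cf_phone[number]["employee"]
    return cf_employee[ssn]["name"]


assign_phone("123-12-1234", "John", "10.000", "0555-123456", 10)
```

Writing both rows in one helper is the design choice that keeps the two foreign keys in agreement; nothing in the data store enforces it.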
- ||||||||                          CF_Employee''' ''' ||
+ ||||||||<style="text-align: center;">CF_Employee ''' ''' ||
  ||<style="text-align: center;" |2>123-12-1234 ||name ||phone ||salary ||
  ||John ||0555-123456 ||10.000 ||
  ||<style="text-align: center;" |2>321-21-4321 ||name ||phone ||salary ||
  ||Jane ||0555-654321 ||12.000 ||
  
  
- ||||||<tablewidth="400px" tablestyle="text-align: left;"style="text-align: center;">CF_Phone'''
''' ||
+ ||||||<tablewidth="400px" tablestyle="text-align: left;"style="text-align: center;">CF_Phone
''' ''' ||
  ||<style="text-align: center;" |2>0555-123456 ||employee ||credit ||
  ||123-12-1234 ||10 ||
  ||<style="text-align: center;" |2>0555-654321 ||employee ||credit ||
@@ -73, +73 @@

  
  As stated, we prefer the foreign key to have the same value as the key from the superset ColumnFamily.
In every other case we'll have to introduce logic to keep the relation consistent. In any case
you have to enforce the existence in the superset of all keys in the subset. Logic must also
be provided for deleting elements from the superset with respect to the related element in
the subset. These kinds of relationships are also found in specialisations. The given example
can be viewed as a single non-total specialisation.
  
+ In order to create a disjoint specialisation one should add a column to the employee ColumnFamily
containing a reference to a single subset ColumnFamily. Logic has to be introduced to keep
your data consistent; I would again suggest implementing this logic in a DBMS tier.
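
The consistency logic for such a specialisation could look roughly like this. It is a sketch under assumptions: dicts stand in for the ColumnFamilies, the subset names Jobber and Contractor are taken from the previous revision of this paragraph, and the column name "specialisation" and the function specialise are hypothetical.

```python
# cf_employee is the superset; each entry in subsets is one specialisation.
# The "specialisation" column name and specialise() are hypothetical.
cf_employee = {"123-12-1234": {"name": "John"}}
subsets = {"Jobber": {}, "Contractor": {}}


def specialise(ssn, subset_name, columns):
    """Add an employee to exactly one subset ColumnFamily."""
    if ssn not in cf_employee:
        # Every key in a subset must exist in the superset.
        raise KeyError("unknown employee: " + ssn)
    current = cf_employee[ssn].get("specialisation")
    if current is not None and current != subset_name:
        # An employee may belong to one subset only.
        raise ValueError(ssn + " is already a " + current)
    subsets[subset_name].setdefault(ssn, {}).update(columns)
    cf_employee[ssn]["specialisation"] = subset_name


specialise("123-12-1234", "Jobber", {})
```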
+ 
  ==== Overlap ====
  The easiest one-to-one relation to implement is the one in which elements in both collections
do not need to be in the other but might be. If at all possible, create one big ColumnFamily that
collects all elements from both collections and specialise to your intended ColumnFamilies,
even if there is no corresponding attribute (column). If absolutely necessary you can provide
keys from either ColumnFamily if the values are not the same but related one to one. See above
for constraint considerations.
- 
- If you want to make a specialisation disjunkt, you will have to introduce am attribuut in
the top ColumFamily, Employee in the last example, That store a reference to of the specialisation
ColumnFamily's ( Jobber or Contractor in this case). Logic has to be introduced to keep your
data consitent I would again suggest to implement this logic in a DBMS tier.
  
  === 1 to Many ===
  In one-to-many relationships we add the key from the "one" side as a foreign key to the "many" side.
So if we're modelling students studying at only one school-unit at a time, we would add the
unit's key to the student as a foreign key. Since no foreign key logic is provided, you
will have to write your own code to enforce that the unit exists when the unit
attribute of a student is set, and to define the behaviour when a unit is deleted. Considering
that this kind of relation is very common, this logic is best created in
a separate DBMS tier.
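
The foreign key logic described above might be sketched like this, with dicts standing in for the student and school-unit ColumnFamilies. The function names set_unit and delete_unit are hypothetical, and refusing to delete a unit that is still referenced is only one possible policy, chosen here for illustration.

```python
# Dicts stand in for the school-unit and student ColumnFamilies.
# set_unit() and delete_unit() are hypothetical helper names.
cf_school_unit = {"SE": {"name": "software engineering", "loc": "hsl"}}
cf_student = {}


def set_unit(student_key, unit_key):
    """Set a student's unit, but only if that unit exists."""
    if unit_key not in cf_school_unit:
        raise KeyError("unknown school unit: " + unit_key)
    cf_student.setdefault(student_key, {})["unit"] = unit_key


def delete_unit(unit_key):
    """Refuse to delete a unit that students still reference."""
    # This scans every student row, which is expensive on a large family.
    if any(row.get("unit") == unit_key for row in cf_student.values()):
        raise ValueError("students still reference unit " + unit_key)
    cf_school_unit.pop(unit_key, None)


set_unit("123-12-1234", "SE")
```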
@@ -84, +84 @@

  Every student has only one school-unit, so we enforce one static column name that
references this unit. For instance, this column in the cf_Student ColumnFamily is called "school-unit".
In a Cassandra database this is not sufficient to retrieve all students within this unit. One
could find answers to questions like these, but it would require quite a lot of processing
power. If a ColumnFamily, the cf_School_unit family in this case, has only one of these relations,
then one could choose to add all student keys to that ColumnFamily itself. I would not count
on this situation persisting in future releases of your system and therefore suggest that you
provide a separate ColumnFamily for each one-to-many relationship that you model.
  
  This would lead to three ColumnFamilies
- ||||||||<tablewidth="400px"style="text-align: center;">CF_Student''' ''' ||
+ ||||||||<tablewidth="400px"style="text-align: center;">CF_Student ''' ''' ||
  ||<style="text-align: center;" |2>123-12-1234 ||name ||unit ||city ||
- ||John ||SE ||the hague ||
+ ||John ||SE ||The Hague ||
  ||<style="text-align: center;" |2>321-21-4321 ||name ||unit ||city ||
  ||Jane ||SE ||Amsterdam ||
  
  
- ||||||<tablewidth="400px" tablestyle="text-align: left;"style="text-align: center;">CF_School_Unit'''
''' ||
+ ||||||<tablewidth="400px" tablestyle="text-align: left;"style="text-align: center;">CF_School_Unit
''' ''' ||
  ||<style="text-align: center;" |2>SE ||name ||loc ||
  ||software engineering ||hsl ||
  
  
- ||||||<tablewidth="400px" tablestyle="text-align: left;"style="text-align: center;">CFK_School_Unit_Student'''
''' ||
+ ||||||<tablewidth="400px" tablestyle="text-align: left;"style="text-align: center;">CFK_School_Unit_Student
''' ''' ||
  ||<style="text-align: center;" |2>SE ||123-12-1234 ||321-21-4321 ||
  || || ||
  
@@ -105, +105 @@

  
  No values are actually stored in the columns that carry the student numbers. These columns
only exist to indicate which students are present in this unit.
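
A small sketch of how the relationship ColumnFamily can be filled and read, assuming dicts in place of ColumnFamilies. The data comes from the tables above; the helper names index_student and students_in are hypothetical, and an empty byte string is used here to represent a column without a value.

```python
# Dicts stand in for CF_Student and CFK_School_Unit_Student.
# index_student() and students_in() are hypothetical helper names.
cf_student = {
    "123-12-1234": {"name": "John", "unit": "SE", "city": "The Hague"},
    "321-21-4321": {"name": "Jane", "unit": "SE", "city": "Amsterdam"},
}
cfk_school_unit_student = {}


def index_student(student_key):
    """Add the student's key as a column name in the row of the unit."""
    unit = cf_student[student_key]["unit"]
    # The column name carries the information; the value stays empty.
    cfk_school_unit_student.setdefault(unit, {})[student_key] = b""


def students_in(unit_key):
    """Return all student keys of a unit by reading a single row."""
    return sorted(cfk_school_unit_student.get(unit_key, {}))


for key in cf_student:
    index_student(key)
```

Reading one row of the relationship family replaces the scan over all students that would otherwise be needed to answer the same question.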
  
+ If a one-to-many relationship itself contains attributes, which is perfectly acceptable
in an EERD model, one could be inspired to use SuperColumns. Cassandra SuperColumns are columns
that can contain columns themselves.
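
As a rough picture of that idea, a nested dict can stand in for a super column family: row key, then SuperColumn name, then the columns inside it. Everything here is an assumption for illustration, including the family name scf_school_unit_student, the function enrol and the attribute "enrolled"; no Cassandra client API is used.

```python
# Row key -> SuperColumn name -> columns of the relationship itself.
# The family name, enrol() and the "enrolled" attribute are hypothetical.
scf_school_unit_student = {}


def enrol(unit_key, student_key, attributes):
    """Store the relationship's own attributes under the student's SuperColumn."""
    row = scf_school_unit_student.setdefault(unit_key, {})
    row.setdefault(student_key, {}).update(attributes)


def relationship_attributes(unit_key, student_key):
    """Read the attributes that belong to one unit-student pair."""
    return scf_school_unit_student[unit_key][student_key]


enrol("SE", "123-12-1234", {"enrolled": "yes"})
```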
+ 
