incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hiller, Dean" <>
Subject Re: Data Model
Date Fri, 14 Sep 2012 13:20:23 GMT
playOrm uses EXACTLY that pattern where @OneToMany becomes student.rowkeyStudent1 student.rowkeyStudent2
and the other fields are fixed.  It is a common pattern in noSQL.


From: aaron morton <<>>
Reply-To: "<>" <<>>
Date: Friday, September 14, 2012 3:00 AM
To: "<>" <<>>
Subject: Re: Data Model

Consider a course_students col family which gives a list of students for a course

I would use two CF's:

Course CF:
* Each row is one course
* Columns are the properties and values of the course

CourseEnrolements CF
* Each row is one course
* Column name is the student ID.
* Column value may be blank or some useful value.

Hope that helps.

Aaron Morton
Freelance Developer

On 14/09/2012, at 3:19 PM, Michael Morris <<>>

I'm fairly new to Cassandra myself, but had to solve a similar problem.  If ordering of the
student number values is not important to you, you can store them as UTF8 values (Ascii would
work too, may be a better choice?), and the resulting columns would be sorted by the lexical
ordering of the numeric values (as opposed to numeric sorting), so it would be 1, 10, 11,
2, 3, 31, 5...

In my situation, I had a document, composed of digitized images of the pages.  My row key
is the doc id number, and there were document level attributes, as well as page level attributes
I wanted to capture in the row.  So in my model, I used composite columns of 3 UTF8 values.
 The first is an attribute descriptor, I used 'a' to indicate a document level attribute,
and 'p' as a page level attribute.  The second composite value depends on the 1st, for 'a'
types, the 2nd value is the actual attribute identifier (ex, form type, scanner number, etc...).
 For 'p' types, it refers to the page number.  Having Cassandra preserve the order of the
page numbers is not a priority for me, so I can deal with the page number being sorted in
String order in the database, I'll deal with numerical sorting that in the app logic (since
the largest documents we process are only about 100 pages long).  The 3rd composite value
is empty for all 'a' types, and for 'p' types, it refers to a page level attribute (page landmark
information, image skew angle, location on disk, etc...).

As an example:

key1 => a:form=x, a:scanner=1425436, p:1:skew=0.0042142, p:1:file=page1.png, p:2:skew=0.0042412,
key2 => a:form=q, a:scanner=935625, p:1:skew=0.00032352, p:1:file=other1.png, p:2:skew=:0.0002355,

It's been working well for me when using the Hector client.



On Thu, Sep 13, 2012 at 12:59 PM, Soumya Acharya <<>>
I just started learning Cassandra any suggestion where to start with ??


On Thu, Sep 13, 2012 at 10:54 AM, Roshni Rajagopal <<>>
I want to learn how we can model a mix of static and dynamic columns in a family.

Consider a course_students col family which gives a list of students for a course
with row key- Course Id
Columns - Name, Teach_Nm, StudID1, StudID2, StudID3
Values - Maths, Prof. Abc, 20,21,25
where 20,21,25 are IDs of students.

We have

fixed columns like Course Name, Teacher Name, and a dynamic number of

columns like 'StudID1', 'StudID2' etc, and my thoughts were that we could

look for 'StudID' and get all the columns with the student Ids in Hector. But the

question was how would we determine the number for the column, like to add

StudID3 we need to read the row and identify that 2 students are there,

and this is the third one.

So we can remove the number in the column name, altogether and keep

columns like Course Name, Teacher Name, Student:20,Student:21, Student:25,

where the second part is the actual student id. However here we run into

the second issue that we cannot have some columns of a composite format

and some of another format, when we use static column families- all

columns would need to be in the format UTF8:integer We may want to treat

it as a composite column key and not use a delimiter- to get sorting,

validate the types of the parts of the key, not have to search for the

delimiter and separate the 2 components  manually etc.

A third option is to put only data in the column name for students like

Course Name, Teacher Name, 20,21,25 - it would be difficult to identify

that columns with name 20, 21, 25 actually stand for student names - a bit


I hope this is not confusing, and would like to hear your thoughts on this.The question is

around when you de-normalize & want to have some static info like name ,

and a dynamic list - whats the best way to model this.



Regards and Thanks
Soumya Kanti Acharya

View raw message