incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From aaron morton <aa...@thelastpickle.com>
Subject Re: Data Model
Date Fri, 14 Sep 2012 09:00:48 GMT
> Consider a course_students col family which gives a list of students for a course

I would use two CF's:

Course CF:
	* Each row is one course
	* Columns are the properties and values of the course

CourseEnrolements CF
	* Each row is one course
	* Column name is the student ID. 
	* Column value may be blank or some useful value. 

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 3:19 PM, Michael Morris <michael.m.morris@gmail.com> wrote:

> I'm fairly new to Cassandra myself, but had to solve a similar problem.  If ordering
of the student number values is not important to you, you can store them as UTF8 values (Ascii
would work too, may be a better choice?), and the resulting columns would be sorted by the
lexical ordering of the numeric values (as opposed to numeric sorting), so it would be 1,
10, 11, 2, 3, 31, 5...
> 
> In my situation, I had a document, composed of digitized images of the pages.  My row
key is the doc id number, and there were document level attributes, as well as page level
attributes I wanted to capture in the row.  So in my model, I used composite columns of 3
UTF8 values.  The first is an attribute descriptor, I used 'a' to indicate a document level
attribute, and 'p' as a page level attribute.  The second composite value depends on the 1st,
for 'a' types, the 2nd value is the actual attribute identifier (ex, form type, scanner number,
etc...).  For 'p' types, it refers to the page number.  Having Cassandra preserve the order
of the page numbers is not a priority for me, so I can deal with the page number being sorted
in String order in the database, I'll deal with numerical sorting that in the app logic (since
the largest documents we process are only about 100 pages long).  The 3rd composite value
is empty for all 'a' types, and for 'p' types, it refers to a page level attribute (page landmark
information, image skew angle, location on disk, etc...).
> 
> As an example:
> 
> key1 => a:form=x, a:scanner=1425436, p:1:skew=0.0042142, p:1:file=page1.png, p:2:skew=0.0042412,
p:2:file=page2.png
> key2 => a:form=q, a:scanner=935625, p:1:skew=0.00032352, p:1:file=other1.png, p:2:skew=:0.0002355,
p:2:file=other2.png
> 
> It's been working well for me when using the Hector client.
> 
> Thanks,
> 
> Mike
> 
> On Thu, Sep 13, 2012 at 12:59 PM, Soumya Acharya <cse.soumya@gmail.com> wrote:
> I just started learning Cassandra any suggestion where to start with ??
> 
> 
> Thanks
> Soumya 
> 
> On Thu, Sep 13, 2012 at 10:54 AM, Roshni Rajagopal <roshni_rajagopal@hotmail.com>
wrote:
> I want to learn how we can model a mix of static and dynamic columns in a family.
> 
> Consider a course_students col family which gives a list of students for a course
> with row key- Course Id
> Columns - Name, Teach_Nm, StudID1, StudID2, StudID3
> Values - Maths, Prof. Abc, 20,21,25 
> where 20,21,25 are IDs of students.
> 
> We have
> fixed columns like Course Name, Teacher Name, and a dynamic number of
> 
> columns like 'StudID1', 'StudID2' etc, and my thoughts were that we could
> 
> look for 'StudID' and get all the columns with the student Ids in Hector. But the
> 
> question was how would we determine the number for the column, like to add
> 
> StudID3 we need to read the row and identify that 2 students are there,
> 
> and this is the third one.
> 
> 
> 
> So we can remove the number in the column name, altogether and keep
> 
> columns like Course Name, Teacher Name, Student:20,Student:21, Student:25,
> 
> where the second part is the actual student id. However here we run into
> 
> the second issue that we cannot have some columns of a composite format
> 
> and some of another format, when we use static column families- all
> 
> columns would need to be in the format UTF8:integer We may want to treat
> 
> it as a composite column key and not use a delimiter- to get sorting,
> 
> validate the types of the parts of the key, not have to search for the
> 
> delimiter and separate the 2 components  manually etc. 
> 
> 
> 
> A third option is to put only data in the column name for students like
> 
> Course Name, Teacher Name, 20,21,25 - it would be difficult to identify
> 
> that columns with name 20, 21, 25 actually stand for student names - a bit
> 
> unreadable.
> 
> 
> 
> I hope this is not confusing, and would like to hear your thoughts on this.The question
is
> 
> around when you de-normalize & want to have some static info like name ,
> 
> and a dynamic list - whats the best way to model this.
> 
> 
> 
> Regards,
> 
> Roshni
> 
> 
> 
> 
> -- 
> Regards and Thanks 
> Soumya Kanti Acharya
> 


Mime
View raw message