From: "Hiller, Dean" <Dean.Hiller@nrel.gov>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Fri, 14 Sep 2012 07:20:23 -0600
Subject: Re: Data Model
Message-ID: <CC788AE0.10B9E%Dean.Hiller@nrel.gov>
In-Reply-To: <A3AFD2D5-E801-4C4A-9FBF-696FE8555221@thelastpickle.com>

playOrm uses EXACTLY that pattern, where @OneToMany becomes
student.rowkeyStudent1 student.rowkeyStudent2 and the other fields are fixed.
It is a common pattern in NoSQL.

Dean

From: aaron morton <aaron@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, September 14, 2012 3:00 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Data Model

Consider a course_students col family which gives a list of students for a course

I would use two CFs:

Course CF:
* Each row is one course
* Columns are the properties and values of the course

CourseEnrolments CF:
* Each row is one course
* Column name is the student ID.
* Column value may be blank or some useful value.
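
The two-CF layout above can be sketched in memory as follows. This is only
an illustration, not Cassandra or Hector code: the dictionaries stand in
for column families, and the names course_cf, course_enrolments_cf,
add_course, enrol and students_of are all assumptions made up for the
example.

```python
from collections import defaultdict

# In-memory stand-ins for the two column families (hypothetical names).
# Each maps row key -> {column name: column value}.
course_cf = {}
course_enrolments_cf = defaultdict(dict)


def add_course(course_id, **properties):
    """Course CF: one row per course, columns are its properties."""
    course_cf[course_id] = dict(properties)


def enrol(course_id, student_id, value=""):
    """CourseEnrolments CF: the column name is the student ID.

    No read of the row is needed first, because the column name does not
    depend on how many students are already enrolled.
    """
    course_enrolments_cf[course_id][student_id] = value


def students_of(course_id):
    """Return the student IDs of a course in sorted column order."""
    # A real column family returns columns in comparator order;
    # sorted() mimics that here.
    return sorted(course_enrolments_cf[course_id])


add_course("maths", Name="Maths", Teach_Nm="Prof. Abc")
for sid in (25, 20, 21):
    enrol("maths", sid)
enrol("maths", 20)  # writing the same column again changes nothing

print(students_of("maths"))  # [20, 21, 25]
```

Because the student ID is the column name, adding a student is a blind
write and enrolling the same student twice simply overwrites one column.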

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 3:19 PM, Michael Morris <michael.m.morris@gmail.com> wrote:

I'm fairly new to Cassandra myself, but had to solve a similar problem.  If
ordering of the student number values is not important to you, you can store
them as UTF8 values (ASCII would work too, and may be a better choice?), and
the resulting columns would be sorted by the lexical ordering of the numeric
values (as opposed to numeric sorting), so it would be 1, 10, 11, 2, 3, 31,
5...
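
The ordering described above can be reproduced with plain Python strings.
This is a small sketch of lexical versus numeric sorting only; it assumes
ASCII digits and does not call Cassandra. The variable names are made up
for the example.

```python
student_ids = [1, 2, 3, 5, 10, 11, 31]

# Stored as text, the IDs compare character by character.
as_text = sorted(str(n) for n in student_ids)
print(as_text)  # ['1', '10', '11', '2', '3', '31', '5']

# Numeric order can be restored on the client by sorting on the int value.
as_numbers = sorted(as_text, key=int)
print(as_numbers)  # ['1', '2', '3', '5', '10', '11', '31']
```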

In my situation, I had a document composed of digitized images of the pages.
My row key is the doc id number, and there were document level attributes, as
well as page level attributes, that I wanted to capture in the row.  So in my
model, I used composite columns of 3 UTF8 values.  The first is an attribute
descriptor: I used 'a' to indicate a document level attribute and 'p' for a
page level attribute.  The second composite value depends on the 1st: for 'a'
types, the 2nd value is the actual attribute identifier (e.g. form type,
scanner number, etc.); for 'p' types, it refers to the page number.  Having
Cassandra preserve the order of the page numbers is not a priority for me, so
I can live with the page numbers being sorted in String order in the database,
and I'll handle numerical sorting in the app logic (since the largest
documents we process are only about 100 pages long).  The 3rd composite value
is empty for all 'a' types, and for 'p' types it refers to a page level
attribute (page landmark information, image skew angle, location on disk,
etc.).

As an example:

key1 => a:form=x, a:scanner=1425436, p:1:skew=0.0042142, p:1:file=page1.png, p:2:skew=0.0042412, p:2:file=page2.png
key2 => a:form=q, a:scanner=935625, p:1:skew=0.00032352, p:1:file=other1.png, p:2:skew=0.0002355, p:2:file=other2.png
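
As a rough illustration of the key1 row above, the three-part composite
column names can be modelled as tuples of strings. This is a sketch in
plain Python, not Hector code: it assumes that composite names compare
component by component, the page 10 column is a hypothetical addition to
show the String ordering, and the helper names are made up for the example.

```python
# One row: composite column name (descriptor, identifier-or-page, attribute)
# mapped to the column value. All three components are strings.
row = {
    ("a", "form", ""): "x",
    ("a", "scanner", ""): "1425436",
    ("p", "1", "skew"): "0.0042142",
    ("p", "1", "file"): "page1.png",
    ("p", "2", "skew"): "0.0042412",
    ("p", "2", "file"): "page2.png",
    ("p", "10", "file"): "page10.png",  # hypothetical extra page
}


def document_attributes(row):
    """Collect the 'a' (document level) columns as {attribute: value}."""
    return {name: value for (kind, name, _), value in row.items() if kind == "a"}


def pages(row):
    """Group the 'p' columns by page and order the pages numerically."""
    grouped = {}
    for (kind, page, attr), value in row.items():
        if kind == "p":
            grouped.setdefault(int(page), {})[attr] = value
    return dict(sorted(grouped.items()))


# Stored order is String order, so page "10" lands between "1" and "2".
for name in sorted(row):
    print(name)

# The app logic restores numeric page order: 1, 2, 10.
print(list(pages(row)))
print(document_attributes(row))
```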

It's been working well for me when using the Hector client.

Thanks,

Mike

On Thu, Sep 13, 2012 at 12:59 PM, Soumya Acharya <cse.soumya@gmail.com> wrote:
I just started learning Cassandra. Any suggestions on where to start?


Thanks
Soumya

On Thu, Sep 13, 2012 at 10:54 AM, Roshni Rajagopal <roshni_rajagopal@hotmail.com> wrote:
I want to learn how we can model a mix of static and dynamic columns in a
family.

Consider a course_students col family which gives a list of students for a
course, with:
Row key - Course Id
Columns - Name, Teach_Nm, StudID1, StudID2, StudID3
Values - Maths, Prof. Abc, 20, 21, 25
where 20, 21, 25 are IDs of students.

We have fixed columns like Course Name and Teacher Name, and a dynamic number
of columns like 'StudID1', 'StudID2', etc. My thought was that we could look
for 'StudID' and get all the columns with the student IDs in Hector. But the
question is how we would determine the number for the column: to add StudID3,
we would need to read the row and identify that 2 students are there and that
this is the third one.


So we can remove the number from the column name altogether and keep columns
like Course Name, Teacher Name, Student:20, Student:21, Student:25, where the
second part is the actual student ID. However, here we run into the second
issue: we cannot have some columns in a composite format and some in another
format when we use static column families; all columns would need to be in
the format UTF8:integer. We may want to treat it as a composite column key
rather than use a delimiter, to get sorting, to validate the types of the
parts of the key, and to avoid having to search for the delimiter and separate
the 2 components manually.
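
The difference between a delimited column name and a typed two-part key can
be sketched in plain Python. This only illustrates the sorting and parsing
trade-off and does not use any Cassandra API: the tuples stand in for a
UTF8:integer composite, student ID 100 is a hypothetical value added to
make the ordering visible, and parse is a made-up helper.

```python
# Option A: one string per column name, with ':' as a manual delimiter.
delimited = ["Student:20", "Student:100", "Student:21"]

# Option B: a typed two-part key (text, integer), modelled as a tuple.
composite = [("Student", 20), ("Student", 100), ("Student", 21)]


def parse(name):
    """Split a delimited name by hand; the ID must be converted manually."""
    prefix, _, student_id = name.partition(":")
    return prefix, int(student_id)


# Delimited names sort as text, so 100 comes before 20.
print(sorted(delimited))  # ['Student:100', 'Student:20', 'Student:21']

# Typed parts sort numerically on the second component, with no parsing.
print(sorted(composite))  # [('Student', 20), ('Student', 21), ('Student', 100)]

# The same order is reachable from the delimited form, but only after parsing.
print(sorted(parse(name) for name in delimited))
```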


A third option is to put only data in the column name for students, like
Course Name, Teacher Name, 20, 21, 25. It would then be difficult to identify
that the columns named 20, 21, 25 actually stand for students, which is a bit
unreadable.


I hope this is not confusing, and would like to hear your thoughts on this.
The question is: when you de-normalize and want to have some static info like
a name, plus a dynamic list, what's the best way to model this?


Regards,

Roshni


--
Regards and Thanks
Soumya Kanti Acharya