From: aaron morton
Subject: Re: Data Model
Date: Fri, 14 Sep 2012 21:00:48 +1200
To: user@cassandra.apache.org

> Consider a course_students col family which gives a list of students for a course

I would use two CFs:

Course CF:
* Each row is one course
* Columns are the properties and values of the course

CourseEnrolements CF:
* Each row is one course
* Column name is the student ID.
* Column value may be blank or some useful value.
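A minimal in-memory sketch of the two-CF layout, assuming plain Python dicts stand in for the column families. The names `course_cf`, `enrolments_cf`, `enrol`, `students_for` and the row key `"maths101"` are hypothetical and are not a Cassandra client API. It shows that enrolling a student is a blind write keyed by the student ID (no read of the row is needed to choose a column name), and that listing a course's students is one slice of one row. Sorting integer IDs assumes an integer comparator such as LongType on the enrolments CF.

```python
from collections import defaultdict

# Hypothetical stand-ins: row key -> {column name: column value}.
# Course CF: one row per course, static properties as columns.
course_cf = defaultdict(dict)
# CourseEnrolements CF: one row per course, one column per enrolled student.
enrolments_cf = defaultdict(dict)

def enrol(course_id, student_id, value=""):
    # Blind write: the column name is the student ID itself, so there is no
    # need to read the row first. Re-enrolling overwrites the same column.
    enrolments_cf[course_id][student_id] = value

def students_for(course_id):
    # Cassandra returns columns in comparator order; sorting integer IDs
    # here mimics an integer comparator (an assumption, e.g. LongType).
    return sorted(enrolments_cf[course_id])

course_cf["maths101"].update({"name": "Maths", "teacher": "Prof. Abc"})
for sid in (25, 20, 21):
    enrol("maths101", sid)
enrol("maths101", 21)  # idempotent: still three students

print(course_cf["maths101"]["name"], students_for("maths101"))
# Maths [20, 21, 25]
```

Keeping the dynamic list in its own row means the static course columns never have to share a comparator with the student-ID columns, which is the conflict raised further down the thread.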

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 3:19 PM, Michael Morris <michael.m.morris@gmail.com> wrote:

I'm fairly new to Cassandra myself, but I had to solve a similar problem. If the ordering of the student number values is not important to you, you can store them as UTF8 values (ASCII would work too, and may be a better choice?), and the resulting columns would be sorted by the lexical ordering of the numeric values (as opposed to numeric sorting), so the order would be 1, 10, 11, 2, 3, 31, 5...
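To make the sort order concrete, the sketch below (plain Python, no Cassandra client) sorts numeric strings the way a byte-wise UTF8 or Ascii comparator would, then contrasts that with the 8-byte big-endian encoding that Cassandra's LongType uses. The big-endian comparison only matches numeric order for non-negative IDs, which is an assumption here.

```python
import struct

student_ids = [1, 10, 11, 2, 3, 31, 5]

# UTF8/Ascii comparators order names byte by byte, so numeric strings
# end up in lexical (dictionary) order.
utf8_names = sorted(str(sid) for sid in student_ids)
print(utf8_names)  # ['1', '10', '11', '2', '3', '31', '5']

# The application can restore numeric order after reading the columns.
print(sorted(utf8_names, key=int))  # ['1', '2', '3', '5', '10', '11', '31']

# Encoding each ID as an 8-byte big-endian integer (LongType's layout)
# makes plain byte comparison match numeric order for non-negative IDs.
long_names = sorted(struct.pack(">q", sid) for sid in student_ids)
print([struct.unpack(">q", b)[0] for b in long_names])  # [1, 2, 3, 5, 10, 11, 31]
```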

In my situation, I had a document composed of digitized images of its pages. My row key is the doc ID number, and there were document-level attributes as well as page-level attributes I wanted to capture in the row. So in my model, I used composite columns of three UTF8 values. The first is an attribute descriptor: I used 'a' to indicate a document-level attribute and 'p' for a page-level attribute. The second composite value depends on the first: for 'a' types, it is the actual attribute identifier (e.g. form type, scanner number, etc.); for 'p' types, it is the page number. Having Cassandra preserve the numeric order of the page numbers is not a priority for me, so I can live with the page numbers being sorted in string order in the database and handle numerical sorting in the app logic (the largest documents we process are only about 100 pages long). The third composite value is empty for all 'a' types; for 'p' types, it identifies a page-level attribute (page landmark information, image skew angle, location on disk, etc.).

As an example:

key1 => a:form=x, a:scanner=1425436, p:1:skew=0.0042142, p:1:file=page1.png, p:2:skew=0.0042412, p:2:file=page2.png
key2 => a:form=q, a:scanner=935625, p:1:skew=0.00032352, p:1:file=other1.png, p:2:skew=0.0002355, p:2:file=other2.png
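A rough Python sketch of how the key1 row above could be read back, assuming a dict of tuple names stands in for the row and Python tuple ordering approximates the composite comparator for string components. `prefix_slice`, `doc_attributes` and `page_attributes` are hypothetical helpers, not Hector calls; `prefix_slice` only emulates a column slice bounded by a composite prefix.

```python
# Hypothetical in-memory copy of the "key1" row. Each column name is a
# 3-part composite (descriptor, identifier, attribute); the third part is
# empty for document-level ('a') columns. All parts are strings (UTF8).
key1 = {
    ("a", "form", ""): "x",
    ("a", "scanner", ""): "1425436",
    ("p", "1", "skew"): "0.0042142",
    ("p", "1", "file"): "page1.png",
    ("p", "2", "skew"): "0.0042412",
    ("p", "2", "file"): "page2.png",
}

def prefix_slice(row, prefix):
    # Composite names compare component by component; Python tuple ordering
    # approximates that for string parts. Return every column, in order,
    # whose leading components equal the prefix.
    n = len(prefix)
    return [(name, value) for name, value in sorted(row.items()) if name[:n] == prefix]

def doc_attributes(row):
    # All 'a' columns: identifier -> value.
    return {ident: value for (_, ident, _), value in prefix_slice(row, ("a",))}

def page_attributes(row):
    # All 'p' columns grouped by page: page -> {attribute: value}.
    pages = {}
    for (_, page, attr), value in prefix_slice(row, ("p",)):
        pages.setdefault(page, {})[attr] = value
    # Page numbers are strings, so "10" would sort before "2";
    # restore numeric order in the application.
    return dict(sorted(pages.items(), key=lambda item: int(item[0])))

print(doc_attributes(key1))  # {'form': 'x', 'scanner': '1425436'}
print(page_attributes(key1)["2"])  # {'file': 'page2.png', 'skew': '0.0042412'}
```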

It's been working well for me when using the Hector client.

Thanks,

Mike

On Thu, Sep 13, 2012 at 12:59 PM, Soumya Acharya <cse.soumya@gmail.com> wrote:
I just started learning Cassandra. Any suggestions on where to start?


Thanks
Soumya

On Thu, Sep 13, 2012 at 10:54 AM, Roshni Rajagopal <roshni_rajagopal@hotmail.com> wrote:
I want to learn how we can model a mix of static and dynamic columns in a column family.

Consider a course_students col family which gives a list of students for a course:
Row key - Course Id
Columns - Name, Teach_Nm, StudID1, StudID2, StudID3
Values - Maths, Prof. Abc, 20, 21, 25
where 20, 21 and 25 are IDs of students.

We have fixed columns like Course Name and Teacher Name, and a dynamic number of columns like 'StudID1', 'StudID2', etc. My thought was that we could look for columns starting with 'StudID' and get all the columns with the student IDs in Hector. But the question was how we would determine the number for the column: to add StudID3, we need to read the row, identify that 2 students are already there, and work out that this is the third one.
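The numbering problem can be made concrete with a small in-memory sketch (plain Python; `next_stud_column` and the second new student ID 26 are hypothetical). Choosing the next `StudIDn` name needs a read of the row first, and if two clients read the same state concurrently they both choose `StudID3`. Because writes to the same column name in Cassandra simply replace each other (last write wins), one enrolment is silently lost.

```python
# Hypothetical row for one course, using the numbered column names.
row = {"Name": "Maths", "Teach_Nm": "Prof. Abc", "StudID1": 20, "StudID2": 21}

def next_stud_column(snapshot):
    # To name the new column we must first read the row and count the
    # existing StudID columns (read-before-write).
    count = sum(1 for name in snapshot if name.startswith("StudID"))
    return f"StudID{count + 1}"

# Two clients read the same version of the row before either writes...
snapshot_a = dict(row)
snapshot_b = dict(row)
col_a = next_stud_column(snapshot_a)
col_b = next_stud_column(snapshot_b)

# ...so both pick "StudID3", and the second write replaces the first.
row[col_a] = 25
row[col_b] = 26

print(col_a, col_b)  # StudID3 StudID3
print(sorted(k for k in row if k.startswith("StudID")))  # ['StudID1', 'StudID2', 'StudID3']
```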


So we could remove the number from the column name altogether and keep columns like Course Name, Teacher Name, Student:20, Student:21, Student:25, where the second part is the actual student ID. However, here we run into a second issue: when we use static column families, we cannot have some columns in a composite format and others in another format; all columns would need to be in the format UTF8:integer. We may want to treat the name as a composite column key rather than use a delimiter, so that we get sorting, validate the types of the parts of the key, and avoid having to search for the delimiter and separate the two components manually.
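A short Python sketch of the trade-off described above, assuming a hypothetical extra student ID 100 to expose the ordering difference. Delimited `Student:<id>` strings sort lexically and must be split by hand (`parse_delimited` is a hypothetical helper), while a composite-style (UTF8, integer) name, modelled here as a Python tuple, keeps the ID as a number so it sorts numerically. Tuples only approximate Cassandra's CompositeType ordering; they do not perform its type validation.

```python
# Student IDs from the example, plus a hypothetical 100 to show ordering.
ids = [20, 21, 25, 100]

# Delimited UTF8 names: sorted as strings, so 100 lands first.
delimited = sorted(f"Student:{sid}" for sid in ids)
print(delimited)  # ['Student:100', 'Student:20', 'Student:21', 'Student:25']

def parse_delimited(name):
    # Find the delimiter and convert the second part by hand; a malformed
    # name such as "Student:abc" is only caught here, when it is read.
    prefix, _, raw_id = name.partition(":")
    return prefix, int(raw_id)

# Composite-style (UTF8, integer) names: each component keeps its own type,
# so the integer part sorts numerically and needs no parsing.
composite = sorted(("Student", sid) for sid in ids)
print(composite)  # [('Student', 20), ('Student', 21), ('Student', 25), ('Student', 100)]
```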


A third option is to put only data in the column names for students, e.g. Course Name, Teacher Name, 20, 21, 25. But then it would be difficult to tell that the columns named 20, 21 and 25 actually stand for student IDs; it is a bit unreadable.


I hope this is not confusing, and I would like to hear your thoughts on this. The question is: when you de-normalize and want to have some static info, like the name, alongside a dynamic list, what's the best way to model this?


Regards,

Roshni




--
Regards and Thanks
Soumya Kanti Acharya

