From: "Hiller, Dean"
To: "user@cassandra.apache.org"
Date: Fri, 14 Sep 2012 07:20:23 -0600
Subject: Re: Data Model

playOrm uses EXACTLY that pattern, where @OneToMany becomes student.rowkeyStudent1, student.rowkeyStudent2, and the other fields are fixed. It is a common pattern in NoSQL.

Dean

From: aaron morton
Reply-To: "user@cassandra.apache.org"
Date: Friday, September 14, 2012 3:00 AM
To: "user@cassandra.apache.org"
Subject: Re: Data Model

Consider a course_students col family which gives a list of students for a course

I would use two CFs:

Course CF:
* Each row is one course
* Columns are the properties and values of the course

CourseEnrolements CF:
* Each row is one course
* Column name is the student ID.
* Column value may be blank or some useful value.

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 3:19 PM, Michael Morris wrote:

I'm fairly new to Cassandra myself, but had to solve a similar problem. If numeric ordering of the student number values is not important to you, you can store them as UTF8 values (Ascii would work too, and may be a better choice?), and the resulting columns would be sorted by the lexical ordering of the numeric values (as opposed to numeric sorting), so it would be 1, 10, 11, 2, 3, 31, 5...

In my situation, I had a document composed of digitized images of the pages. My row key is the doc id number, and there were document-level attributes as well as page-level attributes I wanted to capture in the row. So in my model, I used composite columns of 3 UTF8 values. The first is an attribute descriptor: I used 'a' to indicate a document-level attribute and 'p' to indicate a page-level attribute. The second composite value depends on the first: for 'a' types, it is the actual attribute identifier (e.g. form type, scanner number, etc.). For 'p' types, it refers to the page number.
Having Cassandra preserve the order of the page numbers is not a priority for me, so I can live with the page numbers being sorted in String order in the database; I'll handle numerical sorting in the app logic (since the largest documents we process are only about 100 pages long). The third composite value is empty for all 'a' types, and for 'p' types it refers to a page-level attribute (page landmark information, image skew angle, location on disk, etc.).

As an example:

key1 => a:form=x, a:scanner=1425436, p:1:skew=0.0042142, p:1:file=page1.png, p:2:skew=0.0042412, p:2:file=page2.png
key2 => a:form=q, a:scanner=935625, p:1:skew=0.00032352, p:1:file=other1.png, p:2:skew=0.0002355, p:2:file=other2.png

It's been working well for me when using the Hector client.

Thanks,
Mike

On Thu, Sep 13, 2012 at 12:59 PM, Soumya Acharya wrote:

I just started learning Cassandra. Any suggestions on where to start?

Thanks
Soumya

On Thu, Sep 13, 2012 at 10:54 AM, Roshni Rajagopal wrote:

I want to learn how we can model a mix of static and dynamic columns in a family.

Consider a course_students col family which gives a list of students for a course, with

row key - Course Id
Columns - Name, Teach_Nm, StudID1, StudID2, StudID3
Values - Maths, Prof. Abc, 20, 21, 25

where 20, 21, 25 are IDs of students.

We have fixed columns like Course Name and Teacher Name, and a dynamic number of columns like 'StudID1', 'StudID2' etc. My thought was that we could look for 'StudID' and get all the columns with the student IDs in Hector. But the question was how we would determine the number for the column: to add StudID3 we need to read the row and identify that 2 students are there, and that this is the third one.

So we can remove the number from the column name altogether and keep columns like Course Name, Teacher Name, Student:20, Student:21, Student:25, where the second part is the actual student id.
However, here we run into the second issue: we cannot have some columns in a composite format and some in another format. When we use static column families, all columns would need to be in the format UTF8:integer.

We may want to treat it as a composite column key rather than use a delimiter, in order to get sorting, validate the types of the parts of the key, avoid having to search for the delimiter and separate the 2 components manually, etc.

A third option is to put only data in the column name for students, like Course Name, Teacher Name, 20, 21, 25. But it would be difficult to tell that the columns named 20, 21, 25 actually stand for students, so it is a bit unreadable.

I hope this is not confusing, and would like to hear your thoughts on this. The question is: when you de-normalize and want to have some static info like name plus a dynamic list, what's the best way to model this?

Regards,
Roshni

--
Regards and Thanks
Soumya Kanti Acharya
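
For readers skimming the thread, the layouts discussed above can be sketched roughly as follows. This is only an illustration using plain Python dictionaries as stand-ins for rows; it does not call Cassandra, Hector, or playOrm, and every name in it (course_enrolments, enrol, columns_with_prefix, doc_row) is a hypothetical helper made up for the sketch rather than anything from those libraries. It shows three things from the thread: the lexical ordering of numeric IDs stored as strings, the two-column-family layout where the student ID itself is the column name, and composite column names of three components.

```python
# Illustrative only: plain Python structures standing in for Cassandra rows.
# No Cassandra or Hector API is used; all names here are hypothetical.

# Student IDs stored as UTF8 column names sort lexically, not numerically.
student_ids = ["1", "2", "3", "5", "10", "11", "31"]
lexical_order = sorted(student_ids)           # string comparison
numeric_order = sorted(student_ids, key=int)  # what app logic would do instead

# Two-column-family layout: one row per course in each family.
course = {"course1": {"Name": "Maths", "Teach_Nm": "Prof. Abc"}}
course_enrolments = {"course1": {20: "", 21: "", 25: ""}}  # column name = student ID


def enrol(enrolments, course_id, student_id):
    """Add a student without reading the row first: the ID is the column name."""
    enrolments.setdefault(course_id, {})[student_id] = ""


# Single-row layout with composite column names: (kind, id, attribute).
# The third component is empty for document-level ('a') columns.
doc_row = {
    ("a", "form", ""): "x",
    ("a", "scanner", ""): "1425436",
    ("p", "1", "skew"): "0.0042142",
    ("p", "1", "file"): "page1.png",
    ("p", "2", "skew"): "0.0042412",
    ("p", "2", "file"): "page2.png",
}


def columns_with_prefix(row, *prefix):
    """Return columns whose composite name starts with `prefix`, in sorted order."""
    return {
        name: value
        for name, value in sorted(row.items())
        if name[:len(prefix)] == prefix
    }


enrol(course_enrolments, "course1", 30)
page_one = columns_with_prefix(doc_row, "p", "1")
```

Here enrol never needs to count existing students, which is the problem the numbered StudID1, StudID2 column names ran into, and columns_with_prefix(doc_row, "p", "1") picks out only the columns for page 1. In a real cluster the ordering and slicing would be done by the column comparator on the server side, not by application code like this.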