Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: local policy)
From: "Hiller, Dean" <Dean.Hiller@nrel.gov>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wed, 19 Sep 2012 14:44:02 -0600
Subject: Re: Correct model
Thread-Topic: Correct model
Thread-Index: Ac2Wp4D/h292ZLeQSHK1150kGhwxww==
Message-ID: <CC7F898C.11827%Dean.Hiller@nrel.gov>
In-Reply-To: <CC7F8605.11812%Dean.Hiller@nrel.gov>
Accept-Language: en-US
Content-Language: en-US
user-agent: Microsoft-MacOutlook/14.2.3.120616
acceptlanguage: en-US
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

Oh, quick correction, I was thinking your user row key was in the request
coming in from your first email.

In your first email, you get a request and seem to insert both it and a user,
generating the ids as you go, which means that user never generates a request
again???  If a user sends in multiple requests, how are you looking up his
TimeUUID row key from your first email (I would do the same in my
implementation)?

If you had an ldap unique username, I would just use that as the primary
key, meaning you NEVER have to do reads.  If you have a username and need
to look up a UUID, you would have to do that in both implementations...not a
real big deal though...a quick lookup table does the trick there and
in most cases is still fast enough (i.e., read before write here is ok in a
lot of cases).

That X-ref table would simply be rowkey=username and value=user's real
primary key.
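
A rough sketch of that lookup, using plain Python dicts as stand-ins for the
two column families. The names username_xref, users, and
get_or_create_user_key, and the sample username, are made up for
illustration; this is not a real Cassandra client API, only the shape of the
read-before-write against the small x-ref table.

```python
import uuid

# Hypothetical in-memory stand-ins for two column families.
username_xref = {}  # row key = username, value = user's real primary key
users = {}          # row key = user's real primary key, value = columns


def get_or_create_user_key(username):
    """Look up the user's real key; create it only on the first request."""
    key = username_xref.get(username)  # the one small read before the write
    if key is None:
        key = uuid.uuid1()  # time-based UUID, standing in for a TimeUUID
        username_xref[username] = key
        users[key] = {"username": username}
    return key


# Two requests from the same (made-up) username resolve to the same row key.
first = get_or_create_user_key("alice")
second = get_or_create_user_key("alice")
```

If the username itself is the primary key, the x-ref dict and the read both
go away.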

Though again, we use ldap and know no one's username is really going to
change so username is our primary key.

Later,
Dean


On 9/19/12 2:33 PM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:

>Uhm, unless I am mistaken, a NEW request implies a new UUID so you can
>just write it to both the index to the request row and to the user that
>request was for all in one shot with no need to read, right?
>
>(Also, read before write is not necessarily bad...it really depends on your
>situation but in this case, I don't think you need read before write).
>
>For your structured data comment...
>Actually playOrm stores structured and unstructured data.  It follows the
>pattern cassandra is adopting more and more of "partial" schemas and
>plans to hold to that path.  It is a complete break from JPA due to noSQL
>being so different.
>
>and each request would have its own id, right
>
>Yes, in my design, I chose to give each request its own id.
>
>Wouldn't it be faster to have a composite key in the requestCF itself?
>
>In CQL, don't you have to have an == in the first part of the clause,
>meaning you would have to select the user id?  BUT you wanted
>requests > date no matter which user, so the indices I gave you have that
>information with a simple column slice of the data.  The indices I gave
>you look like this (composite column names)... <time1>.<req1>.<user1>,
><time2>.<req2>.<user1>, <time3>.<req3>.<user2>  NOTE that each item in
>the <> is a UUID, so they are unique.
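
A small sketch of the column slice described above, with a sorted Python
list standing in for one wide index row. The names index_row, add_request,
and requests_after are hypothetical, and times are plain integers here
rather than TimeUUIDs, purely to keep the example short.

```python
import bisect

# Hypothetical stand-in for one wide index row: composite column names
# (time, request_id, user_id), kept sorted as a column comparator would.
index_row = []


def add_request(time, request_id, user_id):
    """Write path: insert the composite column, no read needed."""
    bisect.insort(index_row, (time, request_id, user_id))


def requests_after(time):
    """Column slice: every entry with a time strictly greater than `time`."""
    # (time + 1,) sorts before any 3-tuple whose first element is time + 1,
    # and after every tuple whose first element is <= time (integer times).
    start = bisect.bisect_left(index_row, (time + 1,))
    return index_row[start:]


add_request(1, "req1", "user1")
add_request(2, "req2", "user1")
add_request(3, "req3", "user2")

# Users with new requests since time 1, regardless of which user.
users_since_1 = {user for _, _, user in requests_after(1)}
```

The slice never needs a user id up front, which is the point of keeping the
time first in the composite name.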
>
>Maybe there is a way, but I am not sure how to get all the latest
>requests > date for every user... I guess you could always map/reduce, but
>that is generally reserved for analytics or maybe updating new index
>tables you are creating to make reads faster.
>
>Later,
>Dean
>
>From: Marcelo Elias Del Valle <mvallebr@gmail.com>
>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Date: Wednesday, September 19, 2012 1:47 PM
>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Subject: Re: Correct model
>
>2012/9/19 Hiller, Dean <Dean.Hiller@nrel.gov>
>Thinking out loud and I think a bit towards playOrm's model though you
>don't need to use playOrm for this.
>
>1. I would probably have a User with the requests either embedded in it or
>with the foreign keys to the requests...either is fine as long as you get
>the user, get ALL FKs, and make one request to get the requests for that
>user
>
>This was my first option. However, every time I have a new request I would
>need to read the column "request_ids", update its value, and then write
>the result. This would be a read-before-write, which is bad in Cassandra,
>right? Or were you talking about other kinds of FKs?
>
>2. I would create rows for index and index each month of data OR maybe
>index each day of data (depends on your system).  Then, I can just query
>into the index for that one month.  With playOrm S-SQL, this is a simple
>PARTITIONS r(:thismonthParititonId) SELECT r FROM Request r where
>r.date > :date OR you just do a column range query doing the same thing
>into your index.  The index is basically the wide row pattern ;) with
>composite keys of <date>.<rowkey of request>
>
>I would consider playOrm at a later step in my project, as my
>understanding now is that it is good for storing relational, structured
>data. I cannot predict which columns I am going to store in requestCF.
>But regardless, even in Cassandra, you would still use a composite key,
>but it seems you would create an indexCf using the wide row pattern, and
>each request would have its own id, right? But why? Wouldn't it be faster
>to have a composite key in the requestCF itself?
>
>
>From: Marcelo Elias Del Valle <mvallebr@gmail.com>
>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Date: Wednesday, September 19, 2012 1:02 PM
>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Subject: Correct model
>
>I am new to Cassandra and to NoSQL in general.
>I built my first model and any comments would be of great help. I am
>describing my thoughts below.
>
>It's a very simple model. I will need to store several users and, for
>each user, I will need to store several requests. Each request has its
>insertion time. As the query comes first, here are the only queries I
>will need to run against this model:
>- Select all the requests for a user
>- Select all the users that have new requests since date D
>
>I created the following model: a UserCF, whose key is a userID generated
>by TimeUUID, and a RequestCF, whose key is composite: UserUUID +
>timestamp. For each user, I will store basic data and, for each request,
>I will insert a lot of columns.
>
>My questions:
>- Is the strategy of using a composite key good for this case? I thought
>of other solutions, but this one seemed to be the best. Another solution
>would be to have a non-composite key of type UUID for the requests, and
>have another CF to relate user and request.
>- To perform the second query, instead of checking whether each user has a
>request inserted after date D, I thought of storing the last request
>insertion date in the userCF every time I have a new insert for the
>user. It would be data duplication, but I would have no
>read-before-write and I am guessing the second query would perform faster.
>
>Any thoughts?
>
>--
>Marcelo Elias Del Valle
>http://mvalle.com - @mvallebr