cassandra-user mailing list archives

From aaron morton <>
Subject Re: Correct model
Date Fri, 21 Sep 2012 01:50:02 GMT
> I created the following model: an UserCF, whose key is a userID generated by TimeUUID,
and a RequestCF, whose key is composite: UserUUID + timestamp. For each user, I will store
basic data and, for each request, I will insert a lot of columns.

I would consider:

# User CF
* row_key: user_id
* columns: user properties, key=value

# UserRequests CF
* row_key: <user_id : partition_start> where partition_start is the start of a time
partition that makes sense in your domain, e.g. partition monthly. You generally want to avoid
rows that grow forever; as a rule of thumb, avoid rows larger than a few tens of MB.
* columns: two possible approaches:
	1) If the requests are immutable and you generally want all of the data, store the request
in a single column using JSON or similar, with the column name a timestamp.
	2) Otherwise use a composite column name of <timestamp : request_property> to store
the request in many columns. 
	* In either case consider using Reversed comparators so the most recent columns are first
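
As a rough sketch of the UserRequests layout above, assuming monthly partitions and a Python client: the helper names (partition_start, user_requests_row_key, single_column, composite_columns) and the microsecond timestamp encoding are made up for illustration and are not part of any Cassandra client API. It only builds the row key and column names; writing them is left to whatever client you use.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ts_micros(ts):
    """Microseconds since the Unix epoch for a timezone-aware datetime."""
    return (ts - EPOCH) // timedelta(microseconds=1)


def partition_start(ts):
    """Label of the monthly partition containing ts (assumed scheme), e.g. '2012-09'."""
    return ts.strftime("%Y-%m")


def user_requests_row_key(user_id, ts):
    """Composite row key <user_id : partition_start>."""
    return (user_id, partition_start(ts))


def single_column(ts, request):
    """Approach 1: the whole request as JSON in one column named by timestamp."""
    return {ts_micros(ts): json.dumps(request, sort_keys=True)}


def composite_columns(ts, request):
    """Approach 2: one column per property, named <timestamp : request_property>."""
    stamp = ts_micros(ts)
    return {(stamp, prop): str(value) for prop, value in request.items()}
```

With approach 1 a request is read or written as a unit; with approach 2 individual properties can be updated or sliced, at the cost of more columns per request.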

# GlobalRequests CF
	* row_key: partition_start - time partition as above. It may be easier to use the same partitioning as UserRequests.
	* column name: <timestamp : user_id>
	* column value: empty 

> - Select all the requests for an user

Work out the current partition client side, get the first N columns. Then page. 
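
A minimal sketch of that read path, assuming monthly partitions and a reversed comparator (newest column first). The store here is a plain dict standing in for the UserRequests CF, and get_slice is a hypothetical stand-in for a column slice query, not a real client call.

```python
from datetime import date


def month_partitions(newest, oldest):
    """Yield 'YYYY-MM' partition labels from newest back to oldest, inclusive."""
    year, month = newest.year, newest.month
    while (year, month) >= (oldest.year, oldest.month):
        yield f"{year:04d}-{month:02d}"
        month -= 1
        if month == 0:
            year, month = year - 1, 12


def get_slice(store, row_key, start_after, count):
    """Stand-in for a slice query: up to `count` columns, newest first,
    strictly older than `start_after` (or from the start if None)."""
    columns = store.get(row_key, [])  # list of (timestamp, value), newest first
    if start_after is not None:
        columns = [c for c in columns if c[0] < start_after]
    return columns[:count]


def iter_user_requests(store, user_id, newest, oldest, page_size=100):
    """Walk partitions from the current one backwards, paging within each row."""
    for partition in month_partitions(newest, oldest):
        row_key = (user_id, partition)
        last_seen = None
        while True:
            page = get_slice(store, row_key, last_seen, page_size)
            yield from page
            if len(page) < page_size:
                break  # short page: this row is exhausted
            last_seen = page[-1][0]
```

The caller decides how far back to go (the oldest argument); a user's sign-up date would be one natural lower bound.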

> - Select all the users which has new requests, since date D
Work out the current partition client side, get the first N columns from GlobalRequests, then make
a multiget call to UserRequests.

NOTE: Assuming the size of the global requests space is not huge.
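
The same idea as a sketch, again with plain dicts standing in for the two CFs and with hypothetical helper names. It assumes GlobalRequests column names are <timestamp : user_id> sorted newest first, and that the caller passes the list of partitions covering date D up to now.

```python
def users_with_requests_since(global_requests, partitions, since):
    """Return user ids with a request at or after `since`, most recent first.

    global_requests: dict of partition -> list of (timestamp, user_id)
    column names, newest first (reversed comparator assumed).
    """
    users, seen = [], set()
    for partition in partitions:
        for timestamp, user_id in global_requests.get(partition, []):
            if timestamp < since:
                break  # everything after this column is older still
            if user_id not in seen:
                seen.add(user_id)
                users.append(user_id)
    return users


def multiget(user_requests, row_keys):
    """Stand-in for a multiget: fetch several UserRequests rows at once."""
    return {key: user_requests[key] for key in row_keys if key in user_requests}


def requests_for_active_users(global_requests, user_requests, partitions, since):
    """Find active users in GlobalRequests, then multiget their request rows."""
    users = users_with_requests_since(global_requests, partitions, since)
    row_keys = [(user, partition) for user in users for partition in partitions]
    return multiget(user_requests, row_keys)
```

If the global request volume is large, a single GlobalRequests row per partition becomes a hot spot, which is why the note above matters.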

Hope that helps. 
Aaron Morton
Freelance Developer

On 20/09/2012, at 11:19 AM, Marcelo Elias Del Valle <> wrote:

> In your first email, you get a request and seem to shove it and a user in
> generating the ids which means that user never generates a request ever
> again???  If a user sends multiple requests in, how are you looking up his
> TimeUUID row key from your first email(I would do the same in my
> implementation)?
> Actually, I don't get it from Cassandra. I am using Cassandra for the writes, but to
find the userId I look on a pre-indexed structure, because I think the reads would be faster
this way. I need to find the userId by some key fields, so I use an index like this:
> user ID 5596 -> { name -> "john denver", phone -> "5555 5555", field3 ->
"field 3 data"...., field 10 -> "field 10 data"}
> The values are just examples. This part is not implemented yet and I am looking for alternatives.
Currently we have some similar indexes in SOLR, but we are thinking in keeping the index in
memory and replicating manually in the cluster, or using Voldemort, etc. 
> I might be wrong, but I think Cassandra is great for writes, but a solution like this
would be better for reads.
> If you had an ldap unique username, I would just use that as the primary
> key meaning you NEVER have to do reads.  If you have a username and need
> to lookup a UUID, you would have to do that in both implementations... not a
> real big deal though... a quick quick lookup table does the trick there and
> in most cases is still fast enough(ie. Read before write here is ok in a
> lot of cases).
> That X-ref table would simply be rowkey=username and value=users real
> primary key
> Though again, we use ldap and know no one's username is really going to
> change so username is our primary key.
> In my case, a single user can have thousands of requests. In my userCF, I will have just
1 user with uuid X, but I am not sure about what to have in my requestCF.
> -- 
> Marcelo Elias Del Valle
> - @mvallebr
