cassandra-user mailing list archives

From Marcelo Elias Del Valle <>
Subject Re: Correct model
Date Sun, 23 Sep 2012 16:23:48 GMT
2012/9/20 aaron morton <>

> I would consider:
> # User CF
> * row_key: user_id
> * columns: user properties, key=value
> # UserRequests CF
> * row_key: <user_id : partition_start> where partition_start is the start
> of a time partition that makes sense in your domain, e.g. partition
> monthly. Generally want to avoid rows that grow forever; as a rule of thumb
> avoid rows more than a few tens of MB.
> * columns: two possible approaches:
> 1) If the requests are immutable and you generally want all of the data
> store the request in a single column using JSON or similar, with the column
> name a timestamp.
> 2) Otherwise use a composite column name of <timestamp : request_property>
> to store the request in many columns.
> * In either case consider using Reversed comparators so the most recent
> columns are first  see
> # GlobalRequests CF
> * row_key: partition_start - time partition as above. It may be easier to
> use the same partition scheme.
> * column name: <timestamp : user_id>
> * column value: empty
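
To make the quoted key layout concrete, here is a minimal Python sketch of how the row keys and column names might be built client side. It assumes monthly partitions labelled "YYYY-MM", a ":" separator, and millisecond timestamps; the function names are hypothetical and are not part of Cassandra or Hector.

```python
from datetime import datetime, timezone


def partition_start(ts: datetime) -> str:
    """Label of the monthly partition containing ts (assumed format: YYYY-MM)."""
    return ts.strftime("%Y-%m")


def user_requests_row_key(user_id: str, ts: datetime) -> str:
    """UserRequests row key: <user_id : partition_start>."""
    return f"{user_id}:{partition_start(ts)}"


def global_requests_column(ts: datetime, user_id: str) -> tuple:
    """GlobalRequests column name: <timestamp : user_id>, timestamp in ms."""
    return (int(ts.timestamp() * 1000), user_id)


if __name__ == "__main__":
    when = datetime(2012, 9, 23, 16, 23, 48, tzinfo=timezone.utc)
    print(user_requests_row_key("u42", when))
    print(global_requests_column(when, "u42"))
```

With this layout each user gets one UserRequests row per month, so no single row keeps growing forever.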

Ok, I think I understood your suggestion... But is the only advantage of this
solution that it splits data among partitions? I understood how it would work,
but I didn't understand why it's better than the other solution, without
the GlobalRequests CF.

> - Select all the requests for a user
> Work out the current partition client side, get the first N columns. Then
> page.

What do you mean here by current partition? Do you mean I would perform a
query for each partition? If I want all the requests for the user,
couldn't I just select all UserRequests records whose key starts with "userId"? I
might be missing something here, but in my understanding if I use Hector to
query a column family I can do that and the Cassandra servers will
automatically communicate with each other to get the data I need, right? Is
that bad? I really didn't understand why to use partitions.
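
One possible reading of "work out the current partition client side, get the first N columns, then page" is sketched below in Python. The store is a plain dict standing in for the UserRequests CF, with each row already ordered newest first (as a reversed comparator would give); the "YYYY-MM" labels, the page size, and both function names are assumptions made for illustration only.

```python
from datetime import date


def month_partitions(newest: date, oldest: date):
    """Yield 'YYYY-MM' partition labels from newest back to oldest, inclusive."""
    year, month = newest.year, newest.month
    while (year, month) >= (oldest.year, oldest.month):
        yield f"{year:04d}-{month:02d}"
        month -= 1
        if month == 0:
            year, month = year - 1, 12


def user_requests(store, user_id, newest, oldest, page_size=2):
    """Yield a user's requests newest first, reading one partition row at a time."""
    for part in month_partitions(newest, oldest):
        row = store.get(f"{user_id}:{part}", [])  # missing partitions are empty
        for start in range(0, len(row), page_size):  # page within the row
            yield from row[start:start + page_size]


if __name__ == "__main__":
    store = {
        "u1:2012-09": [(3, "c"), (2, "b")],
        "u1:2012-07": [(1, "a")],
        "u2:2012-09": [(9, "z")],
    }
    for column in user_requests(store, "u1", date(2012, 9, 23), date(2012, 6, 1)):
        print(column)
```

Under this reading, "all the requests for a user" becomes one row lookup per partition, starting with the current month and walking backwards until enough columns have been read.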

> - Select all the users that have new requests, since date D
> Work out the current partition client side, get the first N columns from
> GlobalRequests, make a multi get call to UserRequests
> NOTE: Assuming the size of the global requests space is not huge.
> Hope that helps.
For sure it is helping a lot. However, I don't know what a multiget is...
I looked at the Hector API reference and found this method, but I am not sure
what Cassandra would do internally if I do a multiget... Is this expensive
in terms of performance and latency?
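
For reference, the two steps in the quoted suggestion (read recent columns from GlobalRequests, then fetch several UserRequests rows by key in one call) can be sketched in Python as below. Plain lists and dicts stand in for the column families, and the multiget function here is only an in-memory stand-in with a hypothetical name; it shows the shape of the call and says nothing about what Cassandra does internally or what it costs.

```python
def users_with_new_requests(global_row, since_ts, limit=100):
    """Collect distinct user ids from (timestamp, user_id) column names.

    global_row is assumed to be ordered newest first; scanning stops at the
    first column older than since_ts or after `limit` columns.
    """
    users = []
    for ts, user_id in global_row[:limit]:
        if ts < since_ts:
            break
        if user_id not in users:
            users.append(user_id)
    return users


def multiget(store, row_keys):
    """Stand-in for a multiget: fetch several rows by key in one call."""
    return {key: store[key] for key in row_keys if key in store}


if __name__ == "__main__":
    global_row = [(30, "u2"), (20, "u1"), (15, "u2"), (5, "u3")]
    store = {"u1:2012-09": [(20, "req")], "u2:2012-09": [(30, "req"), (15, "req")]}
    users = users_with_new_requests(global_row, since_ts=10)
    rows = multiget(store, [f"{u}:2012-09" for u in users])
    print(users)
    print(sorted(rows))
```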

Marcelo Elias Del Valle - @mvallebr
