cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: help creating data model
Date Wed, 24 Aug 2011 22:27:47 GMT
I normally suggest trying a model with Standard CFs first, as there are some downsides to
super CFs. If you know there will only be a few sub columns they are probably OK (see http://wiki.apache.org/cassandra/CassandraLimitations).
Your alternative design is fine. Test it out and see what works for you. 
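The alternative design in the quoted message below keeps one row per visitant and stores each session as a super column, with the session properties as sub columns, so a single row read returns every past session. A minimal sketch of that shape, assuming plain Python dicts as a hypothetical in-memory stand-in for a Super Column Family (this is not a Cassandra client API; `make_sessions_scf`, `add_session_property` and `get_visitant_sessions` are illustrative names):

```python
from collections import defaultdict


# Hypothetical in-memory stand-in for a Super Column Family (not a Cassandra
# client): row key -> super column name -> column name -> column value.
def make_sessions_scf():
    return defaultdict(lambda: defaultdict(dict))


def add_session_property(scf, visitant_id, session_id, name, value):
    # One row per visitant; each session is a super column inside that row,
    # and each session property is a sub column of that super column.
    scf[visitant_id][session_id][name] = value


def get_visitant_sessions(scf, visitant_id):
    # A single row read returns every session with all of its properties.
    # .get() avoids creating an empty row for an unknown visitant.
    row = scf.get(visitant_id, {})
    return {session_id: dict(columns) for session_id, columns in row.items()}


sessions = make_sessions_scf()
add_session_property(sessions, "visitant id 1", "session id 1", "p1", "v1")
add_session_property(sessions, "visitant id 1", "session id 1", "p2", "v2")
add_session_property(sessions, "visitant id 1", "session id 2", "p1", "v1")
print(get_visitant_sessions(sessions, "visitant id 1"))
```

The trade-off is the one noted above: this is convenient while each session has only a few sub columns, which is why it is worth testing against the Standard CF model.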

Also (and I know not everyone agrees), depending on the use case it's OK to blob data up. Cassandra
does not *need* to know about the individual properties of your entities. By that I mean there
is no query planner that can make better decisions about how to execute your query based
on data types and distributions, or about what types columns should have in projections.

So an alternative here is to collapse VisitantSessions and Sessions into one CF, and store the
session data as a JSON (or similar) blob in the column value. This works best if you do not
need to update fields in the entity concurrently: for example, if you write the session data
once, if you *always* update it from a single thread / process, or if your data is designed
to be overwritten.
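A minimal sketch of the collapsed model, assuming one row per visitant where each column name is a session id and each column value is the whole session serialised as JSON. Plain Python dicts stand in for the column family here (hypothetical, not a Cassandra client API), and `write_session` / `read_profile` are illustrative names:

```python
import json
from typing import Dict

# Hypothetical in-memory stand-in for one Standard CF (not a Cassandra client):
# row key = visitant id, column name = session id, column value = JSON blob.
VisitantSessionsCF = Dict[str, Dict[str, str]]


def write_session(cf: VisitantSessionsCF, visitant_id: str, session_id: str, session_data: dict) -> None:
    # The whole session entity is serialised into a single column value.
    # A later write to the same column replaces the entire blob, which is why
    # concurrent partial updates from different writers would lose data.
    cf.setdefault(visitant_id, {})[session_id] = json.dumps(session_data, sort_keys=True)


def read_profile(cf: VisitantSessionsCF, visitant_id: str) -> Dict[str, dict]:
    # One row read returns every past session for the visitant.
    return {sid: json.loads(blob) for sid, blob in cf.get(visitant_id, {}).items()}


cf: VisitantSessionsCF = {}
write_session(cf, "visitant id 1", "session id 1", {"p1": "v1", "p2": "v2"})
write_session(cf, "visitant id 1", "session id 2", {"p1": "v1"})
profile = read_profile(cf, "visitant id 1")
```

Because each write replaces the whole blob, this fits write-once session data, a single writer, or data that is meant to be overwritten.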

Cheers


-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2011, at 1:15 AM, Helder Oliveira wrote:

> Thanks, Indranath Ghosh, for your tip!
> 
> I will continue the question here.
> 
> Aaron, I have read your suggestion and tried to design it, and I have one question regarding it.
> 
> Let's forget about the Requests and Events for now!
> 
> Just keep the Visitants and the Sessions.
> 
> My goal is, given a visitant, to get all the information about him, in this case all his past sessions, so I can build a profile from his history.
> 
> You suggested 3 CFs:
> 
> Visitants CF
> key: id
> cn: pn
> cv: pv
> 
> Visitant Sessions CF
> key: visitant id
> cn: session id
> cv: none
> 
> Sessions CF
> key: session id
> cn: pn
> cv: pv
> 
> When I need to know everything about one visitant, I need to query "Visitant Sessions CF" to get all the session keys, and then query "Sessions CF" for the properties of all those keys.
> 
> In this case, isn't applying a Super Column Family to "Sessions" better?
> 
> I mean something like:
> 
> {
>   "sessions": {
> 
>     "visitant id 1": {
>       "session id 1": {
>         "p1": {"p1": "v1"},
>         "p2": {"jira": "v2"}
>       },
>       "session id 2": {
>         "p1": {"p1": "v1"}
>       }
>     },
>     "visitant id 2": {
>       "session id 3": {
>         "p1": {"p1": "v1"},
>         "p2": {"jira": "v2"}
>       },
>       "session id 4": {
>         "p1": {"p1": "v1"}
>       }
>     }
>   }
> }
> 
> Using this, I can get all sessions with the second query, instead of only with the third query.
> 
> Regarding your notes, the Visitant CF will barely change after it is created; sessions will be added every time a known user visits again, creating a new session.
> 
> Thanks a lot for your help, guys, and I hope I was not saying crazy things :D
> 
> On Aug 22, 2011, at 11:23 PM, aaron morton wrote:
> 
>> Let's start with something quick and simple, all Standard Column Families…
>> 
>> Visitant CF
>> key: id 
>> column name: property name
>> column value: property value 
>> 
>> Visitant Sessions CF
>> key: visitant id 
>> column name: session id
>> column value: none
>> 
>> Session CF
>> 
>> key: session_id
>> column_name: property name
>> column_value: property value 
>> 
>> key: session_id/requests
>> column_name: request_id
>> column_value: none
>> 
>> key: session_id/events
>> column_name: event_id
>> column_value: none
>> 
>> Requests CF
>> 
>> key: request_id
>> column_name: property name
>> column_value: property value
>> 
>> Event CF
>> 
>> key: event_id
>> column_name: property name
>> column_value: property value
>> 
>> 
>> Notes:
>> 
>> * assuming the Visitant CF is slowly changing, I kept it in its own CF.
>> * using compound keys to keep information related to sessions in the same CF. These could be different CFs, or live in the Request or Event CF.
>> * the best model is the one that allows you to do your reads by getting one or a few rows from a single CF.
>> * you could collapse the Request and Event CFs into one.
>> 
>> If the event and request data is immutable (or there are no issues with concurrent modifications) I would recommend this…
>> 
>> Request / Event CF:
>> 
>> key: session_id/events or session_id/requests
>> column_name: event_id or request_id
>> column_value: data
>> 
>> 
>> Start with the simple model and then make changes to better handle your read queries.
>> 
>> Have fun :)
>> 
>> 
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 22/08/2011, at 11:13 PM, Helder Oliveira wrote:
>> 
>>> Hello all,
>>> 
>>> I have an SQL structure like this:
>>> 
>>> Visitant ( has several properties )
>>> Visitant has many Sessions
>>> Sessions ( has several properties )
>>> Sessions has many Requests ( has several properties )
>>> Sessions has many Events ( has several properties )
>>> 
>>> 
>>> I have read a lot and am still confused about how to put this in Cassandra; can someone give me an idea?
>> 
> 
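The all-Standard-CF model in the Aug 22 message implies the read path Helder describes: read the visitant row, read the Visitant Sessions index row to get session ids, then fetch those session rows. A minimal sketch, assuming plain Python dicts as hypothetical in-memory stand-ins for the column families (this is not a Cassandra client API; `add_visitant`, `add_session`, `add_request` and `visitant_profile` are illustrative names):

```python
from typing import Dict

# Hypothetical in-memory stand-ins for Standard CFs (not a Cassandra client):
# each CF maps row key -> {column name: column value}.
CF = Dict[str, Dict[str, str]]

visitant_cf: CF = {}
visitant_sessions_cf: CF = {}
session_cf: CF = {}


def add_visitant(visitant_id: str, props: Dict[str, str]) -> None:
    visitant_cf.setdefault(visitant_id, {}).update(props)


def add_session(visitant_id: str, session_id: str, props: Dict[str, str]) -> None:
    # Index column: name = session id, value empty ("column value: none").
    visitant_sessions_cf.setdefault(visitant_id, {})[session_id] = ""
    session_cf.setdefault(session_id, {}).update(props)


def add_request(session_id: str, request_id: str) -> None:
    # Compound row key "session_id/requests" keeps the session's request ids
    # in the Session CF, as in the quoted model.
    session_cf.setdefault(f"{session_id}/requests", {})[request_id] = ""


def visitant_profile(visitant_id: str) -> Dict[str, Dict]:
    # Read 1: the visitant's own properties.
    profile: Dict[str, Dict] = {"visitant": dict(visitant_cf.get(visitant_id, {}))}
    # Read 2: session ids from the index row (sorted to mimic column-name order).
    session_ids = sorted(visitant_sessions_cf.get(visitant_id, {}))
    # Read 3: the session rows themselves (typically one batched multi-get).
    profile["sessions"] = {sid: dict(session_cf.get(sid, {})) for sid in session_ids}
    return profile
```

Collapsing reads 2 and 3 into a single row read is exactly what the super column and JSON-blob variants discussed at the top of this message trade for.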

