cassandra-user mailing list archives

From Helder Oliveira <helder.olive...@byside.com>
Subject Re: help creating data model
Date Thu, 25 Aug 2011 10:50:58 GMT
Hello,

thanks for your time.

I suggested an SCF, but I am still testing the system with CFs, running some tests on
the data flow (insert / select).

Storing the sub-data as JSON had already crossed my mind, but it's not possible because later I will
need to apply filters to that data, and if it is in JSON I need to fetch everything and filter on
the programming side. Correct me if I am wrong.
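
A minimal sketch of what filtering on the programming side looks like when each column value is a JSON blob. The `row` dict and the `filter_sessions` helper are hypothetical stand-ins for illustration, not part of any Cassandra client; the point is only that every blob must be fetched and decoded before the filter can run.

```python
import json

# Hypothetical in-memory stand-in for one row of a column family:
# column name = session id, column value = JSON blob.
row = {
    "session-1": json.dumps({"browser": "firefox", "pages": 3}),
    "session-2": json.dumps({"browser": "chrome", "pages": 7}),
}

def filter_sessions(columns, predicate):
    """Decode every blob, then keep only the sessions matching predicate."""
    decoded = {name: json.loads(value) for name, value in columns.items()}
    return {name: data for name, data in decoded.items() if predicate(data)}

# The store cannot see inside the blob, so the whole row is read first.
matches = filter_sessions(row, lambda session: session["pages"] > 5)
```

Whether this is acceptable probably depends on how many sessions a visitant has: decoding a handful of blobs per read is cheap, decoding thousands is not.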

Well, I will continue the tests with CFs; things are getting clearer for me now.

Thanks a lot, guys, for answering and spending time on some newbie questions :)


On Aug 24, 2011, at 11:27 PM, aaron morton wrote:

> I normally suggest trying a model with Standard CF's first as there are some downsides
> to super CF's. If you know there will only be a few sub columns they are probably OK (see
> http://wiki.apache.org/cassandra/CassandraLimitations). Your alternative design is fine. Test
> it out and see what works for you.
> 
> Also (and I know not everyone agrees) depending on the use case it's OK to blob data
> up. Cassandra does not *need* to know about the individual properties of your entities. By
> that I mean there is no query planner that can make better decisions about how to execute
> your query based on data types and distributions, or about what types columns should have in
> projections.
> 
> So an alternative here is to collapse VisitantSessions and Sessions into one, and store
> the session data as a JSON (or similar) blob in the column value. This works best if you do
> not need to concurrently update fields in the entity: that is, if you write the session data once,
> if you *always* update from a single thread / process, or if your data is designed
> to be overwritten.
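
To illustrate why the blob approach suits write-once or single-writer data, here is a small sketch under stated assumptions: the `store` dict is a hypothetical stand-in for the collapsed CF (row key = visitant id, column name = session id, column value = JSON blob), and `write_session` / `read_session` are made-up helpers. Two writers that each read, modify and rewrite the whole blob can lose one another's change.

```python
import json

# Hypothetical stand-in: (row key, column name) -> column value (JSON blob).
store = {}

def write_session(visitant_id, session_id, data):
    """Overwrite the whole blob; individual fields cannot be updated alone."""
    store[(visitant_id, session_id)] = json.dumps(data)

def read_session(visitant_id, session_id):
    return json.loads(store[(visitant_id, session_id)])

write_session("visitant-1", "session-1", {"pages": 1, "referrer": "none"})

# Two writers read the same blob, change different fields, and write back.
copy_a = read_session("visitant-1", "session-1")
copy_b = read_session("visitant-1", "session-1")
copy_a["pages"] = 2
copy_b["referrer"] = "search"
write_session("visitant-1", "session-1", copy_a)
write_session("visitant-1", "session-1", copy_b)  # replaces copy_a's blob

final = read_session("visitant-1", "session-1")
```

With one column per property, the two updates would have touched different columns and both would have survived.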
> 
> Cheers
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 25/08/2011, at 1:15 AM, Helder Oliveira wrote:
> 
>> Thanks Indranath Ghosh for your tip!
>> 
>> I will continue here the question.
>> 
>> Aaron, I have read your suggestion and tried to design a model from it, and I have
>> one question regarding it.
>> 
>> Let's forget for now the Requests and Events!
>> 
>> Just keep the Visitants and the Sessions.
>> 
>> My goal is, given a visitant, to get all the information about him, in this case all
>> his past sessions, so I can create a profile using his past.
>> 
>> You have suggested 3 CF's
>> 
>> Visitants CF
>> key: id
>> cn: pn
>> cv: pv
>> 
>> Visitant Sessions CF
>> key: visitant id
>> cn: session id
>> cv: none
>> 
>> Sessions CF
>> key: session id
>> cn: pn
>> cv: pv
>> 
>> When I need to know everything about one visitant, I need to query "Visitant Sessions
>> CF" to get all the keys, and then query "Sessions CF" for the properties of all those keys.
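
The read path described above can be sketched as follows. The two dicts are hypothetical in-memory stand-ins for the "Visitant Sessions" and "Sessions" column families (row key -> {column name: column value}), and `sessions_for_visitant` is a made-up helper, so this only shows the shape of the two lookups, not real client calls.

```python
# Hypothetical in-memory stand-ins for two standard column families:
# row key -> {column name: column value}.
visitant_sessions = {
    "visitant-1": {"session-1": "", "session-2": ""},
}
sessions = {
    "session-1": {"browser": "firefox", "pages": "3"},
    "session-2": {"browser": "chrome", "pages": "7"},
    "session-9": {"browser": "opera", "pages": "1"},
}

def sessions_for_visitant(visitant_id):
    """Two reads: the index row first, then one batch fetch of session rows."""
    # Read 1: the column names of the index row are the session ids.
    session_ids = list(visitant_sessions.get(visitant_id, {}))
    # Read 2: fetch every session row by key (assumed to be one batched
    # multi-row request in a real client, not one request per session).
    return {sid: sessions[sid] for sid in session_ids if sid in sessions}

profile = sessions_for_visitant("visitant-1")
```

The second step is a lookup by known keys, so the cost is one extra round trip rather than a scan.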
>> 
>> In this case, wouldn't applying a Super Column Family to the "Sessions" be better?
>> 
>> I mean something like:
>> 
>> {
>>   "sessions": {
>> 
>>     "visitant id 1": {
>>       "session id 1": {
>>         "p1": {"p1": "v1"},
>>         "p2": {"p2": "v2"}
>>       },
>>       "session id 2": {
>>         "p1": {"p1": "v1"}
>>       }
>>     },
>>
>>     "visitant id 2": {
>>       "session id 3": {
>>         "p1": {"p1": "v1"},
>>         "p2": {"p2": "v2"}
>>       },
>>       "session id 4": {
>>         "p1": {"p1": "v1"}
>>       }
>>     }
>>   }
>> }
>> 
>> Using this, I can get all sessions in the second query, instead of having all sessions
>> only at the third query.
>> 
>> Regarding your notes, the Visitant CF will be almost unchanged after
>> its creation; sessions will be added every time a known user visits again, creating a
>> new session.
>> 
>> Thanks a lot for your help guys, and I hope I was not saying crazy things :D
>> 
>> On Aug 22, 2011, at 11:23 PM, aaron morton wrote:
>> 
>>> Let's start with something quick and simple, all standard Column Families…
>>> 
>>> Visitant CF
>>> key: id 
>>> column name: property name
>>> column value: property value 
>>> 
>>> Visitant Sessions CF
>>> key: visitant id 
>>> column name: session id
>>> column value: none
>>> 
>>> Session CF
>>> 
>>> key: session_id
>>> column_name: property name
>>> column_value: property value 
>>> 
>>> key: session_id/requests
>>> column_name: request_id
>>> column_value: none
>>> 
>>> key: session_id/events
>>> column_name: event_id
>>> column_value: none
>>> 
>>> Requests CF
>>> 
>>> key: request_id
>>> column_name: property name
>>> column_value: property value
>>> 
>>> Event CF
>>> 
>>> key: event_id
>>> column_name: property name
>>> column_value: property value
>>> 
>>> 
>>> Notes:
>>> 
>>> * assuming the Visitant CF is slowly changing I kept it in its own CF.
>>> * using compound keys to keep information related to sessions in the same CF.
>>> These could be different CF's, or in the Request or Event CF.
>>> * the best model is the one that allows you to do your reads by getting one or
>>> a few rows from a single CF.
>>> * you could collapse the Request and Event CF's into one. 
>>> 
>>> If the event and request data is immutable (or there are no issues with concurrent
>>> modifications) I would recommend this…
>>> 
>>> Request / Event CF:
>>> 
>>> key: session_id/events or session_id/requests
>>> column_name: event_id or request_id
>>> column_value: data
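
The compound-key layout above can be sketched like this. Everything here is an assumption for illustration: `request_event_cf` is an in-memory stand-in for the combined CF, and `row_key`, `add_item` and `items_for_session` are hypothetical helpers. The idea it shows is that all events (or all requests) of one session live in a single row, so one row read returns them.

```python
from collections import defaultdict

# Hypothetical stand-in for the combined Request / Event CF:
# row key -> {column name: column value}.
request_event_cf = defaultdict(dict)

def row_key(session_id, kind):
    """Build the compound row key, e.g. 'session-1/events'."""
    if kind not in ("events", "requests"):
        raise ValueError(f"unknown kind: {kind}")
    return f"{session_id}/{kind}"

def add_item(session_id, kind, item_id, data):
    """Store one event or request as a column in the session's row."""
    request_event_cf[row_key(session_id, kind)][item_id] = data

def items_for_session(session_id, kind):
    """Read a single row; .get avoids creating empty rows on read."""
    return dict(request_event_cf.get(row_key(session_id, kind), {}))

add_item("session-1", "events", "event-1", '{"type": "click"}')
add_item("session-1", "requests", "request-1", '{"path": "/home"}')
events = items_for_session("session-1", "events")
```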
>>> 
>>> 
>>> Start with the simple model and then make changes to better handle your read
>>> queries.
>>> 
>>> Have fun :)
>>> 
>>> 
>>> 
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 22/08/2011, at 11:13 PM, Helder Oliveira wrote:
>>> 
>>>> Hello all,
>>>> 
>>>> I have a SQL structure like this:
>>>> 
>>>> Visitant ( has several properties )
>>>> Visitant has many Sessions
>>>> Sessions ( has several properties )
>>>> Sessions has many Requests ( has several properties )
>>>> Sessions has many Events ( has several properties )
>>>> 
>>>> 
>>>> I have read a lot and am still confused about how to put this on Cassandra; can someone
>>>> give me an idea?
>>> 
>> 
> 

