incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: help creating data model
Date Thu, 25 Aug 2011 22:09:07 GMT
> later i will need to apply filter to that data, 
Sounds like a read query you should support by denormalising the data. 
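
A minimal sketch of what denormalising for that filter could look like, using plain Python dicts as stand-ins for column families rather than a real Cassandra client. The `sessions_by_filter` column family, the `FILTERABLE` list and the `browser` / `country` properties are assumptions made for illustration; none of them come from the thread.

```python
# Plain dicts stand in for column families: row key -> {column name: column value}.
sessions = {}            # key: session id
sessions_by_filter = {}  # key: "<visitant id>/<property>=<value>" (hypothetical layout)

# Hypothetical: the properties the application already knows it will filter on.
FILTERABLE = ("browser", "country")


def write_session(visitant_id, session_id, properties):
    """Store the session and, at write time, one index column per filterable property."""
    sessions.setdefault(session_id, {}).update(properties)
    for name in FILTERABLE:
        if name in properties:
            index_key = f"{visitant_id}/{name}={properties[name]}"
            # The session id is the column name; the column value is unused.
            sessions_by_filter.setdefault(index_key, {})[session_id] = ""


def sessions_matching(visitant_id, name, value):
    """Answer the filter by reading one row instead of fetching everything."""
    row = sessions_by_filter.get(f"{visitant_id}/{name}={value}", {})
    return sorted(row)
```

The trade-off in this sketch is extra writes (one per filterable property) in exchange for a filter that reads a single row.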

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2011, at 10:50 PM, Helder Oliveira wrote:

> Hello,
> 
> thanks for your time.
> 
> I suggested an SCF, but I am still testing the system with standard CFs, running some tests of the data flow (insert / select).
> 
> Storing the sub-data as JSON had already crossed my mind, but it won't work here: later I will need to filter that data, and if it is stored as JSON I have to fetch everything and filter in application code. Correct me if I am wrong.
> 
> Well, I will continue the tests with CFs; things are getting clearer for me now.
> 
> Thanks a lot, guys, for answering and spending time on some newbie questions :)
> 
> 
> On Aug 24, 2011, at 11:27 PM, aaron morton wrote:
> 
>> I normally suggest trying a model with standard CFs first, as there are some downsides to super CFs. If you know there will only be a few sub columns, they are probably OK (see http://wiki.apache.org/cassandra/CassandraLimitations). Your alternative design is fine. Test it out and see what works for you.
>> 
>> Also (and I know not everyone agrees), depending on the use case it's OK to blob data up. Cassandra does not *need* to know about the individual properties of your entities. By that I mean there is no query planner that can make better decisions about how to execute your query based on data types and distributions, or about what types columns should have in projections.
>> 
>> So an alternative here is to collapse VisitantSessions and Sessions into one, and store the session data as a JSON (or similar) blob in the column value. This works best if you do not need to concurrently update fields in the entity: if you write the session data once, if you *always* update from a single thread / process, or if your data is designed to be overwritten.
>> 
>> Cheers
>> 
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 25/08/2011, at 1:15 AM, Helder Oliveira wrote:
>> 
>>> Thanks Indranath Ghosh for your tip!
>>> 
>>> I will continue here the question.
>>> 
>>> Aaron, I have read your suggestion and tried to design it, and I have one question about it.
>>> 
>>> Let's forget for now the Requests and Events!
>>> 
>>> Just keep the Visitants and the Sessions.
>>> 
>>> My goal is, given a visitant, to get all the information about him, in this case all his past sessions, so I can build a profile from his history.
>>> 
>>> You have suggested 3 CF's
>>> 
>>> Visitants CF
>>> key: id
>>> cn: pn
>>> cv: pv
>>> 
>>> Visitant Sessions CF
>>> key: visitant id
>>> cn: session id
>>> cv: none
>>> 
>>> Sessions CF
>>> key: session id
>>> cn: pn
>>> cv: pv
>>> 
>>> When I need to know everything about one visitant, I need to query "Visitant Sessions CF" to get all the session keys, and then query "Sessions CF" for the properties of each key.
>>> 
>>> In this case, wouldn't a Super Column Family for the "Sessions" be better?
>>> 
>>> I mean something like:
>>> 
>>> {
>>>   "sessions": {
>>> 
>>>     "visitant id 1": {
>>>       "session id 1": {
>>>         "p1": {"p1": "v1"},
>>>         "p2": {"p2": "v2"}
>>>       },
>>>       "session id 2": {
>>>         "p1": {"p1": "v1"}
>>>       }
>>>     },
>>> 	
>>>     "visitant id 2": {
>>>       "session id 3": {
>>>         "p1": {"p1": "v1"},
>>>         "p2": {"p2": "v2"}
>>>       },
>>>       "session id 4": {
>>>         "p1": {"p1": "v1"}
>>>       }
>>>     }
>>>   }
>>> }
>>> 
>>> Using this, I can get all sessions in the second query, instead of only having them at the third query.
>>> 
>>> Regarding your notes, the Visitant CF will be almost unchanged after a visitant is created; sessions will be added every time a known user visits again, creating a new session.
>>> 
>>> Thanks a lot for your help, guys, and I hope I was not saying crazy things :D
>>> 
>>> On Aug 22, 2011, at 11:23 PM, aaron morton wrote:
>>> 
>>>> Let's start with something quick and simple, all standard Column Families…
>>>> 
>>>> Visitant CF
>>>> key: id 
>>>> column name: property name
>>>> column value: property value 
>>>> 
>>>> Visitant Sessions CF
>>>> key: visitant id 
>>>> column name: session id
>>>> column value: none
>>>> 
>>>> Session CF
>>>> 
>>>> key: session_id
>>>> column_name: property name
>>>> column_value: property value 
>>>> 
>>>> key: session_id/requests
>>>> column_name: request_id
>>>> column_value: none
>>>> 
>>>> key: session_id/events
>>>> column_name: event_id
>>>> column_value: none
>>>> 
>>>> Requests CF
>>>> 
>>>> key: request_id
>>>> column_name: property name
>>>> column_value: property value
>>>> 
>>>> Event CF
>>>> 
>>>> key: event_id
>>>> column_name: property name
>>>> column_value: property value
>>>> 
>>>> 
>>>> Notes:
>>>> 
>>>> * assuming the Visitant CF is slowly changing, I kept it in its own CF.
>>>> * using compound keys to keep information related to sessions in the same CF. These could be different CFs, or in the Request or Event CF.
>>>> * the best model is the one that allows you to do your reads by getting one or a few rows from a single CF.
>>>> * you could collapse the Request and Event CFs into one.
>>>> 
>>>> If the event and request data is immutable (or there are no issues with concurrent modifications) I would recommend this…
>>>> 
>>>> Request / Event CF:
>>>> 
>>>> key: session_id/events or session_id/requests
>>>> column_name: event_id or request_id
>>>> column_value: data
>>>> 
>>>> 
>>>> Start with the simple model and then make changes to better handle your read queries.
>>>> 
>>>> Have fun :)
>>>> 
>>>> 
>>>> 
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>> 
>>>> On 22/08/2011, at 11:13 PM, Helder Oliveira wrote:
>>>> 
>>>>> Hello all,
>>>>> 
>>>>> I have an SQL structure like this:
>>>>> 
>>>>> Visitant ( has several properties )
>>>>> Visitant has many Sessions
>>>>> Sessions ( has several properties )
>>>>> Sessions has many Requests ( has several properties )
>>>>> Sessions has many Events ( has several properties )
>>>>> 
>>>>> 
>>>>> I have read a lot and am still confused about how to put this on Cassandra. Can someone give me an idea?
>>>> 
>>> 
>> 
> 
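
The three standard column families discussed in the thread (Visitants, Visitant Sessions, Sessions), and the reads they imply when building a visitant profile, can be sketched roughly as follows. Plain Python dicts stand in for column families here; the function names `add_session` and `visitant_profile` are hypothetical, and no real Cassandra client API is used.

```python
# Plain dicts stand in for standard column families:
# row key -> {column name: column value}.
visitants = {}          # key: visitant id, columns: property name -> property value
visitant_sessions = {}  # key: visitant id, columns: session id -> (no value)
sessions = {}           # key: session id,  columns: property name -> property value


def add_session(visitant_id, session_id, properties):
    """Write the session row and link it from the visitant's row of session ids."""
    sessions.setdefault(session_id, {}).update(properties)
    visitant_sessions.setdefault(visitant_id, {})[session_id] = ""


def visitant_profile(visitant_id):
    """Read the visitant row, then its session ids, then the session rows."""
    session_ids = visitant_sessions.get(visitant_id, {})
    return {
        "visitant": visitants.get(visitant_id, {}),
        # In a real store this last step would be one multi-row read by key.
        "sessions": {sid: sessions.get(sid, {}) for sid in session_ids},
    }
```

The session rows are only reachable once the session ids are known, which is the extra read that the super column family question in the thread is trying to avoid.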

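The blob alternative from the thread (collapsing Visitant Sessions and Sessions, with the session stored as JSON in the column value) might look like the sketch below, again with a plain dict standing in for the column family. The `lost_update_demo` function is a hypothetical illustration of why the approach suits write-once or single-writer data: two writers that each rewrite the whole blob can silently drop each other's fields.

```python
import json

# key: visitant id, columns: session id -> JSON blob holding the whole session
visitant_sessions = {}


def write_session(visitant_id, session_id, session):
    """Overwrite the whole blob for one session."""
    visitant_sessions.setdefault(visitant_id, {})[session_id] = json.dumps(session)


def read_sessions(visitant_id):
    """One row read returns every session of the visitant, decoded."""
    row = visitant_sessions.get(visitant_id, {})
    return {sid: json.loads(blob) for sid, blob in row.items()}


def lost_update_demo():
    """Two writers read the same blob, change one field each, and write it back."""
    write_session("v1", "s1", {"browser": "firefox"})
    copy_a = read_sessions("v1")["s1"]
    copy_b = read_sessions("v1")["s1"]
    copy_a["country"] = "pt"
    copy_b["referrer"] = "search"
    write_session("v1", "s1", copy_a)
    write_session("v1", "s1", copy_b)  # replaces writer A's blob entirely
    return read_sessions("v1")["s1"]
```

With one column per property the two writes would touch different columns and both would survive; with a blob, the last writer wins for the whole entity.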
