incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Two column families or One super column family?
Date Thu, 31 Mar 2011 15:10:17 GMT
On Thu, Mar 31, 2011 at 3:52 AM, T Akhayo <t.akhayo@gmail.com> wrote:
> Hi Aaron,
>
> Thank you for your reply, I appreciate the suggestions you made.
>
> Yesterday I managed to get everything (our main read) in one CF, with the
> use of a structure in a value like you suggested.
>
> Designing a new data model is different from what I'm used to, but if you
> keep in mind that you are designing for performance instead of flexibility,
> then everything gets a bit easier.
>
> Kind regards,
> T. Akhayo
>
> 2011/3/30 aaron morton <aaron@thelastpickle.com>
>>
>> I would go with the solution that means you only have to make one request
>> to serve your reads, so consider the super CF approach.
>> There are some downsides to super columns (see
>> http://wiki.apache.org/cassandra/CassandraLimitations), and they tend to
>> have a love-them-hate-them reputation.
>> One thing to consider is that you do not need to model every attribute of
>> your entity as a column in Cassandra, especially if you are always going to
>> pull back all the attributes. So you could do your super CF approach with a
>> standard CF: just pack the columns into some sort of structure such as JSON
>> and store them as a blob.
>> Or you can use a naming scheme in the column names with a standard CF,
>> e.g. uuid1.text and uuid2.text
>> Hope that helps.
>> Aaron
>> On 30 Mar 2011, at 01:05, T Akhayo wrote:
>>
>> Good afternoon,
>>
>> I'm making my data model from scratch for Cassandra, which means I can tune
>> and fine-tune it for performance.
>>
>> At this time I'm having problems choosing between 2 column families or 1
>> super column family. I will illustrate with an example.
>>
>> Sector: defines a place; it has one or two properties.
>> Entry: an entry that is bound to a sector; it is simply some text and a
>> few properties.
>>
>> I can model this with a super column family:
>>
>> sectors{ //super column family
>>   sector1{
>>     uid1{
>>       text: a text
>>       user: joop
>>     }
>>     uid2{
>>       text: more text
>>       user: piet
>>     }
>>   }
>>   sector2{
>>     uid10{
>>       text: even more text
>>       user: marie
>>     }
>>   }
>> }
>>
>> But I can also model this with 2 column families:
>>
>> sectors{ // column family
>>   sector1{
>>     textid1: null
>>     textid2: null
>>   }
>>   sector2{
>>     textid4: null
>>   }
>> }
>>
>> texts{ //column family
>>   textid1{
>>     text: a text
>>     user: joop
>>   }
>>   textid2{
>>     text: more text
>>     user: piet
>>   }
>> }
>>
>> With the super column family I can retrieve a list of texts for a specific
>> sector with only 1 request to Cassandra.
>>
>> With the 2 column families I need to send 2 requests to Cassandra:
>> 1. give me all textids from sector x. (returns x, y, z)
>> 2. give me all texts that have id x, y, z.
>>
>> In my final application it is likely that there will be slightly more writes
>> than reads.
>>
>> I was wondering what the best approach is when it comes to performance. I
>> suspect that using super column families is slower than using column
>> families, but is it still slower than using 2 column families with 2
>> requests to Cassandra instead of 1 (with a super column family)?
>>
>> Kind regards,
>> T. Akhayo
>>
>
>
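
To make the request-count difference in the quoted thread concrete, here is a
rough sketch that models both layouts as plain Python dicts. Nothing here talks
to Cassandra: RequestCounter, read_super and read_two_cf are hypothetical names
made up for illustration, and each call on the counter stands in for one round
trip to the cluster.

```python
# In-memory stand-ins for the two layouts from the thread (not a real client).
super_cf = {
    "sector1": {
        "uid1": {"text": "a text", "user": "joop"},
        "uid2": {"text": "more text", "user": "piet"},
    },
    "sector2": {
        "uid10": {"text": "even more text", "user": "marie"},
    },
}

sectors_cf = {
    "sector1": {"textid1": None, "textid2": None},
    "sector2": {"textid4": None},
}
texts_cf = {
    "textid1": {"text": "a text", "user": "joop"},
    "textid2": {"text": "more text", "user": "piet"},
}


class RequestCounter:
    """Hypothetical helper: counts each lookup as one request."""

    def __init__(self):
        self.count = 0

    def get(self, cf, key):
        # One request: fetch a single row by key.
        self.count += 1
        return cf.get(key, {})

    def multiget(self, cf, keys):
        # One request: fetch several rows by key in a single call.
        self.count += 1
        return {k: cf[k] for k in keys if k in cf}


def read_super(counter, sector):
    # Super CF layout: the row already contains every entry for the sector.
    return counter.get(super_cf, sector)


def read_two_cf(counter, sector):
    # Two CF layout: first fetch the ids, then fetch the texts for those ids.
    ids = list(counter.get(sectors_cf, sector))
    return counter.multiget(texts_cf, ids)
```

Reading sector1 through read_super costs one counted request, while
read_two_cf costs two, which is the trade-off being asked about.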

I decided to write this up as a general guide to the topic of whether or not
to denormalize things into multiple CFs:
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/whytf_would_i_need_with
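
For the two standard-CF alternatives Aaron mentions in the quoted thread
(packing attributes into a JSON blob, or using a naming scheme such as
uuid1.text), a minimal sketch might look like the following. It only shows how
the column names and values could be built and parsed. The function names are
hypothetical, and it assumes that ids contain no "." character.

```python
import json


def pack_entry(text, user):
    # Option A: store all attributes of one entry as a single JSON value.
    return json.dumps({"text": text, "user": user}, sort_keys=True)


def unpack_entry(blob):
    # Decode the JSON value back into a dict of attributes.
    return json.loads(blob)


def flatten_entries(entries):
    # Option B: one column per attribute, named "<id>.<attribute>".
    return {
        f"{uid}.{attr}": value
        for uid, attrs in entries.items()
        for attr, value in attrs.items()
    }


def unflatten_columns(columns):
    # Rebuild the per-entry dicts from "<id>.<attribute>" column names.
    # Assumption: the id itself never contains a "." character.
    entries = {}
    for name, value in columns.items():
        uid, attr = name.split(".", 1)
        entries.setdefault(uid, {})[attr] = value
    return entries


# Example row for sector1 under option A: column name is the id, value is JSON.
sector1_packed = {
    "uid1": pack_entry("a text", "joop"),
    "uid2": pack_entry("more text", "piet"),
}
```

Either way the whole sector still lives in one row of a standard CF, so a
single row read returns every entry. With the JSON blob, changing one
attribute means rewriting the whole value for that entry.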
