incubator-cassandra-user mailing list archives

From T Akhayo <t.akh...@gmail.com>
Subject Re: Two column families or One super column family?
Date Thu, 31 Mar 2011 07:52:38 GMT
Hi Aaron,

Thank you for your reply; I appreciate the suggestions you made.

Yesterday I managed to get everything (our main read) into one CF, using a
structure in the value as you suggested.
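
A minimal sketch of such a layout, assuming Python and a plain dict standing
in for the column family. No Cassandra client calls are shown, and the helper
names add_entry and read_sector are hypothetical. Each entry's attributes are
packed into one JSON value stored under the entry's id, so a whole sector
comes back from a single row read:

```python
import json

# In-memory stand-in for a standard column family:
# row key -> {column name: column value}.
# Illustration only; a real client would read and write rows over the network.
sectors = {}

def add_entry(sector, uid, text, user):
    """Pack the entry's attributes into one JSON blob stored as a single column value."""
    row = sectors.setdefault(sector, {})
    row[uid] = json.dumps({"text": text, "user": user})

def read_sector(sector):
    """One row lookup returns every entry in the sector; unpack each blob."""
    row = sectors.get(sector, {})
    return {uid: json.loads(blob) for uid, blob in row.items()}

add_entry("sector1", "uid1", "a text", "joop")
add_entry("sector1", "uid2", "more text", "piet")
add_entry("sector2", "uid10", "even more text", "marie")

entries = read_sector("sector1")
```

The trade-off is that a single attribute cannot be updated without rewriting
the whole blob, which fits a model where all attributes are always read
together.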

Designing a new data model is different from what I'm used to, but if you
keep in mind that you are designing for performance instead of flexibility,
then everything gets a bit easier.

Kind regards,
T. Akhayo

2011/3/30 aaron morton <aaron@thelastpickle.com>

> I would go with the solution that means you only have to make one request
> to serve your reads, so consider the super CF approach.
>
> There are some downsides to super columns (see
> http://wiki.apache.org/cassandra/CassandraLimitations), and they tend to
> have a love-them-hate-them reputation.
>
> One thing to consider is that you do not need to model every attribute of
> your entity as a column in Cassandra, especially if you are always going to
> pull back all the attributes. So you could do your super CF approach with a
> standard CF: just pack the columns into some sort of structure such as JSON
> and store them as a blob.
>
> Or you can use a naming scheme in the column names with a standard CF, e.g.
> uuid1.text and uuid2.text
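
A rough sketch of that naming scheme, assuming Python, a plain dict standing
in for one sector's row, and entry ids that contain no "." character. The
function name group_entries is hypothetical and no Cassandra client calls are
shown:

```python
# One sector's row in a standard column family, modelled as a plain dict.
# Column names follow the "<entry id>.<attribute>" scheme.
row = {
    "uid1.text": "a text",
    "uid1.user": "joop",
    "uid2.text": "more text",
    "uid2.user": "piet",
}

def group_entries(columns):
    """Regroup flat "<id>.<attribute>" columns into one dict per entry."""
    entries = {}
    for name, value in columns.items():
        # Split on the first "." only: the left part is the entry id,
        # the right part is the attribute name.
        uid, _, attr = name.partition(".")
        entries.setdefault(uid, {})[attr] = value
    return entries

entries = group_entries(row)
```

Unlike the blob approach, each attribute remains its own column and can be
written independently.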
>
> Hope that helps.
> Aaron
>
> On 30 Mar 2011, at 01:05, T Akhayo wrote:
>
> Good afternoon,
>
> I'm building my data model for Cassandra from scratch, which means I can
> tune and fine-tune it for performance.
>
> At the moment I'm having trouble choosing between two column families and
> one super column family. I will illustrate with an example.
>
> Sector: defines a place; it has one or two properties.
> Entry: an entry that is bound to a sector; it is simply some text and a
> few properties.
>
> I can model this with a super column family:
>
> sectors{ // super column family
>   sector1{
>     uid1{
>       text: a text
>       user: joop
>     }
>     uid2{
>       text: more text
>       user: piet
>     }
>   }
>   sector2{
>     uid10{
>       text: even more text
>       user: marie
>     }
>   }
> }
>
> But I can also model this with two column families:
>
> sectors{ // column family
>   sector1{
>     textid1: null
>     textid2: null
>   }
>   sector2{
>     textid4: null
>   }
> }
>
> texts{ // column family
>   textid1{
>     text: a text
>     user: joop
>   }
>   textid2{
>     text: more text
>     user: piet
>   }
> }
>
> With the super column family I can retrieve a list of texts for a specific
> sector with only one request to Cassandra.
>
> With the two column families I need to send two requests to Cassandra:
> 1. Give me all textids from sector x (returns x, y, z).
> 2. Give me all texts that have id x, y, z.
>
> In my final application it is likely that there will be somewhat more
> writes than reads.
>
> I was wondering what the best approach is when it comes to performance. I
> suspect that super column families are slower than standard column
> families, but are they still slower when the alternative is two column
> families and two requests to Cassandra instead of one (with a super column
> family)?
>
> Kind regards,
> T. Akhayo
>
>
>
