cassandra-user mailing list archives

From Mark Jones <>
Subject RE: How to increase cassandra's performance in read?
Date Tue, 20 Apr 2010 16:08:08 GMT
When I first read this, it bothered me because it seemed like it couldn't be so.  So I read
the link, and it says exactly that, so I have to ask for some clarification here.

I had always assumed a super column was similar to a local keyspace, and that the SubColumns
under it were similar to keys, so that you could localize the data for a user or a website.

So Keyspace: Email
     SuperColumn Entries:
          Individual Email 1:  Columns {body, header, tags, recipients, flags, whatever}
          Individual Email 2:  Columns {body, header, tags, recipients, flags, whatever}
          Individual Email 3:  Columns {body, header, tags, recipients, flags, whatever}

I think now this is probably the wrong concept.

It is really more like:
        Primary Key: Name:Value pairs

And with Supercolumns, the Value part can be another Hash:
        Primary Key: Name: {Name:Value pairs} pairs
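
A minimal sketch of the two shapes described above, using plain Python dicts as a
stand-in. This is only a mental model, not a Cassandra client API; the row key
"user42", the supercolumn names, and the values are hypothetical.

```python
# Toy model only: nested dicts standing in for the data model, not real storage.

# Standard column family: row key -> {column name: value}
standard_cf = {
    "user42": {"name": "example", "flags": "none"},
}

# Super column family: row key -> {supercolumn name: {subcolumn name: value}}
super_cf = {
    "user42": {
        "email-1": {"body": "first body", "header": "first header", "tags": "work"},
        "email-2": {"body": "second body", "header": "second header", "tags": "home"},
    },
}


def get_column(cf, row_key, column):
    """Look up one value in the standard (two-level) shape."""
    return cf[row_key][column]


def get_subcolumn(cf, row_key, supercolumn, subcolumn):
    """Look up one value in the super (three-level) shape."""
    return cf[row_key][supercolumn][subcolumn]


tags = get_subcolumn(super_cf, "user42", "email-1", "tags")
```

The only difference between the two shapes is the extra level of nesting under the
row key.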

But when I look up by Primary Key, ALL of the data associated with the key will be brought
into memory!  So, if I wanted to display the inbox of a user with several years of email,
it would be one HUGE read to suck his entire inbox into memory just to get down to the point
where I could display one message.

Is this more correct?

-----Original Message-----
From: Jonathan Ellis []
Sent: Tuesday, April 20, 2010 10:47 AM
Subject: Re: How to increase cassandra's performance in read?

How many columns are in the supercolumn total?

"in super columnfamilies there is a third level of subcolumns; these
are not indexed, and any request for a subcolumn deserializes _all_
the subcolumns in that supercolumn"
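
The quoted behaviour can be pictured with a toy model. This is a sketch under
assumptions, not Cassandra's actual storage code: the class ToySuperColumnRow, the
JSON encoding, and the "inbox"/"drafts" names are all hypothetical. Each supercolumn
is held as one serialized blob with no index over its subcolumns, so fetching a
single subcolumn decodes every subcolumn in that supercolumn.

```python
import json


class ToySuperColumnRow:
    """Hypothetical stand-in for one row of a super column family."""

    def __init__(self, supercolumns):
        # One serialized blob per supercolumn; no per-subcolumn index is kept.
        self._blobs = {name: json.dumps(subs) for name, subs in supercolumns.items()}
        self.subcolumns_deserialized = 0

    def get(self, supercolumn, subcolumn):
        # Without an index, the whole supercolumn must be decoded to find one entry.
        subs = json.loads(self._blobs[supercolumn])
        self.subcolumns_deserialized += len(subs)
        return subs[subcolumn]


row = ToySuperColumnRow({
    "inbox": {f"msg{i}": "x" * 500 for i in range(1000)},
    "drafts": {"msg0": "y"},
})

value = row.get("inbox", "msg7")
# One subcolumn was requested, but all 1000 subcolumns of "inbox" were decoded.
# The "drafts" supercolumn was not touched.
```

In this model the cost of a read grows with the number of subcolumns in the
requested supercolumn, which is why the total column count in the supercolumn matters.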

On Tue, Apr 20, 2010 at 9:50 AM, Mark Jones <> wrote:
> I too am seeing very slow performance while testing worst case scenarios of
> 1 key leading to 1 supercolumn and 1 column beyond that.
> Key -> SuperColumn -> 1 Column (of ~ 500 bytes)
> Drive utilization is 80-90% and I'm only dealing with 50-70 million rows.
> (With NO swapping.)  So far, I've found nothing that helps, including
> increasing the key cache from 200k to 500k keys; I'm guessing the hashing
> prevents better cache performance.
> Read performance is definitely not 3 IOs based on the utilization factors on
> my drives.  I'm not sure the issue was ever settled in the previous e-mails
> as to how to calculate how many IOs were being done for each read.  I've
> been testing with clusters of 1, 2, 3, or 4 machines, and so far all I'm seeing
> with multiple machines is lower performance in a cluster than alone.  I
> keep assuming that at some number of nodes, the performance will begin to
> pick up.  Three of my nodes are running with 8GB (6GB Java Heap), and one
> has 4GB (3GB Java Heap).  The machine with the smallest memory footprint is
> the fastest performer on inserts, but definitely not the fastest on reads.
> I'm suspecting the read path is relying heavily on the fact that you want to
> get many columns that are closely related, because lookup by key appears to
> be incredibly slow.
> From: yangfeng []
> Sent: Tuesday, April 20, 2010 7:59 AM
> To:;
> Subject: How to increase cassandra's performance in read?
> I get 10 column families by key, and one column family has 30 columns.
> I use multigetSlice once to get the 10 column families, but the performance
> is very poor.
> Does anyone have other ideas for increasing the performance?
