incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: How to increase cassandra's performance in read?
Date Tue, 20 Apr 2010 16:16:18 GMT
Not all the data associated w/ the key is brought into memory, just
all the data associated w/ the supercolumns being queried.

Supercolumns are so you can update a smallish number of subcolumns
independently (e.g. when denormalizing an entire narrow row, usually
with a finite set of columns).  If you want lots of subcolumns you
need to turn that supercolumn into a new row.
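
As a rough illustration of the point above, here is a toy in-memory sketch in Python. It is not Cassandra's API or storage format. It assumes each supercolumn is stored as a single serialized blob, so reading one subcolumn deserializes all of them, and it shows one possible way to promote each supercolumn to its own row. The helper names (`read_subcolumn`, `supercolumns_to_rows`) and the `row_key:supercolumn` key scheme are hypothetical, chosen only for this example.

```python
import json

# Toy model only (an assumption for illustration, not Cassandra internals):
# each supercolumn is kept as one serialized blob of its subcolumns.

def read_subcolumn(row, supercolumn, subcolumn):
    """Return (value, number of subcolumns deserialized to get it)."""
    blob = row[supercolumn]
    subcolumns = json.loads(blob)  # the whole supercolumn is deserialized
    return subcolumns[subcolumn], len(subcolumns)

def supercolumns_to_rows(row_key, row):
    """Promote each supercolumn to its own row.

    The 'row_key:supercolumn' key format is a hypothetical convention.
    """
    return {f"{row_key}:{name}": json.loads(blob) for name, blob in row.items()}

# One user's row, with one supercolumn per email.
inbox = {
    "email1": json.dumps({"body": "hi", "header": "h1", "tags": "a"}),
    "email2": json.dumps({"body": "yo", "header": "h2", "tags": "b"}),
}

# Reading one subcolumn touches every subcolumn of that supercolumn,
# but not the other supercolumns in the row.
value, touched = read_subcolumn(inbox, "email1", "body")

# After promotion, each email is a separate row whose columns can be
# addressed individually.
rows = supercolumns_to_rows("user42", inbox)
```

In this model the cost of a read grows with the size of the supercolumn being queried, not with the size of the whole row, which is why a supercolumn with many subcolumns is better modeled as its own row.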

On Tue, Apr 20, 2010 at 11:08 AM, Mark Jones <MJones@imagehawk.com> wrote:
> When I first read this, it bothered me because it seemed like it couldn't be so.  So
> I read the link, and it says the whole thing, so I have to ask for some clarification here.
>
> I had always assumed a super column was similar to a local keyspace, and that the SubColumns
> under it were similar to keys, so that you could localize the data for a user or a website.
>
> So Keyspace:Email
>  Key:UserID
>     SuperColumn Entries:
>                Individual Email 1:  Columns {body, header, tags, recipients, flags, whatever}
>                Individual Email 2:  Columns {body, header, tags, recipients, flags, whatever}
>                Individual Email 3:  Columns {body, header, tags, recipients, flags, whatever}
>
> I think now this is probably the wrong concept.
>
> It is really more like:
>        Primary Key: Name:Value pairs
>
> And with Supercolumns, the Value part can be another Hash:
>        Primary Key: Name: {Name:Value pairs} pairs
>
> But when I look up by Primary Key, ALL of the data associated with the key will be brought
> into memory!  So, if I wanted to display the inbox of a user with several years of email,
> it would be one HUGE read to suck his entire inbox into memory to get down to the point where I
> could display one message.
>
> Is this more correct?
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbellis@gmail.com]
> Sent: Tuesday, April 20, 2010 10:47 AM
> To: user@cassandra.apache.org
> Subject: Re: How to increase cassandra's performance in read?
>
> How many columns are in the supercolumn total?
>
> "in super columnfamilies there is a third level of subcolumns; these
> are not indexed, and any request for a subcolumn deserializes _all_
> the subcolumns in that supercolumn"
>
> http://wiki.apache.org/cassandra/CassandraLimitations
>
> On Tue, Apr 20, 2010 at 9:50 AM, Mark Jones <MJones@imagehawk.com> wrote:
>> I too am seeing very slow performance while testing worst case scenarios of
>> 1 key leading to 1 supercolumn and 1 column beyond that.
>>
>>
>>
>> Key -> SuperColumn -> 1 Column (of ~ 500 bytes)
>>
>>
>>
>> Drive utilization is 80-90% and I'm only dealing with 50-70 million rows.
>> (With NO swapping.)  So far, I've found nothing that helps, including
>> increasing the keycache from 200k to 500k keys; I'm guessing the hashing
>> prevents better cache performance.
>>
>>
>>
>> Read performance is definitely not 3 IOs based on the utilization factors on
>> my drives.  I'm not sure the issue was ever settled in the previous e-mails
>> as to how to calculate how many IOs were being done for each read.  I've
>> been testing with clusters of 1, 2, 3, or 4 machines, and so far all I'm seeing
>> with multiple machines is lower performance in a cluster than alone.  I
>> keep assuming that at some number of nodes, the performance will begin to
>> pick up.  Three of my nodes are running with 8GB (6GB Java Heap), and one
>> has 4GB (3GB Java Heap).  The machine with the smallest memory footprint is
>> the fastest performer on inserts, but definitely not the fastest on reads.
>>
>>
>>
>> I'm suspecting the read path is relying heavily on the fact that you want to
>> get many columns that are closely related, because lookup by key appears to
>> be incredibly slow.
>>
>>
>>
>> From: yangfeng [mailto:yeahyf@gmail.com]
>> Sent: Tuesday, April 20, 2010 7:59 AM
>> To: user@cassandra.apache.org; dev@cassandra.apache.org
>> Subject: How to increase cassandra's performance in read?
>>
>>
>>
>> I get 10 column families by keys, and one column family has 30 columns.
>>
>> I use multigetSlice once to get the 10 column families, but the performance is
>> very poor.
>>
>> Does anyone have other ideas for increasing the performance?
>>
>>
>
