From: Artur Kronenberg <artur.kronenberg@openmarket.com>
To: user@cassandra.apache.org
Date: Thu, 10 Oct 2013 15:03:26 +0100
Subject: Re: Rowcache and quorum reads cassandra

Hi.

That is basically our setup. We'll be holding all data on all nodes.

My question was more about how the cache would behave. I thought it might work like this:

1. No cache hit

Read from 3 nodes to verify the results are correct, then return. Write the result into the row cache.

2. Cache hit

Read directly from the row cache and return.

If the value now gets updated, it would be found in the row cache and either invalidated (so case 1 applies on the next read) or updated in place (so case 2 applies on the next read). However, I couldn't find any documentation on this.
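
Just to make the two cases concrete, here is a rough toy model (Python) of the behaviour I have in mind. The Replica class and quorum_read helper are made up purely for illustration; this is not Cassandra's actual read path.

# Toy model of the cache behaviour described above, not Cassandra internals.

class Replica:
    """One node: an on-disk store plus a row cache."""

    def __init__(self, name, store):
        self.name = name
        self.store = store        # stand-in for sstables/memtables
        self.row_cache = {}       # stand-in for the off-heap row cache

    def read(self, key):
        # Case 2: cache hit, serve straight from the row cache.
        if key in self.row_cache:
            return self.row_cache[key]
        # Case 1: cache miss, read from the store and populate the cache.
        value = self.store.get(key)
        self.row_cache[key] = value
        return value

    def write(self, key, value):
        self.store[key] = value
        # The open question: invalidated or updated? Modelled here as
        # invalidation, so the next read is a cache miss (case 1).
        self.row_cache.pop(key, None)


def quorum_read(replicas, key):
    """Read from a quorum of replicas and check they agree."""
    needed = len(replicas) // 2 + 1
    results = [r.read(key) for r in replicas[:needed]]
    assert all(v == results[0] for v in results)
    return results[0]


replicas = [Replica("node-%d" % i, {"row-1": "v1"}) for i in range(5)]
print(quorum_read(replicas, "row-1"))   # cache miss on 3 nodes, then cached
print(quorum_read(replicas, "row-1"))   # cache hit on those same 3 nodes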

If this were the case, each node would only have to hold 1/5 of my data in the cache (you're right about the DC clone, so 1/5 of the data rather than 1/10). If, however, 3 nodes have to be read each time and all 3 fill their row caches with the same data, my cache requirements would be much bigger.
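
In rough numbers, the difference between those two scenarios looks like this (back-of-the-envelope only, using the figures from this thread: 5 nodes per DC and 3 GB of row cache per node):

# Back-of-the-envelope cache sizing for the two scenarios above.
nodes_per_dc = 5
row_cache_per_node_gb = 3
replicas_hit_per_read = 3          # LOCAL_QUORUM with RF=5: 5 // 2 + 1

total_cache_per_dc_gb = nodes_per_dc * row_cache_per_node_gb        # 15 GB

# Scenario A: each row ends up cached on only one node in the DC,
# so the caches hold disjoint 1/5 slices of the data.
distinct_data_cached_a_gb = total_cache_per_dc_gb                   # 15 GB

# Scenario B: every quorum read populates the cache on all 3 contacted
# replicas, so each cached row costs 3x the memory across the DC.
distinct_data_cached_b_gb = total_cache_per_dc_gb / replicas_hit_per_read

print(distinct_data_cached_a_gb, distinct_data_cached_b_gb)         # 15 5.0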

Thanks!

Artur

On 10/10/13 14:06, Ken Hancock wrote:
If you're hitting 3 of 5 nodes, it sounds like you've set your replication factor to 5. Is that what you're doing so you can tolerate a 2-node outage?

For a 5-node cluster with RF=5, each node will have 100% of your data (a second DC is just a clone), so with a 3 GB off-heap row cache per node, 3 GB / <total data size in GB> of your data would be cacheable in the row cache.

On the other hand, if you're doing RF=3, each node will have 60% of your data instead of 100%, so the effective percentage of rows that can be cached goes up by about 66%.
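
Quick sketch of that arithmetic; total_data_gb is just a placeholder, plug in your real data size:

# RF=5 vs RF=3 cacheable fraction per node.
total_data_gb = 100.0               # example figure only, not from this thread
row_cache_per_node_gb = 3.0

# RF=5 on 5 nodes: every node holds 100% of the data.
cacheable_fraction_rf5 = row_cache_per_node_gb / total_data_gb            # 3%

# RF=3 on 5 nodes: every node holds 3/5 = 60% of the data.
cacheable_fraction_rf3 = row_cache_per_node_gb / (total_data_gb * 0.6)    # 5%

print(cacheable_fraction_rf3 / cacheable_fraction_rf5)   # ~1.67, i.e. +66%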

Great quick & dirty calculator: http://www.ecyrd.com/cassandracalculator/



On Thu, Oct 10, 2013 at 6:40 AM, Artur Kronenberg <artur.kronenberg@openmarket.com> wrote:

I was reading through configuration tips for Cassandra and decided to use the row cache in order to optimize read performance on my cluster.

I have a cluster of 10 nodes, each of them operating with 3 GB off-heap, using Cassandra 2.4.1. I am doing local quorum reads, which means that I will hit 3 nodes out of 5 because I split my 10 nodes into two data centres.

I was under the impression that since each node gets a certain range of reads, my total amount of off-heap would be 10 * 3 GB = 30 GB. However, is this still correct with quorum reads? How does Cassandra handle row-cache hits in combination with quorum reads?
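
For the record, the arithmetic I'm assuming (RF=5 per DC is my assumption, since all data lives on all nodes):

# Quorum and off-heap arithmetic for the setup described above.
nodes = 10
off_heap_per_node_gb = 3
rf_per_dc = 5                       # assumption: all data on all 5 local nodes

local_quorum = rf_per_dc // 2 + 1                 # 3 of 5 local replicas
total_off_heap_gb = nodes * off_heap_per_node_gb  # 10 * 3 GB = 30 GB

print(local_quorum, total_off_heap_gb)            # 3 30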

Thanks!
-- artur




--
Ken Hancock | System Architect, Advanced Advertising 
SeaChange International 
50 Nagog Park
Acton, Massachusetts 01720
ken.hancock@schange.com | www.schange.com | NASDAQ:SEAC 
Office: +1 (978) 889-3329 | Google Talk: ken.hancock@schange.com | Skype:hancockks | Yahoo IM:hancockks
LinkedIn

SeaChange International
This e-mail and any attachments may contain information which is SeaChange International confidential. The information enclosed is intended only for the addressees herein and may not be copied or forwarded without permission from SeaChange International.
