From: Aaron Morton <aaron@thelastpickle.com>
Subject: Re: Effect of number of keyspaces on write-throughput....
Date: Mon, 19 May 2014 19:42:10 +1200
To: Cassandra User <user@cassandra.apache.org>

> Each client is writing to a separate keyspace simultaneously. Hence, is there a lot of switching of keyspaces?

I would think not. If the client app is using one keyspace per connection, there should be no reason for the driver to change keyspaces.

> But, I observed that when using a single keyspace, the write throughput reduced slightly to 1800 pkts/sec while I actually expected it to increase since there is no switching of contexts now. Why is this so?

That's a 5% change, which is close enough to be ignored.

I would guess that the clients are not doing anything that requires the driver to change the keyspace for the connection.

> Can you also kindly explain how factors like using a single vs. multiple keyspaces, distributing write requests to a single Cassandra node vs. multiple Cassandra nodes, etc.
> affect the write throughput?

Normally you have one keyspace per application. And the best data models are ones where throughput improves as the number of nodes increases. This happens when there are no "hot spots" where every (or most) web requests need to read or write to a particular row.

In general you can improve throughput by having more client threads hitting more machines. You can expect 3,000 to 4,000 non-counter writes per core per node.

Hope that helps.

Aaron

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 13/05/2014, at 1:02 am, Krishna Chaitanya <bnsk1990rulz@gmail.com> wrote:

> Hello,
> Thanks for the reply. Currently, each client is writing about 470 packets per second, where each packet is 1500 bytes. I have four clients writing simultaneously to the cluster. Each client is writing to a separate keyspace. Hence, is there a lot of switching of keyspaces?
>
> The total throughput comes to around 1900 packets per second when using multiple keyspaces, since there are 4 clients and each one is writing around 470 pkts/sec. But I observed that when using a single keyspace, the write throughput reduced slightly to 1800 pkts/sec, while I actually expected it to increase since there is no switching of contexts now. Why is this so? 470 packets per second is the maximum I can write from each client currently, since it is a limitation of my client program.
> I should also mention that these tests are being run on single-node and two-node clusters, with all the write requests going only to a single Cassandra server.
>
> Can you also kindly explain how factors like using a single vs. multiple keyspaces, distributing write requests to a single Cassandra node vs. multiple Cassandra nodes, etc. affect the write throughput? Are there any other factors that affect write throughput other than these?
> Because a single Cassandra node seems to be able to handle all these write requests, as I am not able to see any significant improvement by distributing write requests among multiple nodes.
>
> Thanking you.
>
> On May 12, 2014 2:39 PM, "Aaron Morton" <aaron@thelastpickle.com> wrote:
>
>> On the homepage of libQtCassandra, it's mentioned that switching between keyspaces is costly when storing into Cassandra, thereby affecting the write throughput. Is this necessarily true for other libraries like pycassa and hector as well?
>
> When using a Thrift connection, the keyspace is part of the connection state, so changing keyspaces requires a round trip to the server. Not hugely expensive, but it adds up if you do it a lot.
>
>> Can I increase the write throughput by configuring all the clients to store in a single keyspace instead of multiple keyspaces?
>
> You should expect to get 3,000 to 4,000 writes per core per node.
>
> What are you getting now?
>
> Cheers
> A
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 11/05/2014, at 4:06 pm, Krishna Chaitanya <bnsk1990rulz@gmail.com> wrote:
>
>> Hello,
>> I have an application that writes network packets to a Cassandra cluster from a number of client nodes. It uses the libQtCassandra library to access Cassandra. On the homepage of libQtCassandra, it's mentioned that switching between keyspaces is costly when storing into Cassandra, thereby affecting the write throughput. Is this necessarily true for other libraries like pycassa and hector as well?
>> Can I increase the write throughput by configuring all the clients to store in a single keyspace instead of multiple keyspaces?
>>
>> Thank you.
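P.S. The numbers in this thread pencil out; here is a quick back-of-envelope sketch. The 3,000-4,000 writes per core figure is the rule of thumb quoted above, while the 0.5 ms round-trip time and the `effective_rate` helper are illustrative assumptions, not driver internals.

```python
# Offered load from the thread: 4 clients, each capped at 470 pkts/sec.
CLIENTS = 4
WRITES_PER_CLIENT = 470                    # pkts/sec, limited by the client program
PER_CORE_LOW, PER_CORE_HIGH = 3000, 4000   # non-counter writes/sec per core (rule of thumb)

offered_load = CLIENTS * WRITES_PER_CLIENT
print(offered_load)                 # 1880 -- roughly the ~1900 pkts/sec observed

# Even one core on one node has headroom, which is why spreading the same
# load over more nodes shows no improvement:
print(offered_load < PER_CORE_LOW)  # True


def effective_rate(base_rate, rtt_s, switches_per_write):
    """Model a Thrift connection where changing keyspaces costs one extra
    round trip (set_keyspace). With one keyspace per connection,
    switches_per_write is 0 and the base rate is unchanged."""
    per_write = 1.0 / base_rate + switches_per_write * rtt_s
    return 1.0 / per_write

# Assumed 0.5 ms round trip: switching on *every* write trims a client
# from 470 to ~381 writes/sec; never switching costs nothing.
print(round(effective_rate(470, 0.0005, 0)))  # 470
print(round(effective_rate(470, 0.0005, 1)))  # 381
```

So the model agrees with what was measured: the 1800 vs. 1900 pkts/sec difference is within noise, and neither configuration comes close to saturating a single node.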