incubator-cassandra-user mailing list archives

From Krishna Chaitanya <bnsk1990r...@gmail.com>
Subject Re: Effect of number of keyspaces on write-throughput....
Date Mon, 19 May 2014 09:27:23 GMT
Thank you for making these issues clear. Currently, in my data model, I use
the current second (seconds since the epoch) as the row key, and the
microsecond together with the client number as the column key.
Hence, all the packets received during a particular second on all the
clients are stored in the same row. I did this because it makes it easy to
measure the write throughput every second.

So, according to you, this isn't a good data model, since it creates a hot
spot where all the packets received in a particular second are written to
the same row? Because the clients are all synchronized using NTP, every
client writes to that one row during any given second. I would now assume
that changing the data model so that multiple rows are updated every second
is a better idea, with client-side logging used to measure the write
throughput. That way, write throughput can be increased by adding more
nodes to the cluster, since there are no hot spots when multiple rows are
being updated every second.

Is this true?

Thank you.
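
To make the proposed change concrete, here is a minimal sketch of the two key layouts, assuming a hash-based partitioner so that distinct row keys tend to land on different nodes. The names `NUM_BUCKETS`, `row_key`, `column_key`, and `ThroughputLog` are hypothetical and used only for illustration; they are not part of libQtCassandra or any other client library, and no Cassandra calls are made.

```python
from collections import Counter

# Hypothetical: number of rows written per second. Tune to the cluster size.
NUM_BUCKETS = 16


def row_key(epoch_second: int, client_id: int, num_buckets: int = NUM_BUCKETS) -> str:
    """Row key of the form '<second>:<bucket>' instead of '<second>' alone.

    With num_buckets == 1 this degenerates to the original model, where every
    client writes to the same row during a given second.
    """
    bucket = client_id % num_buckets
    return f"{epoch_second}:{bucket}"


def column_key(microsecond: int, client_id: int) -> str:
    """Column key as in the original model: microsecond plus client number."""
    return f"{microsecond:06d}:{client_id}"


def row_keys_for_second(epoch_second: int, num_buckets: int = NUM_BUCKETS) -> list:
    """All row keys a reader must fetch to see every packet of one second."""
    return [f"{epoch_second}:{b}" for b in range(num_buckets)]


class ThroughputLog:
    """Client-side counter of writes per second, replacing the per-row count."""

    def __init__(self) -> None:
        self.per_second = Counter()

    def record(self, epoch_second: int) -> None:
        self.per_second[epoch_second] += 1
```

The trade-off in this sketch is on the read side: reconstructing one second of traffic now takes `num_buckets` row reads instead of one, which is why the per-second throughput measurement moves to the client.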
On May 19, 2014 1:19 PM, "Aaron Morton" <aaron@thelastpickle.com> wrote:

> Each client is writing to a separate keyspace simultaneously. Hence, is
> there a lot of switching of keyspaces?
>
> I would think not. If the client app is using one keyspace per connection
> there should be no reason for the driver to change keyspaces.
>
>
>
> But I observed that when using a single keyspace, the write throughput
> reduced slightly to 1800 pkts/sec, while I actually expected it to increase
> since there is no switching of contexts now. Why is this so?
>
>
> That’s a 5% change which is close enough to be ignored.
>
> I would guess that the clients are not doing anything that requires the
> driver to change the keyspace for the connection.
>
>              Can you also kindly explain how factors like using a single
> v/s multiple keyspaces, distributing write requests to a single cassandra
> node v/s multiple cassandra nodes, etc. affect the write throughput?
>
> Normally you have one keyspace per application. And the best data models
> are ones where the throughput improves as the number of nodes increases.
> This happens when there are no “hot spots” where every / most web requests
> need to read or write to a particular row.
>
> In general you can improve throughput by having more client threads
> hitting more machines. You can expect 3,000 to 4,000 non-counter writes per
> core per node.
>
> Hope that helps.
> Aaron
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 13/05/2014, at 1:02 am, Krishna Chaitanya <bnsk1990rulz@gmail.com>
> wrote:
>
> Hello,
> Thanks for the reply. Currently, each client is writing about 470 packets
> per second where each packet is 1500 bytes. I have four clients writing
> simultaneously to the cluster. Each client is writing to a separate
> keyspace simultaneously. Hence, is there a lot of switching of keyspaces?
>
>         The total throughput is coming to around 1900 packets per second
> when using multiple keyspaces. This is because there are 4 clients and each
> one is writing around 470 pkts/sec. But I observed that when using a
> single keyspace, the write throughput reduced slightly to 1800 pkts/sec,
> while I actually expected it to increase since there is no switching of
> contexts now. Why is this so? 470 packets per second is the maximum I can
> write from each client currently, since it is the limitation of my client
> program.
> I should also mention that these tests are being run on single-node and
> two-node clusters, with all the write requests going only to a single
> Cassandra server.
>
>              Can you also kindly explain how factors like using a single
> v/s multiple keyspaces, distributing write requests to a single cassandra
> node v/s multiple cassandra nodes, etc. affect the write throughput?  Are
> there any other factors that affect write throughput other than these?
> Because, a single cassandra node seems to be able to handle all these write
> requests as I am not able to see any significant improvement by
> distributing write requests among multiple nodes.
>
> Thanking you.
>
> On May 12, 2014 2:39 PM, "Aaron Morton" <aaron@thelastpickle.com> wrote:
>
>> On the homepage of libQtCassandra, it's mentioned that switching between
>> keyspaces is costly when storing into Cassandra, thereby affecting the write
>> throughput. Is this necessarily true for other libraries like pycassa and
>> hector as well?
>>
>> When using the thrift connection the keyspace is a part of the connection
>> state, so changing keyspaces requires a round trip to the server. Not
>> hugely expensive, but it adds up if you do it a lot.
>>
>> Can I increase the write throughput by configuring all the clients to
>> store in a single keyspace instead of multiple keyspaces?
>>
>> You should expect to get 3,000 to 4,000 writes per core per node.
>>
>> What are you getting now?
>>
>> Cheers
>> A
>>
>>     -----------------
>> Aaron Morton
>> New Zealand
>> @aaronmorton
>>
>> Co-Founder & Principal Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On 11/05/2014, at 4:06 pm, Krishna Chaitanya <bnsk1990rulz@gmail.com>
>> wrote:
>>
>> Hello,
>> I have an application that writes network packets to a Cassandra cluster
>> from a number of client nodes. It uses the libQtCassandra library to access
>> Cassandra. On the homepage of libQtCassandra, it's mentioned that switching
>> between keyspaces is costly when storing into Cassandra, thereby affecting
>> the write throughput. Is this necessarily true for other libraries like
>> pycassa and hector as well?
>> Can I increase the write throughput by configuring all the clients to
>> store in a single keyspace instead of multiple keyspaces?
>>
>> Thank you.
>>
>>
>>
>
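
The figures quoted in this thread can be checked with some quick arithmetic. The sketch below uses only numbers stated in the messages (4 clients, 470 packets per second each, 1500-byte packets, 1900 versus 1800 packets per second, and the 3,000 to 4,000 writes per core per node guideline); the variable names are illustrative, and treating one packet as one write is an assumption.

```python
clients = 4
packets_per_client = 470        # stated per-client maximum, packets/sec
packet_bytes = 1500
multi_keyspace_rate = 1900      # observed total, packets/sec
single_keyspace_rate = 1800     # observed total, packets/sec
expected_low, expected_high = 3000, 4000  # writes per core per node

# Client-side ceiling: what the four clients can offer in total.
offered_rate = clients * packets_per_client

# Relative difference between the two keyspace layouts.
relative_drop = (multi_keyspace_rate - single_keyspace_rate) / multi_keyspace_rate

# Payload volume at the higher observed rate.
bytes_per_second = multi_keyspace_rate * packet_bytes

# Fraction of the low-end single-core guideline actually used,
# assuming one packet corresponds to one write.
share_of_one_core = multi_keyspace_rate / expected_low

print(f"offered rate:      {offered_rate} pkts/sec")
print(f"relative drop:     {relative_drop:.1%}")
print(f"payload volume:    {bytes_per_second / 1e6:.2f} MB/s")
print(f"share of one core: {share_of_one_core:.0%}")
```

Under these assumptions the clients can offer only about 1880 packets per second in total, which is below even the low end of the per-core guideline. That is consistent with the observation that adding a second node made no visible difference: the load generator, not the cluster, appears to be the limit.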
