cassandra-user mailing list archives

From Jonathan Haddad <...@jonhaddad.com>
Subject Re: Cassandra Performance on a Single Machine
Date Thu, 14 Jan 2016 21:02:58 GMT
I think you actually get a really useful metric by benchmarking 1 machine.
You learn your cluster's theoretical maximum throughput, which would be the
number of nodes * the queries per second a single node can sustain.  Yes,
adding in replication and CL is important, but 1 machine lets you isolate
certain performance metrics.
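
As a rough illustration of that upper bound, here is a minimal sketch. The function name and the numbers are hypothetical, and the replication adjustment is a simplifying assumption (perfectly even load, no coordination overhead, every write applied on RF replicas), not something measured in this thread.

```python
def cluster_upper_bound(nodes, per_node_qps, replication_factor=1):
    """Idealized ceiling on cluster throughput, in queries per second.

    Assumes perfectly even load and no coordination overhead, and that
    every write is applied on `replication_factor` replicas.
    """
    if nodes < 1 or replication_factor < 1:
        raise ValueError("nodes and replication_factor must be >= 1")
    if replication_factor > nodes:
        raise ValueError("replication_factor cannot exceed nodes")
    return nodes * per_node_qps / replication_factor


# Hypothetical numbers: 6 nodes, 5000 queries/s measured on one node.
print(cluster_upper_bound(6, 5000))     # ceiling at RF=1
print(cluster_upper_bound(6, 5000, 3))  # ceiling for writes at RF=3
```

A real cluster will land below this ceiling; the single-node number only tells you how far below.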

On Thu, Jan 14, 2016 at 12:23 PM Robert Wille <rwille@fold3.com> wrote:

> I disagree. I think that you can extrapolate very little information about
> RF>1 and CL>1 by benchmarking with RF=1 and CL=1.
>
> On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal <anuragk@berkeley.edu>
> wrote:
>
> Hi John,
>
> Thanks for responding!
>
> The aim of this benchmark was not to benchmark Cassandra as an end-to-end
> distributed system, but to understand a breakdown of the performance. For
> instance, if we understand the performance characteristics that we can
> expect from a single-machine Cassandra instance with RF=Consistency=1, we
> can have a good estimate of what the distributed performance with higher
> replication factors and consistency levels is going to look like. Even in
> the ideal case, the performance improvement would scale at most linearly
> with more machines and replicas.
>
> That being said, I still want to understand whether this is the
> performance I should expect for the setup I described; if the performance
> for the current setup can be improved, then clearly the performance for a
> production setup (with multiple nodes, replicas) would also improve. Does
> that make sense?
>
> Thanks!
> Anurag
>
> On Jan 6, 2016, at 9:31 AM, John Schulz <schulz@pythian.com> wrote:
>
> Anurag,
>
> Unless you are planning on continuing to use only one machine with RF=1,
> benchmarking a single system using RF=Consistency=1 is mostly a waste of
> time. If you are going to use RF=1 and a single host, then why use Cassandra
> at all? Plain old relational DBs should do the job just fine.
>
> Cassandra is designed to be distributed. You won't get the full impact of
> how it scales and the limits on scaling unless you benchmark a distributed
> system. For example, the scaling impact of secondary indexes will not be
> visible on a single node.
>
> John
>
>
>
>
> On Tue, Jan 5, 2016 at 3:16 PM, Anurag Khandelwal <anuragk@berkeley.edu>
> wrote:
>
>> Hi,
>>
>> I’ve been benchmarking Cassandra to get an idea of how the performance
>> scales with more data on a single machine. I just wanted to get some
>> feedback on whether these are the numbers I should expect.
>>
>> The benchmarks are quite simple — I measure the latency and throughput
>> for two kinds of queries:
>>
>> 1. get() queries - These fetch an entire row for a given primary key.
>> 2. search() queries - These fetch all the primary keys for rows where a
>> particular column matches a particular value (e.g., “name” is “John
>> Smith”).
>>
>> Indexes are constructed for all columns that are queried.
>>
>> *Dataset*
>>
>> The dataset used consists of ~1.5KB records (on average) when
>> represented as CSV; there are 105 attributes in each record.
>>
>> *Queries*
>>
>> For get() queries, randomly generated primary keys are used.
>>
>> For search() queries, column values are selected such that their total
>> number of occurrences in the dataset is between 1 and 4000. For example, a
>> query for “name” = “John Smith” would only be performed if the number of
>> rows that contain that value lies between 1 and 4000.
>>
>> The results for the benchmarks are provided below:
>>
>> *Latency Measurements*
>>
>> Each latency measurement is an average over 10000 queries.
>>
>>
>>
>>
>>
>> *Throughput Measurements*
>>
>> The throughput measurements were repeated for 1-16 client threads, and
>> the number reported for each input size is for the configuration (i.e.,
>> the number of client threads) with the highest throughput.
>>
>>
>>
>>
>>
>> Any feedback here would be greatly appreciated!
>>
>> Thanks!
>> Anurag
>>
>>
>
>
> --
>
> John H. Schulz
>
> Principal Consultant
>
> Pythian - Love your data
>
>
> schulz@pythian.com |  Linkedin
> www.linkedin.com/pub/john-schulz/13/ab2/930/
>
> Mobile: 248-376-3380
>
> *www.pythian.com <http://www.pythian.com/>*
>
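
The measurement procedure described in the original post (latency averaged over a batch of queries, throughput swept over a range of client thread counts with the best configuration reported) could be sketched roughly as below. This is not the harness actually used in the thread: `fake_query` is a hypothetical stand-in for a real get() or search() call, and all function names are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def mean_latency(run_query, keys):
    """Average wall-clock seconds per query, issued one at a time."""
    start = time.perf_counter()
    for key in keys:
        run_query(key)
    return (time.perf_counter() - start) / len(keys)


def throughput(run_query, keys, threads):
    """Queries per second with `threads` concurrent client threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() drains the iterator so every query has completed
        list(pool.map(run_query, keys))
    return len(keys) / (time.perf_counter() - start)


def best_throughput(run_query, keys, max_threads=16):
    """Sweep 1..max_threads client threads; return the best (threads, qps)."""
    results = [(t, throughput(run_query, keys, t))
               for t in range(1, max_threads + 1)]
    return max(results, key=lambda r: r[1])


def fake_query(key):
    # Placeholder: a real run would issue the get() or search() query here.
    time.sleep(0.0005)
    return key


keys = list(range(200))
print("mean latency (s):", mean_latency(fake_query, keys))
print("best (threads, qps):", best_throughput(fake_query, keys, max_threads=4))
```

Reporting only the best thread count hides how throughput changes as concurrency grows, so it may be worth keeping the full `results` list when comparing single-node and multi-node runs.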
