cassandra-user mailing list archives

From John Schulz <sch...@pythian.com>
Subject Re: Cassandra Performance on a Single Machine
Date Wed, 06 Jan 2016 17:31:55 GMT
Anurag,

Unless you plan to keep using only one machine with RF=1, benchmarking a
single system with RF=Consistency=1 is mostly a waste of time. If you are
going to use RF=1 and a single host, then why use Cassandra at all? Plain
old relational dbs should do the job just fine.

Cassandra is designed to be distributed. You won't get the full impact of
how it scales and the limits on scaling unless you benchmark a distributed
system. For example, the scaling impact of secondary indexes will not be
visible on a single node.

John




On Tue, Jan 5, 2016 at 3:16 PM, Anurag Khandelwal <anuragk@berkeley.edu>
wrote:

> Hi,
>
> I’ve been benchmarking Cassandra to get an idea of how the performance
> scales with more data on a single machine. I just wanted to get some
> feedback on whether these are the numbers I should expect.
>
> The benchmarks are quite simple — I measure the latency and throughput for
> two kinds of queries:
>
> 1. get() queries - These fetch an entire row for a given primary key.
> 2. search() queries - These fetch all the primary keys for rows where a
> particular column matches a particular value (e.g., “name” is “John
> Smith”).
>
> Indexes are constructed for all columns that are queried.
>
> *Dataset*
>
> The dataset comprises records of ~1.5KB each (on average) when
> represented as CSV; there are 105 attributes in each record.
>
> *Queries*
>
> For get() queries, randomly generated primary keys are used.
>
> For search() queries, column values are selected such that their total
> number of occurrences in the dataset is between 1 and 4000. For example, a
> query for “name” = “John Smith” would only be performed if the number of
> rows containing that value lies between 1 and 4000.
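
A minimal sketch of the value-selection rule quoted above, assuming the dataset fits in memory as a list of dicts. The helper name `pick_search_values` and the sample rows are hypothetical, not part of the original benchmark:

```python
from collections import Counter


def pick_search_values(rows, column, lo=1, hi=4000):
    """Return the values of `column` that occur between lo and hi times (inclusive)."""
    counts = Counter(row[column] for row in rows)
    return sorted(value for value, n in counts.items() if lo <= n <= hi)


# Tiny illustrative dataset; a real run would load the CSV records instead.
rows = [
    {"name": "John Smith"},
    {"name": "John Smith"},
    {"name": "Jane Doe"},
    {"name": "Ann Lee"},
    {"name": "Ann Lee"},
    {"name": "Ann Lee"},
]

# With hi=2, "Ann Lee" (3 occurrences) is excluded.
print(pick_search_values(rows, "name", lo=1, hi=2))  # ['Jane Doe', 'John Smith']
```
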
>
> The results for the benchmarks are provided below:
>
> *Latency Measurements*
>
> The latency measurements are an average of 10000 queries.
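
One way the averaging quoted above might be done, as a rough sketch. `fake_get` is a stand-in for a real client call and the in-memory `store` is purely illustrative, so the number it prints says nothing about Cassandra itself:

```python
import random
import time


def mean_latency_ms(run_query, keys):
    """Run run_query(key) once per key and return the mean latency in milliseconds."""
    total = 0.0
    for key in keys:
        start = time.perf_counter()
        run_query(key)
        total += time.perf_counter() - start
    return 1000.0 * total / len(keys)


# Hypothetical stand-in for get(); replace with a call to the real client.
store = {i: "row-%d" % i for i in range(1000)}


def fake_get(key):
    return store.get(key)


# 10000 randomly generated primary keys, as in the quoted setup.
keys = [random.randrange(1000) for _ in range(10000)]
print("mean latency (ms):", mean_latency_ms(fake_get, keys))
```

A mean hides tail behaviour, so recording percentiles alongside it is usually worthwhile.
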
>
> *Throughput Measurements*
>
> The throughput measurements were repeated for 1-16 client threads, and the
> number reported for each input size is for the configuration (i.e., the
> number of client threads) with the highest throughput.
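
The "best of 1-16 client threads" procedure quoted above could look roughly like this. It is a sketch under assumptions: `run_query` is any callable that issues one query, `fake_get` is a hypothetical in-memory stand-in, and Python threads are only a reasonable model when the real query is I/O-bound:

```python
import threading
import time


def throughput(run_query, keys, n_threads):
    """Return queries per second with `keys` split across n_threads workers."""
    chunks = [keys[i::n_threads] for i in range(n_threads)]

    def worker(chunk):
        for key in chunk:
            run_query(key)

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return len(keys) / elapsed


def best_config(run_query, keys, max_threads=16):
    """Try 1..max_threads client threads; return (thread count, qps) of the best run."""
    results = {n: throughput(run_query, keys, n) for n in range(1, max_threads + 1)}
    best = max(results, key=results.get)
    return best, results[best]


# Hypothetical stand-in for a real query.
store = {i: i * i for i in range(100)}


def fake_get(key):
    return store.get(key)


n_threads, qps = best_config(fake_get, list(range(100)) * 5)
print("best thread count:", n_threads, "throughput (qps):", qps)
```

Reporting the full thread-count-to-throughput table, not just the maximum, makes it easier to see where the server saturates.
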
>
> Any feedback here would be greatly appreciated!
>
> Thanks!
> Anurag
>
>


-- 

John H. Schulz

Principal Consultant

Pythian - Love your data


schulz@pythian.com |  Linkedin www.linkedin.com/pub/john-schulz/13/ab2/930/

Mobile: 248-376-3380

*www.pythian.com <http://www.pythian.com/>*
