cassandra-user mailing list archives

From Anurag Khandelwal <anur...@berkeley.edu>
Subject Cassandra Performance on a Single Machine
Date Tue, 05 Jan 2016 20:16:33 GMT
Hi,

I’ve been benchmarking Cassandra to get an idea of how its performance scales with more
data on a single machine. I would like some feedback on whether these are the numbers
I should expect.

The benchmarks are quite simple — I measure the latency and throughput for two kinds of
queries:

1. get() queries - These fetch an entire row for a given primary key.
2. search() queries - These fetch all the primary keys for rows where a particular column
matches a particular value (e.g., “name” is “John Smith”). 

Indexes are constructed for all columns that are queried.
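
To make the two query types concrete, here is a minimal in-memory sketch of their semantics.
It is not Cassandra code: ToyTable, put() and the sample rows are hypothetical names used
only for illustration, and a per-column dictionary stands in for a secondary index.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with one index per queried column."""

    def __init__(self, indexed_columns):
        self.rows = {}
        # One index per column: column value -> set of primary keys.
        self.indexes = {col: defaultdict(set) for col in indexed_columns}

    def put(self, key, row):
        # Drop index entries of a row that is being overwritten.
        old = self.rows.get(key)
        if old is not None:
            for col, index in self.indexes.items():
                if col in old:
                    index[old[col]].discard(key)
        self.rows[key] = row
        for col, index in self.indexes.items():
            if col in row:
                index[row[col]].add(key)

    def get(self, key):
        """get(): fetch the entire row for a primary key (None if absent)."""
        return self.rows.get(key)

    def search(self, column, value):
        """search(): primary keys of all rows where column equals value."""
        return sorted(self.indexes[column].get(value, ()))


table = ToyTable(indexed_columns=["name"])
table.put(1, {"name": "John Smith", "city": "Berkeley"})
table.put(2, {"name": "Jane Doe", "city": "Oakland"})
table.put(3, {"name": "John Smith", "city": "Albany"})

row = table.get(2)
matches = table.search("name", "John Smith")
```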

Dataset

The dataset consists of records that average ~1.5KB each when represented as CSV; each
record has 105 attributes.

Queries

For get() queries, randomly generated primary keys are used.

For search() queries, column values are selected such that their total number of occurrences
in the dataset is between 1 and 4000. For example, a query for “name” = “John Smith”
would only be performed if the number of rows containing that value is between 1 and 4000.
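
A minimal sketch of one way such a selection could be done, assuming the records are
available as a list of dictionaries. The function name select_search_values and the sample
rows are hypothetical; the upper bound is lowered to 2 in the example so the filter is visible.

```python
from collections import Counter


def select_search_values(rows, column, min_count=1, max_count=4000):
    """Return the values of `column` that occur between min_count and max_count times."""
    counts = Counter(row[column] for row in rows if column in row)
    return sorted(v for v, n in counts.items() if min_count <= n <= max_count)


# Tiny hypothetical dataset: "Alex Lee" occurs 3 times and is filtered out.
rows = [
    {"name": "John Smith"},
    {"name": "John Smith"},
    {"name": "Jane Doe"},
    {"name": "Alex Lee"},
    {"name": "Alex Lee"},
    {"name": "Alex Lee"},
]
candidates = select_search_values(rows, "name", max_count=2)
print(candidates)
```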

The results for the benchmarks are provided below:

Latency Measurements

The latency measurements are an average of 10000 queries.
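
For reference, an average like this can be computed with a loop of the following shape. This
is only a sketch under assumptions: mean_latency_ms is a hypothetical helper, and a dictionary
lookup stands in for the real get() call, so the printed number says nothing about Cassandra.

```python
import time


def mean_latency_ms(run_query, keys):
    """Average wall-clock latency in milliseconds, issuing one query at a time."""
    if not keys:
        raise ValueError("need at least one key")
    total = 0.0
    for key in keys:
        start = time.perf_counter()
        run_query(key)
        total += time.perf_counter() - start
    return 1000.0 * total / len(keys)


# Stand-in for a real get(): a dictionary lookup, not a Cassandra call.
store = {i: "row-%d" % i for i in range(10000)}
latency = mean_latency_ms(store.get, list(range(10000)))
print("mean latency over 10000 queries: %.6f ms" % latency)
```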





Throughput Measurements

The throughput measurements were repeated for 1 to 16 client threads; the number reported
for each input size is for the configuration (i.e., number of client threads) with the
highest throughput.
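
The sweep over client threads can be sketched as follows. The names throughput_qps,
best_configuration and fake_query are hypothetical, and fake_query merely sleeps for about
1 ms in place of a real query, so this shows the procedure only, not actual measurements.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def throughput_qps(run_query, keys, num_threads):
    """Queries per second when num_threads workers share the list of keys."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() forces every query to finish before the clock is stopped.
        list(pool.map(run_query, keys))
    elapsed = time.perf_counter() - start
    return len(keys) / elapsed


def best_configuration(run_query, keys, thread_counts=range(1, 17)):
    """Measure each thread count and return (best count, its throughput, all results)."""
    results = {n: throughput_qps(run_query, keys, n) for n in thread_counts}
    best = max(results, key=results.get)
    return best, results[best], results


def fake_query(key):
    time.sleep(0.001)  # placeholder for a real query taking roughly 1 ms
    return key


best_threads, best_qps, all_results = best_configuration(fake_query, list(range(200)))
print("best: %d threads, %.0f queries/s" % (best_threads, best_qps))
```

Threads suit this kind of client because the workers spend most of their time waiting on
the server; a CPU-bound client in Python would need processes instead.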





Any feedback here would be greatly appreciated!

Thanks!
Anurag