cassandra-user mailing list archives

From Gustavo Gustavo <>
Subject Cassandra x MySQL Sharded - Insert Comparison
Date Sun, 22 Jan 2012 13:10:35 GMT

I've set up a testing environment for Cassandra and MySQL, to compare both,
regarding *performance only*. And I must admit that I was expecting
Cassandra to beat MySQL. But I've not seen this happening up to now.
My application/use case is INSERT intensive, since I'm not updating
anything, just inserting all the time.
To compare both I created virtual machines with Ubuntu 11.10, and installed
the latest versions of each datastore. Each VM has 1GB of RAM. I've used
VMs as a way to give both datastores an equal sandbox.
MySQL is set up as a sharded pair of 2 databases, which means that each
record is inserted into a specific instance based on key % 2. The engine is
MyISAM (InnoDB was really slow and not really needed for my case). There's a
compound primary key (integer and datetime columns) in this test table.
Let's name the "nodes" MySQL1 and MySQL2.
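
A minimal sketch of the key % 2 routing described above, in Python rather
than the C# of the actual test client. It assumes the integer VehicleID is
the value being taken modulo 2 and that remainder 0 maps to MySQL1; neither
detail is stated in the post, and shard_for is a hypothetical helper name.

```python
# Node names come from the post; the order (remainder 0 -> MySQL1) is an assumption.
SHARDS = ["MySQL1", "MySQL2"]


def shard_for(vehicle_id: int) -> str:
    """Pick the MySQL instance for a record using key % number_of_shards."""
    return SHARDS[vehicle_id % len(SHARDS)]


if __name__ == "__main__":
    for vid in (1, 2, 3, 4):
        print(vid, shard_for(vid))
```

With sequential vehicle IDs this splits the records evenly between the two
instances.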
Cassandra is set up to work with 4 nodes, with keys (tokens) set up to
distribute records evenly across the 4 nodes (nodetool ring reports 25% to
each node), replication factor 1 and RandomPartitioner, the other configs
are left to default. Let's name the nodes Cassandra1, Cassandra2,
Cassandra3 and Cassandra4.
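
For reference, evenly spaced initial tokens for RandomPartitioner are usually
computed as i * 2**127 / N for node i of N. The sketch below shows that
arithmetic for 4 nodes; it is only an illustration of how a 25% split per
node can be obtained, not necessarily the exact values used in this cluster.

```python
from typing import List

RING_SIZE = 2 ** 127  # size of the RandomPartitioner token space


def initial_tokens(node_count: int) -> List[int]:
    """Return evenly spaced tokens, one per node, starting at 0."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    names = ("Cassandra1", "Cassandra2", "Cassandra3", "Cassandra4")
    for name, token in zip(names, initial_tokens(4)):
        print(name, token)
```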

I'm using 2 physical machines (Windows7) to host the 4 (Cassandra) or 2
(MySQL) virtual machines, this way:
Machine1: MySQL1, Cassandra1, Cassandra3
Machine2: MySQL2, Cassandra2, Cassandra4
The machines have enough CPU and RAM to host either the Cassandra cluster or
the MySQL "cluster", one at a time.

The client test application is running on a third physical machine, with 8
threads doing inserts. The test application is written in C# (Windows7)
using Aquiles high-level client.
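
The shape of such a test client can be sketched as follows. This is Python,
not the C#/Aquiles code actually used, and everything in it is hypothetical:
the run_insert_test name, the choice to hand out work one vehicle at a time,
and the insert callback standing in for the real datastore call. Throughput
is computed here as inserts per second, which is an assumption about the
unit.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_insert_test(insert, vehicle_ids, positions_per_vehicle, workers=8):
    """Call insert(vehicle_id, n) for every vehicle/position pair on a
    thread pool and return (insert_count, elapsed_seconds, inserts_per_second)."""
    lock = threading.Lock()
    done = 0

    def insert_all_for(vehicle_id):
        nonlocal done
        for n in range(positions_per_vehicle):
            insert(vehicle_id, n)
        with lock:  # protect the shared counter across worker threads
            done += positions_per_vehicle

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces iteration so exceptions from workers are re-raised here
        list(pool.map(insert_all_for, vehicle_ids))
    elapsed = time.perf_counter() - start
    rate = done / elapsed if elapsed > 0 else float("inf")
    return done, elapsed, rate


if __name__ == "__main__":
    # Stub insert that does nothing; replace with a real datastore call.
    count, seconds, rate = run_insert_test(lambda v, n: None, range(100), 100)
    print(count, "inserts in", round(seconds, 3), "s")
```

With a single client machine, a harness like this measures the client and
the network as much as the datastore, so the thread count is worth varying.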

My use case is a vehicle tracking system. So, let's suppose, from minute to
minute, the vehicle sends its position together with some other GPS data
and vehicle status information. The columns in my Cassandra cluster are
just the DateTime (long value) of a position for a specific vehicle, and
the value is all the other data serialized to binary format. Therefore, my
CF grows a lot in number of columns. All data is inserted into only one
CF/Table named Positions. The key in Cassandra is the VehicleID and in
MySQL it is VehicleID + PositionDateTime (MySQL creates an index on this
automatically). It's important to note that MySQL threw tons of connection
exceptions; even so, each insert was retried until it got through.
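
To make the wide-row layout above concrete, here is a small in-memory model
of it: one row per VehicleID, one column per position, with the timestamp as
the column name. The millisecond resolution, the field list (latitude,
longitude, speed) and the struct format are all assumptions for
illustration; the post does not say what the real binary value contains.

```python
import struct
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Row key (VehicleID) -> {column name (timestamp bytes): serialized value}
positions = defaultdict(dict)


def column_name(ts: datetime) -> bytes:
    """Encode a timestamp as a big-endian 64-bit count of ms since the epoch."""
    millis = (ts - EPOCH) // timedelta(milliseconds=1)
    return struct.pack(">q", millis)


def insert_position(vehicle_id, ts, lat, lon, speed_kmh):
    """Add one column to the vehicle's row; the value layout is hypothetical."""
    positions[vehicle_id][column_name(ts)] = struct.pack(">ddH", lat, lon, speed_kmh)


if __name__ == "__main__":
    t0 = datetime(2012, 1, 21, 11, 45, tzinfo=timezone.utc)
    insert_position(42, t0, -23.55, -46.63, 60)
    insert_position(42, t0 + timedelta(minutes=1), -23.56, -46.64, 62)
    print(len(positions[42]), "columns in row 42")
```

Each new position adds a column to an existing row instead of a new row,
which is why the row for a vehicle keeps getting wider over time.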

My test case was to insert 1k positions per day for each of 1k vehicles over
10 days, which gives 10,000,000 inserts.
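
The total can be checked as a product of three factors. This assumes the 1k
positions are per vehicle per day, which is the reading that matches the
10000 positions per vehicle reported in the logs below.

```python
VEHICLES = 1_000
POSITIONS_PER_VEHICLE_PER_DAY = 1_000  # assumption: "1k positions" is per day
DAYS = 10

positions_per_vehicle = POSITIONS_PER_VEHICLE_PER_DAY * DAYS
total_inserts = VEHICLES * positions_per_vehicle

if __name__ == "__main__":
    print(positions_per_vehicle, total_inserts)
```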

The final throughput that my application had for this scenario was:

Cassandra x 4
2012-01-21 11:45:38,044 #6         [Logger.Log] INFO  - >> Inserted 10000 positions for 1000 vehicles (10000000 inserts):
2012-01-21 11:45:38,082 #6         [Logger.Log] INFO  - >> Total Time:
2012-01-21 11:45:38,085 #6         [Logger.Log] INFO  - >> Throughput: 1061

And for MySQL x 2
2012-01-21 14:26:25,197 #6         [Logger.Log] INFO  - >> Inserted 10000 positions for 1000 vehicles (10000000 inserts):
2012-01-21 14:26:25,250 #6         [Logger.Log] INFO  - >> Total Time:
2012-01-21 14:26:25,263 #6         [Logger.Log] INFO  - >> Throughput: 1318

Is there something that I'm missing here? Is this expected? Or is the problem
somewhere else, and that's hard to say from this description alone?

