cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Cassandra x MySQL Sharded - Insert Comparison
Date Sun, 22 Jan 2012 14:51:06 GMT
In some sense, one-for-one performance "almost" does not matter. Though I bet
you can get Cassandra to do better (I remember old-school YCSB white paper
benchmarks against a sharded MySQL).

One of the main selling points of Cassandra is that if you want to grow from 4
nodes, to 8 nodes, to 14 nodes, and so on, Cassandra is elastic and
supports adding and removing nodes online. A do-it-yourself "hash mod N"
algorithm really has no upgrade path.
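
To make the upgrade-path point concrete, here is a minimal sketch comparing how many keys change owner when a modulo scheme and a token ring each grow from 4 to 5 nodes. The node names, the use of MD5, and deriving each node's token from its name are assumptions made for illustration only; this is not Cassandra's actual implementation, and in a real cluster the tokens are assigned by the operator.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a large integer with MD5 (an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def mod_owner(key, n_nodes):
    """Do-it-yourself sharding: the owner is simply key % n_nodes."""
    return key % n_nodes


def build_ring(node_names):
    """Place each (hypothetical) node on the ring at a token derived from its name."""
    return sorted((token(name), name) for name in node_names)


def ring_owner(key, ring):
    """Owner is the first node whose token is >= the key's token, wrapping around."""
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token(str(key))) % len(ring)
    return ring[index][1]


keys = range(20_000)

# Growing a modulo scheme from 4 to 5 shards reassigns most keys.
moved_mod = sum(mod_owner(k, 4) != mod_owner(k, 5) for k in keys) / len(keys)

# Adding a fifth node to a token ring only hands keys to the new node.
ring4 = build_ring(["node1", "node2", "node3", "node4"])
ring5 = build_ring(["node1", "node2", "node3", "node4", "node5"])
moved = [k for k in keys if ring_owner(k, ring4) != ring_owner(k, ring5)]
moved_ring = len(moved) / len(keys)

print(f"modulo 4 -> 5: {moved_mod:.0%} of keys change shard")
print(f"ring   4 -> 5: {moved_ring:.0%} of keys change node")
```

With the modulo scheme, 80% of these keys land on a different shard after the change, so nearly all existing data would have to be moved by hand. On the ring, every key that moves goes to the new node and the other nodes keep the rest of their data.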

Edward

On Sun, Jan 22, 2012 at 9:26 AM, Chris Gerken <chrisgerken@mindspring.com> wrote:

> Howdy Gustavo,
>
> One thing that jumped out at me is your having put two cassandra images on
> the same box.  There may be enough CPU and memory for the two images
> combined but you may be seeing some other resource not being shared so
> nicely - network card bandwidth, for example.
>
> More generally, the real question is what the bottleneck is (for both
> db's, actually).  Start with Cassandra running in that configuration and
> start with one client thread sending one request a second.  Look at the
> CPU, network and memory metrics for all boxes (including the client).
> Nothing should be even close to maxing out at that throughput.  Now
> incrementally increase one of the test parameters (number of clients or
> number of inserts per second) just a bit (say from one transaction to 5)
> and note the above metrics.  Keep slowly increasing the test parameters,
> one at a time, until one of the metrics maxes out.  That's the bottleneck
> you're wondering about.  Fix that and the db (be it Cassandra or MySQL)
> will move ahead of the other performance-wise.  Turn your attention to the
> other db and repeat.
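
The stepwise procedure Chris describes could be sketched roughly as follows. Everything here is hypothetical: `find_bottleneck`, the `fake_measure` stand-in, the 0.95 threshold, and the utilization numbers are invented for illustration. A real run would replace `fake_measure` with code that drives the database at the given rate and reads actual CPU, network and memory utilization from every box, including the client.

```python
def find_bottleneck(measure, loads, limit=0.95):
    """Raise the load step by step and report the first metric that maxes out.

    measure(load) must return a dict of metric name -> utilization in [0, 1].
    Returns (load, metric_name), or None if nothing saturates.
    """
    for load in loads:
        metrics = measure(load)
        name, value = max(metrics.items(), key=lambda item: item[1])
        if value >= limit:
            return load, name
    return None


def fake_measure(inserts_per_second):
    """Stand-in for real monitoring: a made-up linear utilization model."""
    return {
        "cpu": min(1.0, inserts_per_second / 4000),
        "network": min(1.0, inserts_per_second / 1500),
        "memory": min(1.0, 0.3 + inserts_per_second / 20000),
    }


# Vary only one test parameter (here the insert rate) and keep the rest fixed.
result = find_bottleneck(fake_measure, [1, 5, 25, 100, 500, 1000, 1500, 2000])
print(result)
```

Only one parameter changes per ramp, so whichever metric saturates first can be attributed to that parameter.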
>
> - Chris Gerken
>
> On Jan 22, 2012, at 7:10 AM, Gustavo Gustavo wrote:
>
> Hello,
>
> I've set up a testing environment for Cassandra and MySQL to compare the two
> with regard to *performance only*. I must admit that I was expecting
> Cassandra to beat MySQL, but I have not seen that happen so far.
> My application/use case is INSERT intensive, since I'm not updating
> anything, just inserting all the time.
> To compare both I created virtual machines with Ubuntu 11.10, and
> installed the latest versions of each datastore. Each VM has 1GB of RAM.
> I've used VMs as a way to give both datastores an equal sandbox.
> MySQL is set up as a sharded deployment with 2 databases, meaning that
> records are inserted into a specific instance based on key % 2. The engine is
> MyISAM (InnoDB was really slow and not really needed for my case). There's a
> compound primary key (integer and datetime columns) in this test table.
> Let's name the "nodes" MySQL1 and MySQL2.
> Cassandra is set up to work with 4 nodes, with keys (tokens) set up to
> distribute records evenly across the 4 nodes (nodetool ring reports 25% to
> each node), replication factor 1 and RandomPartitioner, the other configs
> are left to default. Let's name the nodes Cassandra1, Cassandra2,
> Cassandra3 and Cassandra4.
>
> I'm using 2 physical machines (Windows7) to host the 4 (Cassandra) or 2
> (MySQL) virtual machines, this way:
> Machine1: MySQL1, Cassandra1, Cassandra3
> Machine2: MySQL2, Cassandra2, Cassandra4
> The machines have enough CPU and RAM to host either the Cassandra cluster or
> the MySQL "cluster", one at a time.
>
> The client test application is running on a third physical machine, with 8
> threads doing inserts. The test application is written in C# (Windows7)
> using Aquiles high-level client.
>
> My use case is a vehicle tracking system. So, let's suppose, from minute
> to minute, the vehicle sends its position together with some other GPS data
> and vehicle status information. The columns in my Cassandra cluster are
> just the DateTime (long value) of a position for a specific vehicle, and
> the value is all the other data serialized to binary format. Therefore, my
> CF really grows in number of columns. All data is inserted into only one
> CF/table, named Positions. The key in Cassandra is the VehicleID, and in
> MySQL it is VehicleID + PositionDateTime (MySQL creates an index on this
> automatically). It is important to note that MySQL threw tons of connection
> exceptions; even so, each insert was retried until it got through.
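
As a rough picture of the two layouts described above, the following sketch models the Cassandra column family as one wide row per vehicle and the MySQL table as one record per (VehicleID, PositionDateTime) pair. Plain Python dictionaries stand in for both stores. The function names and the payload fields (latitude, longitude, speed) are assumptions, since the original message does not say what the serialized value contains.

```python
import struct
from collections import defaultdict

# Cassandra-style wide row: row key = vehicle id, column name = timestamp,
# column value = the remaining data serialized to bytes.
positions_cf = defaultdict(dict)


def insert_wide_row(vehicle_id, timestamp, payload):
    positions_cf[vehicle_id][timestamp] = payload


# MySQL-style table: one record per compound primary key.
positions_table = {}


def insert_relational(vehicle_id, timestamp, payload):
    positions_table[(vehicle_id, timestamp)] = payload


# Hypothetical payload: latitude, longitude, speed packed into binary form.
payload = struct.pack("<ddf", -23.55, -46.63, 62.5)

for minute in range(3):
    timestamp = 1327150000 + 60 * minute
    insert_wide_row(42, timestamp, payload)
    insert_relational(42, timestamp, payload)

print(len(positions_cf), "wide row with", len(positions_cf[42]), "columns")
print(len(positions_table), "relational records")
```

In the wide-row layout every new position for a vehicle adds a column to the same row, which is why the column family grows in number of columns and not in number of rows.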
>
> My test case was to insert 1k positions per day for 1k vehicles over 10 days,
> which gives 10,000,000 inserts.
>
> The final throughput that my application had for this scenario was:
>
> Cassandra x 4
> 2012-01-21 11:45:38,044 #6         [Logger.Log] INFO  - >> Inserted 10000
> positions for 1000 vehicles (10000000 inserts):
> 2012-01-21 11:45:38,082 #6         [Logger.Log] INFO  - >> Total Time:
> 2:37:03,359
> 2012-01-21 11:45:38,085 #6         [Logger.Log] INFO  - >> Throughput:
> 1061 inserts/s
>
> And for MySQL x 2
> 2012-01-21 14:26:25,197 #6         [Logger.Log] INFO  - >> Inserted 10000
> positions for 1000 vehicles (10000000 inserts):
> 2012-01-21 14:26:25,250 #6         [Logger.Log] INFO  - >> Total Time:
> 2:06:25,914
> 2012-01-21 14:26:25,263 #6         [Logger.Log] INFO  - >> Throughput:
> 1318 inserts/s
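
The reported throughput figures can be reproduced from the logged totals with a few lines of arithmetic. This sketch assumes the `,359` and `,914` suffixes in the "Total Time" lines are milliseconds. The per-node division at the end is an extra illustration, not something the original logs report.

```python
from datetime import timedelta

TOTAL_INSERTS = 10_000_000  # 10000 positions x 1000 vehicles, as logged


def throughput(total_inserts, elapsed):
    """Average inserts per second over the whole run."""
    return total_inserts / elapsed.total_seconds()


cassandra = throughput(
    TOTAL_INSERTS, timedelta(hours=2, minutes=37, seconds=3, milliseconds=359)
)
mysql = throughput(
    TOTAL_INSERTS, timedelta(hours=2, minutes=6, seconds=25, milliseconds=914)
)

print(f"Cassandra x 4: {cassandra:.0f} inserts/s ({cassandra / 4:.0f} per node)")
print(f"MySQL x 2:     {mysql:.0f} inserts/s ({mysql / 2:.0f} per shard)")
print(f"MySQL / Cassandra ratio: {mysql / cassandra:.2f}")
```

Both totals agree with the logged 1061 and 1318 inserts/s, so the MySQL run was roughly 24% faster overall in this test.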
>
> Is there something that I'm missing here? Is this expected? Or is the problem
> somewhere else, and hard to pinpoint from this description?
>
> Cheers,
> Gustavo
>
>
>
