I was inserting the contents of Wikipedia, so the columns were multi-kilobyte strings. It's a good data source to test with, as the records and relationships vary somewhat in size.

My main point was that the best way to benchmark Cassandra is with multiple server nodes, multiple client threads/processes, the level of redundancy and consistency you want to run at in production and, if you can, some approximation of your production data size. A single Cassandra instance may well lose to a single RDBMS instance in a straight-out race (though, as Jonathan points out, Mongo is not playing fair). But you would generally not deploy a single Cassandra node.
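A minimal sketch of a multi-threaded load harness, assuming a hypothetical `write_batch` callable that wraps whatever client library call you actually use. It only shows the shape of the test: many concurrent writers, batched rows, and throughput reported in both rows and columns. Python threads are fine for I/O-bound client calls; if row generation becomes CPU-bound you would want multiple processes instead, as in the Wikipedia load described further down.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(write_batch, total_rows, columns_per_row=30, threads=8, batch_size=100):
    """Drive write_batch from several client threads and report throughput.

    write_batch(rows) is a hypothetical stand-in for your client's batch write;
    it receives a list of rows, each a dict of column name -> int.
    """
    # Pre-compute row-id ranges so each worker task handles one batch.
    batches = [
        range(start, min(start + batch_size, total_rows))
        for start in range(0, total_rows, batch_size)
    ]
    written = 0
    lock = threading.Lock()

    def worker(row_ids):
        nonlocal written
        # Synthetic integer-valued rows. Note: generation happens inside the
        # timed region; pre-build rows if that overhead skews your numbers.
        rows = [{f"col{c}": i * c for c in range(columns_per_row)} for i in row_ids]
        write_batch(rows)
        with lock:  # several threads update the shared counter
            written += len(rows)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() consumes the results so any worker exception is re-raised here.
        list(pool.map(worker, batches))
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return {
        "rows": written,
        "seconds": elapsed,
        "rows_per_sec": written / elapsed,
        "columns_per_sec": written * columns_per_row / elapsed,
    }


if __name__ == "__main__":
    def fake_write(rows):
        time.sleep(0.001)  # pretend network round trip; replace with a real client call

    stats = run_load(fake_write, total_rows=10_000, threads=16)
    print(f"{stats['rows_per_sec']:,.0f} rows/s, {stats['columns_per_sec']:,.0f} columns/s")
```

Against a real cluster, the useful comparison is how `columns_per_sec` changes as you raise `threads` and add nodes, rather than a single-threaded number from one box.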

If you can provide more details on your test, we may be able to help:
- what is the target application
- the Cassandra schema and any configuration changes
- the Java code you used

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton

On 5 May 2011, at 02:01, Steve Smith wrote:

Since each row in my column family has 30 columns, wouldn't 250K columns per second translate to only ~8,000 rows per second? Or am I misunderstanding something?

Talking in terms of columns, my load test would seem to perform as follows:

(100,000 rows / 26 sec) * 30 columns/row ≈ 115K columns per second.

That's on a dual-core, 2.66 GHz laptop with 4 GB RAM, running a single Cassandra node and the Hector (Java) client.
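A small sketch of the unit conversions being discussed, using only the figures quoted in this thread (100,000 rows of 30 columns in 26 seconds, ~250K columns per second on the 2010 cluster, and a 100K-200K records per second target):

```python
def columns_per_sec(rows, seconds, columns_per_row):
    """Throughput expressed in columns per second."""
    return rows / seconds * columns_per_row


def rows_per_sec(columns_per_second, columns_per_row):
    """Convert a columns-per-second figure back into rows per second."""
    return columns_per_second / columns_per_row


# Laptop test: 100,000 rows of 30 columns in 26 seconds.
laptop = columns_per_sec(100_000, 26, 30)
# The 2010 cluster figure of ~250K columns/s, re-expressed for 30-column rows.
cluster_rows = rows_per_sec(250_000, 30)
# The client's target of 100K-200K 30-column records per second, in columns.
target = (100_000 * 30, 200_000 * 30)

print(f"laptop: {laptop:,.0f} columns/s ({laptop / 30:,.0f} rows/s)")  # laptop: 115,385 columns/s (3,846 rows/s)
print(f"cluster: {cluster_rows:,.0f} rows/s")                            # cluster: 8,333 rows/s
print(f"target: {target[0]:,} to {target[1]:,} columns/s")               # target: 3,000,000 to 6,000,000 columns/s
```

On these numbers the single laptop node reaches a bit under half the 2010 cluster's column rate, while the 100K-200K records per second target works out to 3M-6M columns per second.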

Am I interpreting things correctly?

- Steve


On Tue, May 3, 2011 at 3:59 PM, aaron morton <aaron@thelastpickle.com> wrote:
To give an idea, last March (2010) I ran a much older Cassandra on 10 HP blades (dual socket, 4 cores, 16 GB, 2.5-inch laptop HDD) and was writing around 250K columns per second, with 500 Python processes running on another 10 HP blades loading the data from Wikipedia.

This was my first out-of-the-box test, with no tuning other than using sensible batch updates. Cassandra has gotten much faster since then.
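"Sensible batch updates" here means grouping many row mutations into one client call instead of one round trip per row. A minimal sketch, assuming a hypothetical `send_batch` callable for the client call and an arbitrary default batch size of 100 (not a tuned value):

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load(rows: Iterable[T], send_batch: Callable[[List[T]], None], batch_size: int = 100) -> int:
    """Send rows in fixed-size batches; send_batch is a hypothetical client call."""
    sent = 0
    for chunk in chunked(rows, batch_size):
        send_batch(chunk)  # one round trip per chunk instead of per row
        sent += len(chunk)
    return sent
```

Very large batches can hurt as much as tiny ones, so the batch size is worth varying as part of the benchmark.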

Hope that helps
Aaron

On 4 May 2011, at 02:22, Jonathan Ellis wrote:

> You don't give many details, but I would guess:
>
> - your benchmark is not multithreaded
> - MongoDB is not configured for durable writes, so you're really only
> measuring the time it takes to buffer the data in memory
> - you haven't loaded enough data to hit "mongo's index doesn't fit in
> memory anymore"
>
> On Tue, May 3, 2011 at 8:24 AM, Steve Smith <stevenpsmith123@gmail.com> wrote:
>> I am working for a client that needs to persist 100K-200K records per second
>> for later querying. As a proof of concept, we are looking at several
>> options, including NoSQL (Cassandra and MongoDB).
>> I have been running some tests on my laptop (MacBook Pro, 4GB RAM, 2.66 GHz,
>> Dual Core/4 logical cores) and have not been happy with the results.
>> The best I have been able to accomplish is 100K records in approximately 30
>> seconds.  Each record has 30 columns, mostly made up of integers.  I have
>> tried both the Hector and Pelops APIs, and have tried writing in batches
>> versus one at a time.  The times have not varied much.
>> I am using the out-of-the-box configuration for Cassandra, and while I know
>> using one disk will have an impact on performance, I would expect to see
>> better write numbers than I am getting.
>> As a point of reference, the same test using MongoDB I was able to
>> accomplish 100K records in 3.5 seconds.
>> Any tips would be appreciated.
>>
>> - Steve
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com