No. But I think all of the data fits in memory, and gstat (a FreeBSD utility) shows no disk activity.

2009/11/14 TuxRacer69 <tuxracer69@gmail.com>
Hi Ruslan,

Did you store the logs and the data on two different disks, as described at:
http://wiki.apache.org/cassandra/StorageConfiguration
and
http://wiki.apache.org/cassandra/FAQ#what_kind_of_hardware_should_i_use
?

Cheers
TuxRacer


ruslan usifov wrote:
Hello!

I'm new to Cassandra, so I may misunderstand some things.

In the following "benchmark" I inserted 4000000 records like this:

{"value": str(i), "text": "some small text"}

I use the lazyboy library (http://github.com/digg/lazyboy) to simplify working with the Cassandra Thrift interface, so my Python insert program looks like this:

from lazyboy import *
from lazyboy.key import Key

import time
import random

# Define your cluster(s)
connection.add_pool('test', ['localhost:9160'])

# range(0, 41) gives 41 outer iterations of 100000 inserts each
for j in xrange(0, 41):
    bt = time.time()
    begin = 100000 * j

    for i in xrange(begin, begin + 100000):
        # Print the elapsed time after every 10000 inserts
        if (i != begin) and ((i % 10000) == 0):
            print time.time() - bt
            bt = time.time()

        rec = record.Record()
        rec.key = Key("test", "Aquarium", str(i))

        rec.update({"value": str(i), "text": "ruslan text"})
        rec.save()

    print time.time() - bt
    print "%s'th 100000 inserts done" % (j)

    time.sleep(10)


Then I tried to fetch random records from my storage:

begin = time.time()

for i in xrange(0, 100000):
    if i and (i % 10000) == 0:
        print time.time() - begin
        begin = time.time()

    rec = record.Record()
    rec.load(Key("test", "Aquarium", str(random.randint(0, 3000000))))

print time.time() - begin


Every 10000 requests take about 8 seconds:

8.04699993134
8.07800006866
8.18799996376
8.17199993134
8.15600013733
8.09399986267
8.07800006866
8.04699993134
8.06200003624
8.06299996376
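
The read loop above prints the elapsed time for each batch of 10000 requests. As a minimal sketch, the same timing pattern can be written as a reusable helper so that one harness wraps any client call. The names `time_batches` and `fake_read` are hypothetical and are not part of lazyboy or Cassandra; `fake_read` is only a stand-in for a real lookup such as `rec.load(...)`.

```python
import time


def time_batches(op, total, batch_size):
    """Call op(i) for i in 0..total-1 and return elapsed seconds per batch."""
    timings = []
    start = time.time()
    for i in range(total):
        # Close the previous batch at every multiple of batch_size (except 0).
        if i and i % batch_size == 0:
            timings.append(time.time() - start)
            start = time.time()
        op(i)
    # Record the final batch, as the original loop does after it finishes.
    timings.append(time.time() - start)
    return timings


def fake_read(i):
    # Hypothetical stand-in for a real client call; does no network I/O.
    return str(i)


timings = time_batches(fake_read, 100000, 10000)
for t in timings:
    print(t)
```

With 100000 operations and a batch size of 10000 this yields ten timings, matching the ten lines of output listed above.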


Then I ran a similar test against MySQL with the InnoDB storage engine, using the following program:

import MySQLdb as dbi
from MySQLdb.cursors import *

import time
import random
import sys

g_dbh = dbi.connect(db="test", user="root", passwd="root")
cursor = g_dbh.cursor()

begin = time.time()

for i in xrange(0, 100000):
    if i and (i % 10000) == 0:
        print time.time() - begin
        begin = time.time()

    # Query parameters are passed as a tuple
    cursor.execute("select * from test where value=%s", (random.randint(0, 3000000),))
    cursor.fetchone()

print time.time() - begin


And I get about 1.5 seconds per 10000 requests:
1.54699993134
1.57800006852
1.18799996376
1.46671993134
1.76670013733
1.50399986267
1.57800003872
1.50699993134
1.50200003624
1.50099996313

Is this normal, or am I doing something wrong? By these numbers Cassandra is 8/1.5 = 5.3 times slower than MySQL InnoDB.
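
To make the comparison concrete, here is a small sketch that turns the batch timings listed above into an average per-request latency and a requests-per-second rate. The helper names `mean` and `summarize` are made up for this illustration; the numbers are copied from the two result lists. At about 8 seconds per 10000 reads, each Cassandra read costs roughly 0.8 ms through this client.

```python
BATCH = 10000  # requests per timed batch, as in both test programs

# Seconds per 10000 reads, copied from the results above.
cassandra = [8.04699993134, 8.07800006866, 8.18799996376, 8.17199993134,
             8.15600013733, 8.09399986267, 8.07800006866, 8.04699993134,
             8.06200003624, 8.06299996376]
mysql = [1.54699993134, 1.57800006852, 1.18799996376, 1.46671993134,
         1.76670013733, 1.50399986267, 1.57800003872, 1.50699993134,
         1.50200003624, 1.50099996313]


def mean(values):
    return sum(values) / float(len(values))


def summarize(timings):
    """Return (avg seconds per batch, ms per request, requests per second)."""
    avg = mean(timings)
    return avg, avg / BATCH * 1000.0, BATCH / avg


c_avg, c_ms, c_rate = summarize(cassandra)
m_avg, m_ms, m_rate = summarize(mysql)
ratio = c_avg / m_avg

print("cassandra: %.3f s/batch, %.3f ms/request, %.0f requests/s" % (c_avg, c_ms, c_rate))
print("mysql:     %.3f s/batch, %.3f ms/request, %.0f requests/s" % (m_avg, m_ms, m_rate))
print("ratio:     %.2f" % ratio)
```

Both loops issue one request at a time from a single client, so these figures describe per-request round-trip latency as seen by that client.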


In Cassandra I turned off all debugging, and my keyspace looks like this:

<Keyspaces>
  <Keyspace Name="test">
    <ColumnFamily CompareWith="BytesType" Name="Aquarium" />
  </Keyspace>
</Keyspaces>


My InnoDB table looks like this:

CREATE TABLE `test` (
  `value` int(11) NOT NULL,
  `text` char(255) NOT NULL,
  PRIMARY KEY (`value`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


With MySQL I use a TCP/IP connection to the server, not UNIX domain sockets. All tests were done on an Intel Core 2 Duo 8600 at 3 GHz, running FreeBSD 7.2.