incubator-cassandra-user mailing list archives

From TuxRacer69 <tuxrace...@gmail.com>
Subject Re: Why cassandra single node so slow?
Date Sat, 14 Nov 2009 12:00:53 GMT
Hi Ruslan,

Did you store the commit log and the data on two different disks, as described at:
http://wiki.apache.org/cassandra/StorageConfiguration
and
http://wiki.apache.org/cassandra/FAQ#what_kind_of_hardware_should_i_use
?

Cheers
TuxRacer

ruslan usifov wrote:
> Hello!
>
> I'm new to Cassandra, so I may misunderstand some things.
>
> In the following "benchmark" I inserted 4000000 records like this:
>
> {"value": str(i), "text": "some small text"}
>
> I use the lazyboy lib (http://github.com/digg/lazyboy) to simplify work
> with the Cassandra Thrift interface. My Python insert program looks like
> this:
>
> from lazyboy import *
> from lazyboy.key import Key;
>
> import time;
> import random;
>
> # Define your cluster(s)
> connection.add_pool('test', ['localhost:9160'])
>
> for j in xrange(0, 41):
>   bt = time.time();
>   begin = 100000 * j;
>
>   for i in xrange(begin, begin + 100000):
>     if (i != begin) and ((i % 10000) == 0):
>       print time.time() - bt;
>       bt = time.time()
>
>     rec = record.Record();
>     rec.key = Key("test", "Aquarium", str(i));
>
>     rec.update({"value": str(i), "text": "ruslan text"})
>     rec.save();
>
>   print time.time() - bt;
>   print "%s'th 100000 inserts done" % (j);
>
>   time.sleep(10);
>
>
> Then I try to fetch random records from my storage:
>
> begin = time.time();
>
> for i in xrange(0, 100000):
>   if i and (i % 10000) == 0:
>     print time.time() - begin;
>     begin = time.time()
>
>   rec = record.Record();
>   rec.load(Key("test", "Aquarium", str(random.randint(0, 3000000))));
>
> print time.time() - begin;
>
>
> And every 10000 requests take about 8 seconds:
>
> 8.04699993134
> 8.07800006866
> 8.18799996376
> 8.17199993134
> 8.15600013733
> 8.09399986267
> 8.07800006866
> 8.04699993134
> 8.06200003624
> 8.06299996376
>
>
> Then I did a similar test with MySQL on the InnoDB storage engine, with
> the following program:
>
> import MySQLdb as dbi;
> from MySQLdb.cursors import *;
>
> import time;
> import random;
> import sys;
>
> g_dbh  = dbi.connect(db="test", user="root", passwd="root");
> cursor = g_dbh.cursor();
>
> begin = time.time();
>
> for i in xrange(0, 100000):
>   if i and (i % 10000) == 0:
>     print time.time() - begin;
>     begin = time.time()
>
>   # Pass the parameter as a one-element tuple, the form the DB-API expects.
>   cursor.execute("select * from test where value=%s",
>                  (random.randint(0, 3000000),));
>   cursor.fetchone();
>
> print time.time() - begin;
>
>
> And I get about 1.5 seconds per 10000 requests:
> 1.54699993134
> 1.57800006852
> 1.18799996376
> 1.46671993134
> 1.76670013733
> 1.50399986267
> 1.57800003872
> 1.50699993134
> 1.50200003624
> 1.50099996313
>
> Is this normal, or am I doing something wrong? By these numbers Cassandra is
> about 8/1.5 = 5.3 times slower than MySQL InnoDB.
>
>
> In Cassandra I turned off all debugging, and my keyspace looks like this:
>
>   <Keyspaces>
>     <Keyspace Name="test">
>        <ColumnFamily CompareWith="BytesType" Name="Aquarium" />
>     </Keyspace>
>   </Keyspaces>
>
>
> My InnoDB table looks like this:
>
> CREATE TABLE `test` (
>   `value` int(11) NOT NULL,
>   `text` char(255) NOT NULL,
>   PRIMARY KEY (`value`)
> ) ENGINE=InnoDB DEFAULT CHARSET=utf8
>
>
> In MySQL I use a TCP/IP connection to the server, not UNIX domain sockets.
> All tests were done on an Intel Core 2 Duo 8600 at 3 GHz, on FreeBSD 7.2.
>
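
For reference, the 8/1.5 comparison quoted above can be recomputed from the
listed timings. The following is only a minimal sketch in Python 3 syntax (the
quoted programs are Python 2); the variable and function names are
illustrative and not from the original post, and the numbers are copied from
the two result lists as posted.

```python
# Timings copied from the quoted post: seconds per batch of 10000 random reads.
cassandra_secs = [
    8.04699993134, 8.07800006866, 8.18799996376, 8.17199993134, 8.15600013733,
    8.09399986267, 8.07800006866, 8.04699993134, 8.06200003624, 8.06299996376,
]
mysql_secs = [
    1.54699993134, 1.57800006852, 1.18799996376, 1.46671993134, 1.76670013733,
    1.50399986267, 1.57800003872, 1.50699993134, 1.50200003624, 1.50099996313,
]

BATCH = 10000  # requests per timed batch, as in the quoted loops


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


cassandra_mean = mean(cassandra_secs)
mysql_mean = mean(mysql_secs)

# Requests per second implied by each average batch time.
cassandra_rps = BATCH / cassandra_mean
mysql_rps = BATCH / mysql_mean

# How many times longer a Cassandra batch took than a MySQL batch.
slowdown = cassandra_mean / mysql_mean

print("Cassandra: %.2f s per batch, %.0f reads/s" % (cassandra_mean, cassandra_rps))
print("MySQL:     %.2f s per batch, %.0f reads/s" % (mysql_mean, mysql_rps))
print("Ratio:     %.2f" % slowdown)
```

Both quoted read loops use a single client that issues one request at a time,
so these figures reflect per-request latency rather than what either server
could sustain under concurrent load.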

