Hi Sylvain,

thanks for your answer.

I made a test with the stress utility, inserting 100 000 rows with 10 columns per row.
I used these options: -o insert -t 5 -n 100000 -c 10 -d 192.168.1.210,192.168.1.211,...
Result: 161 seconds.

With MySQL using INSERT statements (replayed from a dump): 1.79 seconds.

Charles

2011/5/3 Sylvain Lebresne <sylvain@datastax.com>
There is probably a fair number of things you'd have to make sure you do to
improve the write performance on the Cassandra side (starting with using
multiple threads to do the insertion), but the first thing is probably to start
comparing things that are at least mildly comparable. If you do inserts in
Cassandra, you should try to do inserts in MySQL too, not "load data infile"
(which really is just a bulk-loading utility). And as stated here
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html:
"When loading a table from a text file, use LOAD DATA INFILE. This is
usually 20 times faster than using INSERT statements."
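As an aside, the gap that manual quote describes is easy to reproduce without any server, using the standard-library sqlite3 module as a stand-in for MySQL (a sketch only; the table name and row count here are arbitrary, not from the thread). Per-row INSERTs committed one at a time behave like unbatched writes, while executemany() in a single transaction behaves like a bulk load:

```python
# Sketch: per-row committed INSERTs vs. one bulk, single-transaction load.
# sqlite3 stands in for MySQL here so the example runs standalone.
import sqlite3
import time

rows = [(str(i), str(i % 100)) for i in range(10000)]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (k TEXT, v TEXT)')

# One INSERT per row, committed individually.
t0 = time.time()
for k, v in rows:
    conn.execute('INSERT INTO t VALUES (?, ?)', (k, v))
    conn.commit()
per_row = time.time() - t0

conn.execute('DELETE FROM t')
conn.commit()

# Same data as one batched load inside a single transaction.
t0 = time.time()
conn.executemany('INSERT INTO t VALUES (?, ?)', rows)
conn.commit()
bulk = time.time() - t0

count = conn.execute('SELECT COUNT(*) FROM t').fetchone()[0]
print('rows: %d, per-row: %.3fs, bulk: %.3fs' % (count, per_row, bulk))
```

The absolute times depend on the machine, but the batched load is consistently the faster of the two, which is the point Sylvain is making about comparing like with like.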

--
Sylvain

On Tue, May 3, 2011 at 12:30 PM, charles THIBAULT
<charl.thibault@gmail.com> wrote:
> Hello everybody,
>
> first: sorry in advance for my English!!
>
> I'm getting started with Cassandra on a 5 nodes cluster inserting data
> with the pycassa API.
>
> I've read everywhere on the internet that Cassandra's write performance is
> better than MySQL's,
> because writes are only appended to the commit log files.
>
> When I try to insert 100 000 rows with 10 columns per row with batch
> insert, I get this result: 27 seconds.
> But with MySQL (load data infile) this takes only 2 seconds (using indexes).
>
> Here my configuration
>
> cassandra version: 0.7.5
> nodes : 192.168.1.210, 192.168.1.211, 192.168.1.212, 192.168.1.213,
> 192.168.1.214
> seed: 192.168.1.210
>
> My script
> *************************************************************************************************************
> #!/usr/bin/env python
>
> import pycassa
> import time
> import random
> from cassandra import ttypes
>
> pool = pycassa.connect('test', ['192.168.1.210:9160'])
> cf = pycassa.ColumnFamily(pool, 'test')
> b = cf.batch(queue_size=50,
>              write_consistency_level=ttypes.ConsistencyLevel.ANY)
>
> tps1 = time.time()
> for i in range(100000):
>     columns = dict()
>     for j in range(10):
>         columns[str(j)] = str(random.randint(0,100))
>     b.insert(str(i), columns)
> b.send()
> tps2 = time.time()
>
>
> print("execution time: " + str(tps2 - tps1) + " seconds")
> *************************************************************************************************************
>
> What am I doing wrong?
>
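For what it's worth, the multi-threaded insertion Sylvain suggests could be laid over the script above roughly as sketched below. The pycassa calls are stubbed out with a plain function so the partitioning pattern runs standalone; in the real script each worker would share one pycassa connection pool and drive its own cf.batch() (the thread count and batch size are assumptions, not values from the thread):

```python
# Sketch: split the 100 000 inserts across worker threads.
# insert_batch() is a stand-in for the per-thread pycassa
# cf.batch(...).insert(...)/send() calls in the original script.
import threading

NUM_ROWS = 100000
NUM_THREADS = 5

inserted = []              # stand-in for the cluster
lock = threading.Lock()

def insert_batch(keys):
    # Real version: b = cf.batch(queue_size=50); b.insert(key, columns); b.send()
    with lock:
        inserted.extend(keys)

def worker(start, stop):
    # Each worker handles its own contiguous slice of row keys.
    insert_batch([str(i) for i in range(start, stop)])

step = NUM_ROWS // NUM_THREADS
threads = [threading.Thread(target=worker, args=(t * step, (t + 1) * step))
           for t in range(NUM_THREADS)]
for th in threads:
    th.start()
for th in threads:
    th.join()

print(len(inserted))  # all 100000 keys covered
```

Note that a pycasa batch object is not shared between threads here: each worker would own its batch, while the connection pool itself is safe to share.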