cassandra-user mailing list archives

From charles THIBAULT <charl.thiba...@gmail.com>
Subject Re: low performance inserting
Date Tue, 03 May 2011 15:06:10 GMT
Hi Sylvain,

Thanks for your answer.

I ran a test with the stress utility, inserting 100 000 rows with 10
columns per row.
I used these options: -o insert -t 5 -n 100000 -c 10 -d
192.168.1.210,192.168.1.211,...
Result: 161 seconds

With MySQL using INSERT statements (after a dump): 1.79 seconds
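
To put the two timings above on the same scale, they can be converted to rows per second. The snippet below is only a sketch of that arithmetic: the helper name rows_per_second and the dictionary labels are made up for illustration, and the only inputs are the row count and the durations reported in this message.

```python
# Convert the timings reported in this message into throughput figures.
# rows_per_second is a hypothetical helper, not part of any Cassandra tool.

def rows_per_second(rows, seconds):
    """Return the average number of rows written per second."""
    return rows / seconds

ROWS = 100_000  # rows inserted in both tests

timings = {
    "Cassandra stress utility (-t 5)": 161.0,
    "MySQL INSERT statements": 1.79,
}

for label, seconds in timings.items():
    print(f"{label}: {rows_per_second(ROWS, seconds):,.0f} rows/s")
```

That works out to roughly 620 rows/s for the stress run against roughly 56,000 rows/s for MySQL.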

Charles

2011/5/3 Sylvain Lebresne <sylvain@datastax.com>

> There is probably a fair number of things you'd have to make sure you do to
> improve the write performance on the Cassandra side (starting with using
> multiple threads to do the insertion), but the first thing is probably to
> start comparing things that are at least mildly comparable. If you do
> inserts in Cassandra, you should try to do inserts in MySQL too, not
> "load data infile" (which really is just a bulk loading utility). And as
> stated here
> http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html:
> "When loading a table from a text file, use LOAD DATA INFILE. This is
> usually 20 times faster than using INSERT statements."
>
> --
> Sylvain
>
> On Tue, May 3, 2011 at 12:30 PM, charles THIBAULT
> <charl.thibault@gmail.com> wrote:
> > Hello everybody,
> >
> > First: sorry in advance for my English!
> >
> > I'm getting started with Cassandra on a 5-node cluster, inserting data
> > with the pycassa API.
> >
> > I've read everywhere on the internet that Cassandra's performance is
> > better than MySQL's because writes only append to commit log files.
> >
> > When I try to insert 100 000 rows with 10 columns per row using batch
> > insert, I get this result: 27 seconds.
> > But with MySQL (load data infile) this takes only 2 seconds (using
> > indexes).
> >
> > Here is my configuration:
> >
> > cassandra version: 0.7.5
> > nodes : 192.168.1.210, 192.168.1.211, 192.168.1.212, 192.168.1.213,
> > 192.168.1.214
> > seed: 192.168.1.210
> >
> > My script:
> >
> > *************************************************************************************************************
> > #!/usr/bin/env python
> >
> > import pycassa
> > import time
> > import random
> > from cassandra import ttypes
> >
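> > # Connect to a single node and open the 'test' column family.
> > pool = pycassa.connect('test', ['192.168.1.210:9160'])
> > cf = pycassa.ColumnFamily(pool, 'test')
> > # Batch mutator: queued inserts are sent once 50 have accumulated.
> > b = cf.batch(queue_size=50,
> >              write_consistency_level=ttypes.ConsistencyLevel.ANY)
> >
> > tps1 = time.time()
> > for i in range(100000):
> >     # Build one row of 10 columns with random values.
> >     columns = dict()
> >     for j in range(10):
> >         columns[str(j)] = str(random.randint(0,100))
> >     b.insert(str(i), columns)
> > b.send()  # send whatever is still queued
> > tps2 = time.time()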
> >
> >
> > print("execution time: " + str(tps2 - tps1) + " seconds")
> >
> > *************************************************************************************************************
> >
> > What am I doing wrong?
> >
>
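
Sylvain's suggestion of using multiple threads to do the insertion could look roughly like the sketch below. It is written for Python 3 and uses only the standard library. The names make_row, insert_range, parallel_insert and fake_insert are hypothetical, and fake_insert is an in-memory stand-in for a real client call. Wiring in pycassa (for example one batch object per worker) is deliberately left out, since how its objects should be shared between threads is not covered in this thread.

```python
# Sketch: split a range of row keys across worker threads.
# All function names here are illustrative assumptions, not pycassa API.
import random
import threading
from concurrent.futures import ThreadPoolExecutor


def make_row(i):
    """Build one row like the original script: 10 columns of random values."""
    return str(i), {str(j): str(random.randint(0, 100)) for j in range(10)}


def insert_range(insert_row, start, stop):
    """Insert rows with keys in [start, stop) and return how many were sent."""
    count = 0
    for i in range(start, stop):
        key, columns = make_row(i)
        insert_row(key, columns)
        count += 1
    return count


def parallel_insert(insert_row, total_rows, workers):
    """Divide total_rows into contiguous chunks, one chunk per worker task."""
    if total_rows <= 0:
        return 0
    chunk = -(-total_rows // workers)  # ceiling division
    ranges = [(s, min(s + chunk, total_rows))
              for s in range(0, total_rows, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(insert_range, insert_row, s, e)
                   for s, e in ranges]
        return sum(f.result() for f in futures)


# In-memory stand-in for the database client (assumption for the demo only).
store = {}
store_lock = threading.Lock()


def fake_insert(key, columns):
    with store_lock:
        store[key] = columns


inserted = parallel_insert(fake_insert, 1000, 5)
print("rows inserted:", inserted)
```

Five workers matches the -t 5 option used with the stress utility above. The right thread count for a real cluster would have to be found by measurement.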
