Hi,
Not sure this is the cause of your bad performance, but you are measuring data creation and
insertion together. Your data creation involves lots of casts (all those str() calls), which are
probably quite slow.
Try timing only the b.send part and see how long that takes.
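For example, something like this (an untested sketch, reusing the b, time and random from your script below) separates data creation from the actual pycassa calls:

# build all the rows up front so data creation is not part of the timing
rows = []
for i in range(100000):
    columns = dict()
    for j in range(10):
        columns[str(j)] = str(random.randint(0, 100))
    rows.append((str(i), columns))

# time only the pycassa work: queuing the inserts and flushing the batch
tps1 = time.time()
for key, columns in rows:
    b.insert(key, columns)
b.send()
tps2 = time.time()
print("insert/send time: " + str(tps2 - tps1) + " seconds")

If that still takes around 27 seconds, the time really is going to Cassandra; if it drops a lot, most of it is being spent building the data.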
Roland
On 03.05.2011, at 12:30, "charles THIBAULT" <charl.thibault@gmail.com> wrote:
> Hello everybody,
>
> first: sorry for my English in advance!
>
> I'm getting started with Cassandra on a 5-node cluster, inserting data
> with the pycassa API.
>
> I've read everywhere on the internet that Cassandra's write performance is better than MySQL's
> because writes are only appended to the commit log files.
>
> When I try to insert 100 000 rows with 10 columns per row using a batch insert, I get
> this result: 27 seconds.
> But with MySQL (LOAD DATA INFILE) this takes only 2 seconds (using indexes).
>
> Here is my configuration:
>
> cassandra version: 0.7.5
> nodes : 192.168.1.210, 192.168.1.211, 192.168.1.212, 192.168.1.213, 192.168.1.214
> seed: 192.168.1.210
>
> My script
> *************************************************************************************************************
> #!/usr/bin/env python
>
> import pycassa
> import time
> import random
> from cassandra import ttypes
>
> pool = pycassa.connect('test', ['192.168.1.210:9160'])
> cf = pycassa.ColumnFamily(pool, 'test')
> b = cf.batch(queue_size=50, write_consistency_level=ttypes.ConsistencyLevel.ANY)
>
> tps1 = time.time()
> for i in range(100000):
>     columns = dict()
>     for j in range(10):
>         columns[str(j)] = str(random.randint(0, 100))
>     b.insert(str(i), columns)
> b.send()
> tps2 = time.time()
>
>
> print("execution time: " + str(tps2 - tps1) + " seconds")
> *************************************************************************************************************
>
> What am I doing wrong?