cassandra-user mailing list archives

From Pradeep Kumar Mantha <pradeep...@gmail.com>
Subject Re: Cassandra Performance Benchmarking.
Date Fri, 18 Jan 2013 00:05:05 GMT
Hi,

Thanks. I would like to benchmark Cassandra with our own application so
that we understand the details of how the actual benchmarking is done.
I am not sure how easy it would be to integrate YCSB with our application.

So I am trying different client interfaces to Cassandra.

Here is what I found for a Cassandra cluster of 12 data nodes and
1 client node running 32 threads (each thread issuing X queries):

cassandra-cli   took 133 seconds
pycassa         took 521 seconds

Here is the Python pycassa code that is passed to each thread:

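import pycassa

def start_cassandra_client(Threadname):
    # Each thread opens its own connection pool and column family handle.
    pool = pycassa.ConnectionPool('Blast', server_list=['xxx.xx.xx.xx'])
    cf = pycassa.ColumnFamily(pool, 'Blast_NR')
    # Read one row key per line and fetch that row from Cassandra.
    with open("pycassa_100%_query") as inp_file:
        for key in inp_file:
            key = key.strip()
            cf.get(key)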
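
To compare clients on equal terms, it may help to time the reads inside
the client process itself. Below is a minimal sketch of such a harness,
assuming each thread issues the full list of keys, as in the function
above. The names run_benchmark and fetch are hypothetical and not part
of pycassa; a dict lookup stands in for the Cassandra read so that the
sketch runs without a cluster.

```python
import threading
import time


def run_benchmark(fetch, keys, num_threads=32):
    """Run fetch(key) for every key in every thread.

    Returns (elapsed_seconds, total_reads, reads_per_second).
    """
    def worker():
        for key in keys:
            fetch(key)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start

    total_reads = num_threads * len(keys)
    # Guard against a zero interval on very fast runs.
    rate = total_reads / elapsed if elapsed > 0 else float("inf")
    return elapsed, total_reads, rate


if __name__ == "__main__":
    # Stand-in for a Cassandra read (assumption): a plain dict lookup.
    store = dict((str(i), i) for i in range(1000))
    elapsed, total, rate = run_benchmark(store.get, list(store), num_threads=4)
    print("%d reads in %.3f s (%.0f reads/s)" % (total, elapsed, rate))
```

With pycassa, fetch would be the cf.get call from the function above.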

Would Java clients like Hector or Astyanax help here? I am more
comfortable with Python than Java, and our existing application is also
in Python.

thanks
pradeep


On Thu, Jan 17, 2013 at 2:08 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> Wow, you managed to do a load test through the cassandra-cli. There should
> be a merit badge for that.
>
> You should use the built-in stress tool or YCSB.
>
> The CLI has to do much more string conversion than a normal client would,
> and it is not built for performance. You will definitely get better numbers
> through other means.
>
> On Thu, Jan 17, 2013 at 2:10 PM, Pradeep Kumar Mantha <pradeepm66@gmail.com>
> wrote:
>>
>> Hi,
>>
>> I am trying to maximize the number of read queries executed per second.
>>
>> Here is my cluster configuration.
>>
>> Replication - Default
>> 12 Data Nodes.
>> 16 Client Nodes - used for querying.
>>
>> Each client node runs 32 threads, and each thread executes 76896 read
>> queries using the cassandra-cli tool.
>>        i.e., all the read queries are stored in a file, and that file
>> is given to the cassandra-cli tool (using the -f option), which is
>> run by a thread.
>> So the total number of queries for 16 client nodes is 16 * 32 * 76896.
>>
>> The read queries on each client node are submitted at the same time.
>> The time taken for 16 * 32 * 76896 read queries is nearly 742 seconds,
>> which is nearly 53k transactions/second.
>>
>> I would like to know if there is any other way/tool through which I
>> can improve the number of transactions/second.
>> Is the performance affected by the cassandra-cli tool?
>>
>> thanks
>> pradeep
>
>
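
The throughput figure in the quoted message can be checked with a few
lines of Python. This is only a sketch of the arithmetic; the variable
names are illustrative, and the numbers are the ones given in the
quoted message.

```python
client_nodes = 16
threads_per_node = 32
queries_per_thread = 76896
elapsed_seconds = 742

# Every thread on every client node runs the full query file once.
total_queries = client_nodes * threads_per_node * queries_per_thread

# float() keeps the division exact under both Python 2 and Python 3.
throughput = total_queries / float(elapsed_seconds)

print("total queries: %d" % total_queries)
print("throughput: %.0f queries/second" % throughput)
```

This gives 39,370,752 queries in total and about 53,060 queries/second,
which agrees with the "nearly 53k" figure.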
