cassandra-user mailing list archives

From Robert Wille <rwi...@fold3.com>
Subject Re: What is the fastest way to get data into Cassandra 2 from a Java application?
Date Wed, 11 Dec 2013 12:40:43 GMT
I use hand-rolled batches a lot. You can get a *lot* of performance
improvement. Just make sure to sanitize your strings.
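
As a rough sketch of what sanitizing could look like when building a batch by hand: CQL string literals escape an embedded single quote by doubling it. The class and method names below (CqlBatchSketch, quote, buildBatch) are hypothetical and not from this thread, and the sketch assumes the keyspace name is trusted and that every value is text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class CqlBatchSketch {

    // Wraps a value in single quotes and doubles any embedded single quote,
    // which is how CQL escapes a quote inside a string literal.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds one UNLOGGED BATCH of INSERTs into the wibble table from the thread.
    // Assumption: keyspace is a trusted identifier; only the values are escaped.
    static String buildBatch(String keyspace, Map<String, String> rows) {
        StringBuilder b = new StringBuilder("BEGIN UNLOGGED BATCH\n");
        for (Map.Entry<String, String> row : rows.entrySet()) {
            b.append("INSERT INTO ").append(keyspace)
             .append(".wibble (id, info) VALUES (")
             .append(quote(row.getKey())).append(", ")
             .append(quote(row.getValue())).append(")\n");
        }
        return b.append("APPLY BATCH\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("1", "it's fine");
        // The resulting string would be passed to session.execute(...).
        System.out.println(buildBatch("test", rows));
    }
}
```

Without the escaping, a value containing a single quote would end the literal early and either break the statement or let the value inject CQL.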

I've been wondering, what's the limit, practical or hard, on the length of
a query?

Robert

On 12/11/13, 3:37 AM, "David Tinker" <david.tinker@gmail.com> wrote:

>Yes, that's what I found.
>
>This is faster:
>
>for (int i = 0; i < 1000; i++)
>    session.execute("INSERT INTO test.wibble (id, info) VALUES ('${"" + i}', '${"aa" + i}')")
>
>Than this:
>
>def ps = session.prepare("INSERT INTO test.wibble (id, info) VALUES (?, ?)")
>for (int i = 0; i < 1000; i++)
>    session.execute(ps.bind(["" + i, "aa" + i] as Object[]))
>
>This is the fastest option of all (hand rolled batch):
>
>StringBuilder b = new StringBuilder()
>b.append("BEGIN UNLOGGED BATCH\n")
>for (int i = 0; i < 1000; i++) {
>    b.append("INSERT INTO ").append(ks).append(".wibble (id, info) VALUES ('")
>            .append(i).append("','")
>            .append("aa").append(i).append("')\n")
>}
>b.append("APPLY BATCH\n")
>session.execute(b.toString())
>
>
>On Wed, Dec 11, 2013 at 10:56 AM, Sylvain Lebresne <sylvain@datastax.com>
>wrote:
>>
>>> This loop takes 2500ms or so on my test cluster:
>>>
>>> PreparedStatement ps = session.prepare(
>>>     "INSERT INTO perf_test.wibble (id, info) VALUES (?, ?)");
>>> for (int i = 0; i < 1000; i++)
>>>     session.execute(ps.bind("" + i, "aa" + i));
>>>
>>> The same loop with the parameters inline is about 1300ms. It gets
>>> worse if there are many parameters.
>>
>>
>> Do you mean that:
>>   for (int i = 0; i < 1000; i++)
>>       session.execute("INSERT INTO perf_test.wibble (id, info) VALUES ('"
>>           + i + "', 'aa" + i + "')");
>> is twice as fast as using a prepared statement? And that the difference
>> is even greater if you add more columns than "id" and "info"?
>>
>> That would certainly be unexpected. Are you sure you're not
>> re-preparing the statement every time in the loop?
>>
>> --
>> Sylvain
>>
>>> I know I can use batching to
>>> insert all the rows at once but that's not the purpose of this test. I
>>> also tried using session.execute(cql, params) and it is faster but
>>> still doesn't match inline values.
>>>
>>> Composing CQL strings is certainly convenient and simple but is there
>>> a much faster way?
>>>
>>> Thanks
>>> David
>>>
>>> I have also posted this on Stackoverflow if anyone wants the points:
>>>
>>> http://stackoverflow.com/questions/20491090/what-is-the-fastest-way-to-get-data-into-cassandra-2-from-a-java-application
>>
>>
>
>
>
>-- 
>http://qdb.io/ Persistent Message Queues With Replay and #RabbitMQ
>Integration


