cassandra-user mailing list archives

From graham sanderson <>
Subject Re: What is the fastest way to get data into Cassandra 2 from a Java application?
Date Wed, 11 Dec 2013 01:16:54 GMT
I can’t speak for Astyanax’s roadmap; I believe their thrift transport is abstracted out, but the
object model is very much CF/wide-row rather than table-oriented.

I have no idea what the plans are for further Astyanax development (maybe someone on this list does), but
I believe the thrift API is not going away, so Astyanax/thrift remains an option. That said, I’d
imagine you wouldn’t gain much going down the CQL-over-thrift route, so you need
to be able to model your data in “internal” form.

Two reasons we may want to move to the binary protocol:
- for reads: asynchronous ability (which is now in thrift, but it seems unlikely to be utilized in cassandra)
- for writes: compression, since we are (currently) network bandwidth limited for enormous batch inserts (from hadoop)
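For what it’s worth, both of those points are exposed directly by the DataStax Java driver. A minimal sketch, assuming driver 2.x on the classpath and a node on 127.0.0.1 (the perf_test.wibble table comes from the quoted thread below; the contact point and class name are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ProtocolOptions;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;

public class BinaryProtocolSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                // compression on the native protocol: relevant when network
                // bandwidth is the bottleneck for large batch inserts
                .withCompression(ProtocolOptions.Compression.SNAPPY)
                .build();
        Session session = cluster.connect();

        PreparedStatement ps = session.prepare(
                "INSERT INTO perf_test.wibble (id, info) VALUES (?, ?)");

        // asynchronous writes: issue many inserts without blocking
        // on each round trip, then wait for all futures at the end
        List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
        for (int i = 0; i < 1000; i++) {
            futures.add(session.executeAsync(ps.bind("" + i, "aa" + i)));
        }
        for (ResultSetFuture f : futures) {
            f.getUninterruptibly();
        }

        cluster.close();
    }
}
```

Pipelining requests with executeAsync like this also tends to be the answer to the prepared-vs-inline timing question in the thread below: the per-statement cost is dominated by round-trip latency, and overlapping the requests hides most of it.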

On Dec 10, 2013, at 6:44 AM, David Tinker <> wrote:

> Hmm. I have read that the thrift interface to Cassandra is out of
> favour and the CQL interface is in. Where does that leave Astyanax?
> On Tue, Dec 10, 2013 at 1:14 PM, graham sanderson <> wrote:
>> Perhaps not the way forward, however I can bulk insert data via astyanax at a rate
that maxes out our (fast) networks. That said, for our next release (of this part of our product
- our other current implementation is node.js via the binary protocol) we will be looking at insert speed via
the java driver, and also at alternative scala/java implementations of the binary protocol.
>> On Dec 10, 2013, at 4:49 AM, David Tinker <> wrote:
>>> I have tried the DataStax Java driver and it seems the fastest way to
>>> insert data is to compose a CQL string with all parameters inline.
>>> This loop takes 2500ms or so on my test cluster:
>>> PreparedStatement ps = session.prepare("INSERT INTO perf_test.wibble
>>> (id, info) VALUES (?, ?)");
>>> for (int i = 0; i < 1000; i++) session.execute(ps.bind("" + i, "aa" + i));
>>> The same loop with the parameters inline is about 1300ms. It gets
>>> worse if there are many parameters. I know I can use batching to
>>> insert all the rows at once but that's not the purpose of this test. I
>>> also tried using session.execute(cql, params) and it is faster but
>>> still doesn't match inline values.
>>> Composing CQL strings is certainly convenient and simple but is there
>>> a much faster way?
>>> Thanks
>>> David
>>> I have also posted this on Stackoverflow if anyone wants the points:
