cassandra-user mailing list archives

From John Sanda <john.sa...@gmail.com>
Subject Re: What is the fastest way to get data into Cassandra 2 from a Java application?
Date Wed, 11 Dec 2013 05:46:59 GMT
session.execute blocks until Cassandra returns the response. Use the async
version, session.executeAsync, but do so with caution. If you don't throttle
the requests, you will start seeing timeouts on the client side pretty
quickly. For throttling I've used a Semaphore, but I think Guava's
RateLimiter is better suited. And if you want to wait until all the writes
have finished, definitely use Guava's futures API. Try something like:

import java.util.concurrent.CountDownLatch;
import com.datastax.driver.core.*;
import com.google.common.util.concurrent.*;

PreparedStatement ps = session.prepare(
    "INSERT INTO perf_test.wibble (id, info) VALUES (?, ?)");
// Permits per second; you will need to tune this to your environment.
RateLimiter permits = RateLimiter.create(500);
int count = 1000;
final CountDownLatch latch = new CountDownLatch(count);
for (int i = 0; i < count; i++) {
    permits.acquire();   // blocks until the rate limiter grants a permit
    ResultSetFuture future = session.executeAsync(ps.bind("" + i, "aa" + i));
    Futures.addCallback(future, new FutureCallback<ResultSet>() {
        public void onSuccess(ResultSet rows) {
            latch.countDown();
        }

        public void onFailure(Throwable t) {
            latch.countDown();
            // log the error or other error handling
        }
    });
}
latch.await();   // need to handle and/or throw InterruptedException
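
For comparison, here is a minimal sketch of the Semaphore variant mentioned above. It is not driver code: to keep it runnable with only the JDK, the write is passed in as a function returning a CompletableFuture, which stands in for session.executeAsync and its ResultSetFuture. The names ThrottledWrites, runThrottled and maxInFlight are made up for illustration, and the limit of 32 is an arbitrary starting point that would need tuning.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

// Hypothetical helper, not part of the DataStax driver or Guava.
class ThrottledWrites {

    /**
     * Issues count async writes with at most maxInFlight outstanding,
     * waits for all of them to finish, and returns how many failed.
     */
    static int runThrottled(int count, int maxInFlight,
                            IntFunction<CompletableFuture<?>> write)
            throws InterruptedException {
        final Semaphore permits = new Semaphore(maxInFlight);
        final CountDownLatch latch = new CountDownLatch(count);
        final AtomicInteger failures = new AtomicInteger();

        for (int i = 0; i < count; i++) {
            permits.acquire();   // blocks while maxInFlight writes are pending
            CompletableFuture<?> future;
            try {
                future = write.apply(i);
            } catch (RuntimeException e) {
                // Submission itself failed; treat it like a failed write.
                CompletableFuture<Object> failed = new CompletableFuture<>();
                failed.completeExceptionally(e);
                future = failed;
            }
            future.whenComplete((result, error) -> {
                if (error != null) {
                    failures.incrementAndGet();   // log the error here
                }
                permits.release();   // free a slot for the next write
                latch.countDown();
            });
        }
        latch.await();   // returns once every write has completed
        return failures.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated writes in place of session.executeAsync: every tenth one fails.
        int failed = runThrottled(1000, 32, i -> CompletableFuture.supplyAsync(() -> {
            if (i % 10 == 0) {
                throw new IllegalStateException("simulated timeout " + i);
            }
            return i;
        }));
        System.out.println("failed writes: " + failed);
    }
}
```

The difference from the RateLimiter version is what gets bounded: a Semaphore caps the number of requests in flight, so it slows down by itself when the cluster slows down, while a RateLimiter caps requests per second regardless of how many are still pending.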



On Tue, Dec 10, 2013 at 8:16 PM, graham sanderson <graham@vast.com> wrote:

> I can’t speak for Astyanax; their thrift transport I believe is abstracted
> out, however the object model is very CF wide row vs table-y.
>
> I have no idea what the plans are for further Astyanax dev (maybe someone
> on this list), but I believe the thrift API is not going away, so
> considering Astyanax/thrift is an option, though I’d imagine you wouldn’t
> gain much going down the CQL over thrift method, so you need to be able to
> model your data in “internal” form.
>
> Two reasons we may want to move to the binary protocol
> for reads: asynchronous ability (which is now in thrift but it seems
> unlikely to be utilized in cassandra)
> for writes: compression, since we are (currently) network bandwidth
> limited for enormous batch inserts (from hadoop)
>
> On Dec 10, 2013, at 6:44 AM, David Tinker <david.tinker@gmail.com> wrote:
>
> > Hmm. I have read that the thrift interface to Cassandra is out of
> > favour and the CQL interface is in. Where does that leave Astyanax?
> >
> > On Tue, Dec 10, 2013 at 1:14 PM, graham sanderson <graham@vast.com> wrote:
> >> Perhaps not the way forward, however I can bulk insert data via
> >> astyanax at a rate that maxes out our (fast) networks. That said for our
> >> next release (of this part of our product - our other current is node.js
> >> via binary protocol) we will be looking at insert speed via java driver,
> >> and also alternative scala/java implementations of the binary protocol.
> >>
> >> On Dec 10, 2013, at 4:49 AM, David Tinker <david.tinker@gmail.com> wrote:
> >>
> >>> I have tried the DataStax Java driver and it seems the fastest way to
> >>> insert data is to compose a CQL string with all parameters inline.
> >>>
> >>> This loop takes 2500ms or so on my test cluster:
> >>>
> >>> PreparedStatement ps = session.prepare(
> >>>     "INSERT INTO perf_test.wibble (id, info) VALUES (?, ?)");
> >>> for (int i = 0; i < 1000; i++) session.execute(ps.bind("" + i, "aa" + i));
> >>>
> >>> The same loop with the parameters inline is about 1300ms. It gets
> >>> worse if there are many parameters. I know I can use batching to
> >>> insert all the rows at once but that's not the purpose of this test. I
> >>> also tried using session.execute(cql, params) and it is faster but
> >>> still doesn't match inline values.
> >>>
> >>> Composing CQL strings is certainly convenient and simple but is there
> >>> a much faster way?
> >>>
> >>> Thanks
> >>> David
> >>>
> >>> I have also posted this on Stackoverflow if anyone wants the points:
> >>>
> http://stackoverflow.com/questions/20491090/what-is-the-fastest-way-to-get-data-into-cassandra-2-from-a-java-application
> >>
> >
> >
> >
> > --
> > http://qdb.io/ Persistent Message Queues With Replay and #RabbitMQ Integration
>
>


-- 

- John
