cassandra-user mailing list archives

From Ryan Svihla ...@foundev.pro>
Subject Re: How to tune Cassandra or Java Driver to get lower latency when there are a lot of writes?
Date Fri, 25 Sep 2015 14:22:16 GMT
Why aren’t you using saveToCassandra (https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md)?
It has a number of locality-aware optimizations that will probably outperform your hand-rolled
bulk loading (especially if you’re not doing it inside something like foreachPartition).

Also, you can easily tune the size of those tasks, and therefore the batches, up or down to
minimize the impact on the prod system.
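
To make the batching point concrete, here is a rough sketch in plain Python of the general idea: group rows by partition key so each batch touches a single partition, and cap the batch size with one tunable knob. This is only an illustration, not the connector's actual implementation; the row shape, the function name partition_batches, and the max_batch_rows parameter are all made up for the example. The connector exposes its own settings for this (for example spark.cassandra.output.batch.size.rows and spark.cassandra.output.concurrent.writes; check the reference docs for your connector version).

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

# Hypothetical row shape for this sketch: (partition_key, payload).
Row = Tuple[Hashable, str]


def partition_batches(rows: Iterable[Row], max_batch_rows: int) -> Iterator[List[Row]]:
    """Yield batches in which every row shares one partition key.

    max_batch_rows is the tuning knob: smaller batches mean less work per
    request on the cluster, at the cost of more requests overall.
    Note: this sketch buffers all rows in memory before yielding.
    """
    if max_batch_rows < 1:
        raise ValueError("max_batch_rows must be at least 1")

    # Group rows by partition key so a batch never spans partitions.
    by_key: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        by_key[row[0]].append(row)

    # Split each partition's rows into chunks of at most max_batch_rows.
    for key_rows in by_key.values():
        for start in range(0, len(key_rows), max_batch_rows):
            yield key_rows[start:start + max_batch_rows]


rows = [("a", "r1"), ("b", "r2"), ("a", "r3"), ("a", "r4"), ("b", "r5")]
batches = list(partition_batches(rows, max_batch_rows=2))
```

Lowering max_batch_rows in a sketch like this trades throughput for smaller, cheaper requests, which is the same trade-off you would be making when sizing tasks and batches to protect read latency during the load.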

> On Sep 24, 2015, at 5:37 PM, Benyi Wang <bewang.tech@gmail.com> wrote:
> 
> I use Spark and spark-cassandra-connector with a customized Cassandra writer (spark-cassandra-connector
> doesn’t support DELETE). Basically the writer works as follows:
> 
> 1. Bind a row in the Spark RDD to either an INSERT or a DELETE PreparedStatement.
> 2. Create a BatchStatement for multiple rows.
> 3. Write to Cassandra.
> 
> I know using CQLBulkOutputFormat would be better, but it doesn't support DELETE.
> 
> On Thu, Sep 24, 2015 at 1:27 PM, Gerard Maas <gerard.maas@gmail.com> wrote:
> How are you loading the data? I mean, what insert method are you using?
> 
> On Thu, Sep 24, 2015 at 9:58 PM, Benyi Wang <bewang.tech@gmail.com> wrote:
> I have a Cassandra cluster that provides data to a web service, and there is a daily batch
> load writing data into the cluster.
> 
> Without the batch loading, the service’s 99th-percentile latency is 3ms, but during
> the load it jumps to 90ms.
> I checked the Cassandra keyspace’s ReadLatency.99thPercentile, which jumps from
> 600 microseconds to 1ms.
> The service’s Cassandra Java driver request 99thPercentile was 90ms during the load.
> The Java driver took the most time. I know the Cassandra servers are busy writing,
> but I want to know which metrics can identify where the bottleneck is so that I can
> tune it.
> 
> I’m using Cassandra 2.1.8 and Cassandra Java Driver 2.1.5.
> 
> 
> 

Regards,

Ryan Svihla

