cassandra-user mailing list archives

From Ryan Svihla <...@foundev.pro>
Subject Re: To batch or not to batch: A question for fast inserts
Date Fri, 25 Sep 2015 19:14:03 GMT

I think my main point is still: unlogged token-aware batches are great, but if your writes
are large enough, batching may actually hurt rather than help, and likewise if your writes are
too small, async-only is likely to hurt. I’d say the average user I’ve had to help (with my
selection bias) has individual writes already on the large side of optimal, so batching
frequently hurts them. Also, they tend not to do async in the first place.

In summary, batch-or-not is IMO the wrong area to focus on; total write payload sizing for your
cluster is the factor to focus on, and however you get there is fantastic.
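
To make “total payload sizing” concrete, here’s a rough Scala sketch of what I mean: pack
statements into batches by an approximate byte budget instead of by statement count. This is
a sketch only; estimateBytes is a hypothetical, caller-supplied estimator (e.g. the sum of
serialized bound-value sizes), and the 10 KB budget is a placeholder you’d tune per cluster.

import com.datastax.driver.core.Statement

// Group statements into unlogged-batch-sized chunks by approximate payload
// size rather than by count. Both the estimator and the budget are
// placeholders to be tuned against your own cluster.
def packByPayload(statements: Seq[Statement],
                  estimateBytes: Statement => Long,
                  budgetBytes: Long = 10 * 1024): Seq[Seq[Statement]] = {
  val (groups, last, _) =
    statements.foldLeft((Vector.empty[Seq[Statement]], Vector.empty[Statement], 0L)) {
      case ((done, current, size), st) =>
        val n = estimateBytes(st)
        if (current.nonEmpty && size + n > budgetBytes)
          (done :+ current, Vector(st), n)   // budget reached: close this group
        else
          (done, current :+ st, size + n)    // keep filling the current group
    }
  if (last.nonEmpty) groups :+ last else groups
}

More replies inline: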

> On Sep 25, 2015, at 1:24 PM, Eric Stevens <mightye@gmail.com> wrote:
> 
> > compaction usually is the limiter for most clusters, so the difference between async
> > versus unlogged batch ends up being minor or, worse, non-existent, because the hardware
> > and data model combination results in compaction being the main throttle.
> 
> If your number of records to load per second is predetermined (as would be the case in
> any production use case), then whether they are loaded as batches or as single statements
> makes no difference to compaction; your cluster needs to support the same number and shape
> of mutates either way.

Not everyone is as grown up about their cluster sizing. Lots of folks are still stuck on
maximum utilization; ironically, these same people tend to rely on spindles for storage, and
so will ultimately end up having to throttle ingest to let compaction catch up. Anyway, in
these admittedly awful situations, throttling of ingest is all too common, as the commit
log can easily outstrip compaction.

> 
> > if you add in token awareness to your batch, you’ve basically eliminated the primary
> > complaint about using unlogged batches, so why not do that.
> 
> This is absolutely the right idea if your driver supports it, but the gain is smaller
> than I would have expected based on the warnings of imminent doom when we've had this
> conversation before.  If your driver supports token awareness, use it to group statements
> by primary replica and execute those groups concurrently.  Here's the code we're using
> (in Scala, using the DataStax Java driver):
> // requires: import scala.util.control.NonFatal
> // CQLSession / CQLBatch are application-level wrappers (not driver types);
> // `statements` is the enclosing class's collection of statements to group.
> def groupByFirstReplica()(implicit session: CQLSession): Map[Host, CQLBatch] = {
>   val meta = session.getCluster.getMetadata
>   statements.groupBy { st =>
>     try {
>       // the first replica returned is the primary for this partition key
>       meta.getReplicas(st.getKeyspace, st.getRoutingKey).iterator().next
>     } catch { case NonFatal(e) =>
>       null  // couldn't resolve a replica: lands in a token-unaware bucket
>     }
>   } mapValues { sts => CQLBatch(sts) }
> }
> We now have a map of primary host to sub-batch for all the statements in our logical
> batch.  We can now do either of these (depending on how greedy we want to be in our client;
> Future.traverse is preferred and nicer, Future.sequence is greedier and more resource
> intensive):
> Future.sequence(groupByFirstReplica().values.map(_.execute())).map(_.flatten)
> Future.traverse(groupByFirstReplica().values) { _.execute() }.map(_.flatten)
> We get back a Future[Iterable[ResultSet]] - this future completes when all of the logical
> batch's sub-batches have completed.
> 
> Note that with the DSE Java driver, for the above to succeed in its intent, the statements
> need to be prepared statements (for st.getRoutingKey to return non-null), and either the
> keyspace has to be fully qualified in the CQL, or you have to have set the correct keyspace
> when you created the connection (for st.getKeyspace to return non-null).  Otherwise the
> values given to meta.getReplicas will fail to resolve a primary host, which results in
> token-unaware batches (i.e. you'll get back a Map(null -> allStatements)).  However, those
> same criteria are required for single statements to be token aware.
> 

This is excellent stuff. My only concern with primary replicas is for people with uneven
partitions, and the occasional stupidly fat one: I’d rather spread those writes around the
other replicas than beat up the primary one. However, for a well-modeled partition key, the
approach you outline is probably optimal.
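
Since Eric’s point above hinges on st.getRoutingKey and st.getKeyspace resolving, here’s a
minimal sketch of the “prepared statement + fully qualified keyspace” setup for anyone
following along. The table, columns, and bound values are made up, session is the raw driver
Session, and this assumes the 2.x Java driver, where getRoutingKey takes no arguments:

// Keyspace is fully qualified in the CQL, so st.getKeyspace resolves, and
// the statement is prepared with its partition key (id) bound, so
// st.getRoutingKey is non-null -- exactly what meta.getReplicas needs.
val prepared = session.prepare(
  "INSERT INTO my_ks.events (id, payload) VALUES (?, ?)")
val st = prepared.bind(someId, somePayload)
// st.getRoutingKey != null => getReplicas can resolve the primary replica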

> 
> 
> 
> On Fri, Sep 25, 2015 at 7:30 AM, Ryan Svihla <rs@foundev.pro> wrote:
> Generally this is all correct, but I cannot emphasize enough how much this “just depends”,
> and today I generally move people to async inserts first before trying to micro-optimize.
> Some things to keep in mind:
> 
> compaction usually is the limiter for most clusters, so the difference between async
> versus unlogged batch ends up being minor or, worse, non-existent, because the hardware
> and data model combination results in compaction being the main throttle.
> if you add in token awareness to your batch, you’ve basically eliminated the primary
> complaint about using unlogged batches, so why not do that. When I was at DataStax I made
> some similar suggestions for token-aware batches after seeing the perf improvements with
> Spark writes using unlogged batch. Several others did as well, so I’m not the first one
> with this idea.
> write size makes, in my experience, BY FAR the largest difference in which is faster,
> and the statement count is largely irrelevant compared to the total payload size. Depending
> on the hardware etc., a good rule of thumb is that writes below 1k bytes tend to get really
> inefficient, and writes over 100k tend to slow down total throughput. I’ll re-emphasize
> that this magic number has been different on almost every cluster I’ve tuned.
> 
> In summary, all this means is: writes that are too small or too large are slow, and
> unlogged batches may involve some extra hops; if you eliminate the extra hops via token
> awareness, then it just comes down to write size optimization.
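
To put rough numbers on my own rule of thumb above, a sketch only: derive a batch size from
the measured average write size rather than picking a statement count. Every constant here is
a placeholder; as I said, the magic number differs on almost every cluster.

// Target a total batch payload between the "too small" and "too large"
// thresholds; all constants are illustrative, not universal.
def statementsPerBatch(avgWriteBytes: Long,
                       targetBatchBytes: Long = 10 * 1024,  // placeholder target
                       maxWriteBytes: Long = 100 * 1024): Int =
  if (avgWriteBytes >= maxWriteBytes) 1  // already large: send individually, async
  else math.max(1, (targetBatchBytes / math.max(1L, avgWriteBytes)).toInt)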
> 
>> On Sep 24, 2015, at 5:18 PM, Eric Stevens <mightye@gmail.com> wrote:
>> 
>> > I ran some one-off benchmarks on the side and stumbled on the observation that
>> > unlogged inserts are *A LOT* faster than their async counterparts.
>> 
>> My own testing agrees very strongly with this.  When this topic came up on this list
>> before, there was a concern that batch coordination produces GC pressure in your cluster
>> because you're involving nodes that don't strictly need to be involved.
>> 
>> Our own testing shows some small impact on this front, but really lightweight GC tuning
>> mitigated the effects by putting a little more room in Xmn (if you're still on the CMS
>> garbage collector).  On G1GC (which is what we run in production) we weren't able to
>> measure a difference.
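
For reference, the kind of tweak Eric describes lives in cassandra-env.sh; a hypothetical
example, with sizes that are purely illustrative and only relevant on CMS:

# cassandra-env.sh -- CMS only: extra young-gen (Xmn) headroom so short-lived
# coordinator/batch objects die young instead of being promoted to old gen
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2G"   # illustrative; measure before and after changing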
>> 
>> Our testing shows data loads being as much as 5x to 8x faster when using small concurrent
>> batches instead of concurrent single statements.  We tried three different concurrency
>> models.
>> 
>> To save on coordinator overhead, we group the statements in our "batch" by replica
>> (using the functionality exposed by the DataStax Java driver), and do essentially
>> token-aware batching.  This still has a small amount of additional coordinator overhead
>> (since the data size of the unit of work is larger, and sits in memory in the coordinator
>> longer).  We've been running this way successfully for months with sustained rates north
>> of 50,000 mutates per second.  We burst much higher.
>> 
>> Through trial and error we determined we got diminishing returns in the realm of
>> 100 statements per token-aware batch.  It looks like your own data bears that out as
>> well.  I'm sure that's workload dependent though.
>> 
>> I've been disagreed with on this topic on this list in the past, despite the numbers I
>> was able to post.  Nobody has shown me numbers (nor anything else concrete) that
>> contradict my position though, so I stand by it.  There's no question in my mind: if your
>> mutates are of any significant volume and you care about their performance, token-aware
>> unlogged batching is the right strategy.  When we reduce our batch sizes or switch to
>> single async statements, we fall over immediately.
>> 
>> On Tue, Sep 22, 2015 at 7:54 AM, Gerard Maas <gerard.maas@gmail.com> wrote:
>> General advice advocates for individual async inserts as the fastest way to insert data
>> into Cassandra. Our insertion mechanism is based on that model, and recently we have been
>> evaluating performance, looking to measure and optimize our ingestion rate.
>> 
>> I ran some one-off benchmarks on the side and stumbled on the observation that unlogged
>> inserts are *A LOT* faster than their async counterparts.
>> 
>> In our tests, unlogged batch shows increased throughput and lower cluster CPU usage, so
>> I'm wondering where the tradeoff might be.
>> 
>> I compiled those observations in a document that I'm sharing and opening up for comments.
>> Are we observing some artifact, or should we set the record straight that unlogged batches
>> achieve better insertion throughput?
>> 
>> https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI
>> 
>> Let me know.
>> 
>> Kind regards, 
>> 
>> Gerard.
>> 
> 
> Regards,
> 
> Ryan Svihla
> 
> 

