flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Can we do batch writes on cassandra using flink while leveraging the locality?
Date Tue, 01 Nov 2016 19:29:16 GMT
Hi!

I do not know the details of how Cassandra supports batched writes, but
here are some thoughts:

  - Grouping writes that go to the same partition together into one batch
write request makes sense. If you have some sample code for that, it should
not be too hard to integrate into the Flink Cassandra connector.

  - If you know the partitioning scheme in Cassandra and you use
"DataStream.partitionCustom(partitioner, key)", all write requests from one
parallel sink instance should go to the same Cassandra node (or a small
number of nodes). Would that help?
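
To make the two points above concrete, here is a minimal sketch in plain Java.
It is not the connector's code. The "Partitioner" interface below is a local
stand-in with the same shape as Flink's Partitioner<K>, so the sketch compiles
without Flink on the classpath. "Write", "LocalityBatching", "BY_KEY_HASH" and
"groupByPartitionKey" are hypothetical names, and the hash-based routing is
only a placeholder: a real version would have to follow Cassandra's actual
token ring to get locality.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Local stand-in shaped like Flink's Partitioner<K> (assumption: used here only to avoid a Flink dependency). */
interface Partitioner<K> {
    int partition(K key, int numPartitions);
}

/** Hypothetical write record: a Cassandra partition key plus an opaque payload. */
final class Write {
    final String partitionKey;
    final String payload;

    Write(String partitionKey, String payload) {
        this.partitionKey = partitionKey;
        this.payload = payload;
    }
}

final class LocalityBatching {

    /**
     * Placeholder routing: sends equal keys to the same parallel sink instance.
     * This is NOT Cassandra's token function; real locality would need key -> token -> owning node.
     */
    static final Partitioner<String> BY_KEY_HASH =
            (key, numPartitions) -> Math.floorMod(key.hashCode(), numPartitions);

    /** Groups buffered writes by partition key (first-seen order) so each group can go out as one batch. */
    static Map<String, List<Write>> groupByPartitionKey(List<Write> writes) {
        Map<String, List<Write>> batches = new LinkedHashMap<>();
        for (Write w : writes) {
            batches.computeIfAbsent(w.partitionKey, k -> new ArrayList<>()).add(w);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Write> buffered = Arrays.asList(
                new Write("user-1", "a"),
                new Write("user-2", "b"),
                new Write("user-1", "c"));
        int parallelism = 4; // assumed number of parallel sink instances

        // One batch per partition key, labelled with the sink instance that would receive it.
        groupByPartitionKey(buffered).forEach((key, batch) ->
                System.out.println("sink " + BY_KEY_HASH.partition(key, parallelism)
                        + " <- batch of " + batch.size() + " writes for " + key));
    }
}
```

In a real job, the partitioner would be handed to DataStream.partitionCustom
ahead of the sink, and the grouping would run inside the sink over whatever
it has buffered. How large a batch may get and when it is flushed are left
open here.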

Greetings,
Stephan




On Fri, Oct 28, 2016 at 8:57 AM, kant kodali <kanth909@gmail.com> wrote:

> The Spark Cassandra connector does it! But I don't think it really implements
> a custom partitioner. I think it just leverages the token-aware policy and
> batches writes within a partition by default, but you can also batch across
> partitions with the same replica!
>
> On Thu, Oct 27, 2016 at 8:41 AM, Shannon Carey <scarey@expedia.com> wrote:
>
>> It certainly seems possible to write a Partitioner that does what you
>> describe. I started implementing one but didn't have time to finish it. I
>> think the main difficulty is in properly dealing with partition ownership
>> changes in Cassandra… if you are maintaining state in Flink and the
>> partitioning changes, your job might produce inaccurate output. If, on the
>> other hand, you are only using the partitioner just before the output,
>> dynamic partitioning changes might be ok.
>>
>>
>> From: kant kodali <kanth909@gmail.com>
>> Date: Thursday, October 27, 2016 at 3:17 AM
>> To: <user@flink.apache.org>
>> Subject: Can we do batch writes on cassandra using flink while
>> leveraging the locality?
>>
>> Can we do batch writes on Cassandra using Flink while leveraging the
>> locality? For example, batch writes in Cassandra put pressure on the
>> coordinator, but since the connectors are built to leverage locality, I
>> was wondering whether we could send a batch of writes to the node where
>> the batch belongs.
>>
>
>
