cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ali Akhtar <ali.rac...@gmail.com>
Subject Processing time series data in order
Date Thu, 22 Dec 2016 01:24:36 GMT
- I'm receiving a batch of messages to a Kafka topic.

Each message has a timestamp, however the messages can arrive / get
processed out of order. I.e event 1's timestamp could've been a few seconds
before event 2, and event 2 could still get processed before event 1.

- I know the number of messages that are sent per batch.

- I need to process the messages in order. The messages are basically
providing the history of an item. I need to be able to track the history
accurately (i.e, if an event occurred 3 times, i need to accurately log the
dates of the first, 2nd, and 3rd time it occurred).

The approach I'm considering is:

- Creating a cassandra table which is ordered by the timestamp of the
messages.

- Once a batch of messages has arrived, writing them all to cassandra,
counting on them being ordered by the timestamp even if they are processed
out of order.

- Then iterating over the messages in the cassandra table, to process them
in order.

However, I'm concerned about Cassandra's eventual consistency. Could it be
that even though I wrote the messages, they are not there when I try to
read them (which would be almost immediately after they are written)?

Should I enforce consistency = ALL to make sure the messages will be
available immediately after being written?

Is there a better way to handle this thru either Kafka streams or Cassandra?

Mime
View raw message