I'm in the process of implementing a Totally Ordered Queue in Cassandra, and wanted to bounce my ideas off the list and also see if there are any other suggestions.
I've come up with an external source of IDs that are always increasing (but not monotonic), and I've also used external synchronization to ensure there is only one writer to a given queue. I handle de-duping in the app.
My current solution (simplified) is:
Use the "QueueId" as the key for a row of a CF.
Then, every column in that CF corresponds to a new entry in the Queue, with a custom Comparator to sort the columns by my external ID that is always increasing.
Technically I never delete data from the Queue, and I just page through it from a given ID using a SliceRange, etc.
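As a rough in-memory model of that paging pattern (this is not the Cassandra client API; the class name QueueRowModel and its methods are made up for illustration), a sorted map can stand in for one row whose columns are ordered by the external ID:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a single queue row. Column name = external ID,
// column value = message payload. Not a Cassandra API.
class QueueRowModel {
    private final NavigableMap<Long, String> columns = new TreeMap<>();

    // Writing the same ID twice overwrites the earlier value, as a column write would.
    void append(long id, String payload) {
        columns.put(id, payload);
    }

    // Returns up to 'count' entries whose ID is strictly greater than 'afterId',
    // in ascending ID order. Gaps between IDs do not matter.
    List<Map.Entry<Long, String>> page(long afterId, int count) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : columns.tailMap(afterId, false).entrySet()) {
            if (out.size() == count) {
                break;
            }
            out.add(e);
        }
        return out;
    }
}
```

A reader would remember the last ID it saw and pass that as afterId for the next page.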
The obvious problem is that the row needs to get compacted, so I started bucketizing with multiple rows for a given queue, for example one per day (again, I'm simplifying), so the key is now "QueueId+Day".
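A minimal sketch of how such a bucketed row key could be built, assuming a ":" separator, UTC days, and a yyyyMMdd day format (all three are assumptions for illustration, as are the class and method names):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for building "QueueId+Day" row keys. Not a Cassandra API.
class QueueBuckets {
    // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd, which sorts as text
    // in the same order as the dates themselves.
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    // Row key for the bucket holding the given queue's entries for one day.
    static String bucketKey(long queueId, LocalDate day) {
        return queueId + ":" + day.format(DAY);
    }

    // Same, but derives the UTC day from a timestamp in epoch milliseconds.
    static String bucketKey(long queueId, long epochMillis) {
        LocalDate day = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
        return bucketKey(queueId, day);
    }

    // Key of the following day's bucket, for a reader that has exhausted the current one.
    static String nextBucketKey(long queueId, LocalDate day) {
        return bucketKey(queueId, day.plusDays(1));
    }
}
```

The extra complexity shows up on the read side: a reader has to track which bucket it is in as well as the last ID it saw, and move to the next bucket when the current one runs out.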
Does this seem reasonable? It's solvable, but it is starting to seem complicated to implement. It would be very easy if I didn't need multiple buckets.
My other thought is to store one entry per row, and perform get_range_slices and specify a KeyRange, with the OrderPreservingPartitioner.
But it isn't exactly clear to me what the order of the keys is in this system, so I don't know how to construct my keys and queries appropriately. Is this lexical string order, or something else?
So, for example, assuming my QueueIds are longs and my IDs are also longs, my key would be (in Java):
key = "" + queueId + ":" + msgId;
And if I wanted to do a query my key range might be from
start = "" + queueId + ":0";
end = "" + queueId + ":" + Long.MAX_VALUE;
(Will I have to left-pad the msgIds with 0s?)
And is this going to be efficient if my msgId isn't monotonically increasing?
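To make the padding question concrete, here is a small sketch that assumes keys are compared as plain strings, character by character, which is what Java's String.compareTo does. Whether the partitioner orders keys exactly this way is the open question above, so treat this as an illustration of lexical order only. The class and method names are made up, and it assumes non-negative msgIds:

```java
import java.util.Locale;

// Hypothetical helper comparing unpadded and zero-padded key layouts.
class PaddedKeys {
    // The author's original layout: queueId, a colon, then the raw msgId.
    static String unpaddedKey(long queueId, long msgId) {
        return queueId + ":" + msgId;
    }

    // Same layout, but msgId is left-padded with zeros to 19 digits,
    // the number of decimal digits in Long.MAX_VALUE.
    static String paddedKey(long queueId, long msgId) {
        if (msgId < 0) {
            throw new IllegalArgumentException("msgId must be non-negative");
        }
        return String.format(Locale.ROOT, "%d:%019d", queueId, msgId);
    }

    public static void main(String[] args) {
        // Unpadded: "7:10" compares below "7:9" because '1' < '9'.
        System.out.println(unpaddedKey(7, 10).compareTo(unpaddedKey(7, 9)) < 0);
        // Padded: the key for 9 compares below the key for 10, matching numeric order.
        System.out.println(paddedKey(7, 9).compareTo(paddedKey(7, 10)) < 0);
    }
}
```

Under that string-comparison assumption, an unpadded range from queueId + ":0" to queueId + ":" + Long.MAX_VALUE would miss or misorder IDs of differing lengths, while the padded form keeps numeric and lexical order in agreement.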