incubator-cassandra-user mailing list archives

From Jeremy Davis <jerdavis.cassan...@gmail.com>
Subject Re: Creating a Total Ordered Queue in Cassandra
Date Fri, 02 Apr 2010 05:04:58 GMT
Since twitter is everyone's favorite analogy:
It's like twitter, but faster and with bigger messages that I may need to go
back and replay in order to mine for more details at a later date.
Thus, I call it a queue, because the order of messages is important. But it
is nothing like a message broker/pub-sub/topic, etc.

-JD



On Thu, Apr 1, 2010 at 9:43 PM, Jeremy Davis
<jerdavis.cassandra@gmail.com> wrote:

>
> You are correct, it is not a queue in the classic sense... I'm storing the
> entire "conversation" with a client in perpetuity, and then playing it back
> in the order received.
>
> RabbitMQ, ActiveMQ, etc. all have about the same throughput (3-6K persistent
> messages/sec) and are not good for storing the conversation forever. Also,
> I can easily scale Cassandra past that message rate and not have to worry
> about which message broker/cluster I'm connecting to, which one has the
> conversation, etc.
>
>
>
>
> On Thu, Apr 1, 2010 at 7:02 PM, Keith Thornhill <keith@raptr.com> wrote:
>
>> you mention never deleting from the queue, so what purpose is this
>> serving? (if you don't pop off the front, is it really a queue?)
>>
>> seems if guaranteed order of messages is required, there are many
>> other projects which are focused towards that problem (rabbitmq,
>> kestrel, activemq, etc)
>>
>> or am i misunderstanding your needs here?
>>
>> -keith
>>
>> On Thu, Apr 1, 2010 at 6:32 PM, Jeremy Davis
>> <jerdavis.cassandra@gmail.com> wrote:
>> > I'm in the process of implementing a Totally Ordered Queue in Cassandra, and
>> > wanted to bounce my ideas off the list and also see if there are any other
>> > suggestions.
>> >
>> > I've come up with an external source of IDs that are always increasing (but
>> > not monotonic), and I've also used external synchronization to ensure only
>> > one writer to a given queue. And I handle de-duping in the app.
>> >
>> >
>> > My current solution is : (simplified)
>> >
>> > Use the "QueueId" to key into a row of a CF.
>> > Then, every column in that CF corresponds to a new entry in the Queue, with
>> > a custom Comparator to sort the columns by my external ID that is always
>> > increasing.
>> >
>> > Technically I never delete data from the Queue, and I just page through it
>> > from a given ID using a SliceRange, etc.
>> >
>> > Obviously the problem is that the row needs to get compacted. So then I
>> > started bucketizing with multiple rows for a given queue (for example one
>> > per day (again I'm simplifying))... (so the Key is now "QueueId+Day"...)
>> >
>> > Does this seem reasonable? It's solvable, but is starting to seem
>> > complicated to implement... It would be very easy if I didn't have to have
>> > multiple buckets..
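
[Editor's sketch] The "QueueId+Day" bucketing above could be handled by computing the list of row keys to visit for a replay window and then slicing each row in turn. This is only a minimal illustration under assumptions: the class name DayBuckets, the ":" separator, and the yyyyMMdd day format are hypothetical choices, not something the message specifies, and no Cassandra calls are shown.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the per-day row keys for one queue.
final class DayBuckets {

    // Row key for one queue and one day, e.g. "42:20100401" (format is an assumption).
    static String bucketKey(long queueId, LocalDate day) {
        return queueId + ":" + day.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Row keys to read, oldest first, for a replay covering [from, to] inclusive.
    static List<String> bucketKeys(long queueId, LocalDate from, LocalDate to) {
        List<String> keys = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            keys.add(bucketKey(queueId, d));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Replay window that crosses a month boundary.
        List<String> keys = bucketKeys(42L, LocalDate.of(2010, 3, 31), LocalDate.of(2010, 4, 2));
        for (String key : keys) {
            System.out.println(key);
        }
    }
}
```

A reader would page through the columns of each returned row in order, starting inside the first row from the last message ID it has seen and from the beginning of every later row.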
>> >
>> >
>> >
>> > My other thought is to store one entry per row, and perform get_range_slices
>> > and specify a KeyRange, with the OrderPreservingPartitioner.
>> > But it isn't exactly clear to me what the order of the keys is in this
>> > system, so I don't know how to construct my key and queries appropriately...
>> > Is this Lexical String Order? Or?
>> >
>> > So for example.. Assuming my QueueId's are longs, and my ID's are also
>> > longs.. My key would be (in Java):
>> >
>> > long queueId;
>> > long msgId;
>> >
>> > key = "" + queueId + ":" + msgId;
>> >
>> > And if I wanted to do a query my key range might be from
>> > start = "" + queueId + ":0"
>> > end = "" + queueId + ":" + Long.MAX_VALUE;
>> >
>> > (Will I have to left pad the msgIds with 0's)?
>> >
>> > And is this going to be efficient if my msgId isn't monotonically
>> > increasing?
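
[Editor's sketch] On the left-padding question: if keys are compared as plain strings (which is exactly what the message asks about, so treat it as an assumption here), unpadded decimal IDs sort incorrectly, because "7:10" compares lower than "7:9". Padding the message ID to a fixed width makes string order agree with numeric order for non-negative longs. The class and method names below are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper: row keys whose string order matches numeric msgId order.
final class QueueKeys {

    // Long.MAX_VALUE has 19 decimal digits, so 19 covers every non-negative long.
    private static final int WIDTH = 19;

    static String rowKey(long queueId, long msgId) {
        if (msgId < 0) {
            throw new IllegalArgumentException("msgId must be non-negative");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return queueId + ":" + String.format(Locale.ROOT, "%0" + WIDTH + "d", msgId);
    }

    // Inclusive bounds for a key range covering one whole queue.
    static String rangeStart(long queueId) {
        return rowKey(queueId, 0L);
    }

    static String rangeEnd(long queueId) {
        return rowKey(queueId, Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        // Unpadded keys: message 10 sorts before message 9.
        System.out.println("unpadded 10 before 9: " + ("7:10".compareTo("7:9") < 0));
        // Padded keys: message 9 sorts before message 10.
        System.out.println("padded 9 before 10: "
                + (rowKey(7L, 9L).compareTo(rowKey(7L, 10L)) < 0));
        System.out.println(rangeStart(7L) + " .. " + rangeEnd(7L));
    }
}
```

Gaps between IDs do not affect the ordering itself; only the fixed width matters for the comparison. Whether the partitioner really compares keys this way is not confirmed by the thread.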
>> >
>> > Thanks,
>> > -JD
