activemq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "James Strachan" <>
Subject [IDEA] great aggregation without a database, or performing reliable map/reduce using Message Groups...
Date Sun, 13 Jan 2008 16:12:03 GMT
... I just wanted to explain an awesome idea that popped out of a bit
of brainstorming while sat at breakfast with Guillaume today.
Aggregating messages reliably, over long periods of time in a high
performance way is kinda sucky right now. Either you use batches like
the default Camel Aggregator

(which is a bit sucky) or you have to use a database (and then hit 2
phase commit type issues).

The problem basically is that you want a consumer to process an entire
group of messages, in order, in a single transaction to avoid having
to use persistence; if they fail the entire group of messages need to
be redelivered (maybe to another consumer) in full - otherwise you
have to use persistence to maintain partial state.

Also you only want a single consumer to get a single group of messages
at once as the consumer can only commit a single group of messages at
a time (it can't interleave them).

So its kinda like the consumer wants to do a selector that kinda says
'send me one complete message group including the last closing message
of the sequence, in order please'.

At first we pondered about selectors - then we had the idea of having
a kind of 'exclusive message group'; namely that using Message

we could maybe limit the consumer to only support one message group at
once until its closed, then it can consume another message group. This
in itself is a pretty good solution that could work today fairly
easily (we just need the chooser code associating a new message group
to a consumer to ignore consumers that already have a message group
associated with them if they have 'exclusive message group' enabled).

The main downside with this approach is that a single consumer can
then get locked by a single message group, that could span hours, days
or weeks - unable to process any more messages until finally the
message group is completed.

So what would be really nice would be is if we supported the
JMSXGroupSeq header (for sequence numbers within a message group) and
made it possible to not dispatch any messages within a message group,
until the sequence is complete; so they'd stay around in the broker
until the sequence can be processed, in one brief amount of time by a
single consumer. Also we should support reordering of messages within
the sequence as well if the messages get out of order. Then folks
could do long term aggregation using purely ActiveMQ in a nice high
performance way!

Another added benefit would be folks could do a totally asynchronous,
loosely coupled and reliable Map/Reduce pattern using purely ActiveMQ.
e.g. we can split a single message using the splitter

using the message ID as the JMSXGroupID and for each child message we
assign a JMSXGroupSeq until the last message we close the sequence.
Then each message can be processed by any consumer in a grid, sending
replies back to another queue using the same JMSXGroupID and
JMSXGroupSeq. Then when all the messages are received and the sequence
is complete, the entire sequence of response messages is sent to a
single consumer; who then commits its transaction when the last
message in the sequence is processed. All without any explicit
persistence - yet the whole thing would be totally transactional and
reliable! Not bad eh?

As a first stab, it'd be nice to support the 'exclusive message group'
idea which seems pretty easy to do; as then at least we'd have an
awesome solution for Map/Reduce scenarios which complete within a
relatively short amount of time; we'd only need the more advanced
'dispatch when the message group is complete' option for dealing with
very long running Map/Reduce problems - which are probably fairly


Open Source Integration

View raw message