From Sijie Guo <>
Subject Re: [Discussion] [Hedwig] Add queue semantic support for Hedwig
Date Thu, 21 Feb 2013 08:50:57 GMT
Thanks, Jiannan, for raising the discussion of queue semantics. Some other
people on the mailing list have asked for queue semantics before.

Basically, a topic (pub/sub) is quite different from a queue in messaging
concepts. In the pub/sub model, when a publisher publishes a message, it goes
to all the consumers (subscribers) who are interested, while a queue model
implements load-balancing semantics: each message is consumed by exactly one
consumer. That is, a queue has many consumers, with messages load balanced
across the available consumers.

If the application requires all consumers to see the same view of the
published messages, a topic is the better fit. If the application does not
care which consumer receives and consumes a published message, a queue is
better. But the two concepts become similar when there is only one consumer,
which can make it confusing to choose between a queue and a topic.
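
To make the difference concrete, here is a minimal Python sketch. It is not
Hedwig code: the function names are made up, and round-robin is only one
assumed way to load balance a queue.

```python
from itertools import cycle


def deliver_topic(messages, consumers):
    """Pub/sub: every consumer receives every message."""
    return {c: list(messages) for c in consumers}


def deliver_queue(messages, consumers):
    """Queue: each message goes to exactly one consumer.

    Round-robin is an assumption here; any load-balancing policy would do.
    """
    inbox = {c: [] for c in consumers}
    for msg, c in zip(messages, cycle(consumers)):
        inbox[c].append(msg)
    return inbox


messages = ["msg1", "msg2", "msg3", "msg4"]
topic_view = deliver_topic(messages, ["A", "B"])  # A and B both see all four
queue_view = deliver_queue(messages, ["A", "B"])  # the four are split between A and B

# With a single consumer the two models deliver the same thing.
single_topic = deliver_topic(messages, ["A"])
single_queue = deliver_queue(messages, ["A"])
```

With two consumers the results differ; with one consumer they are identical,
which is why the choice between the two can look arbitrary in that case.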

For your case, it is still a pub/sub application, so your first question is
how to handle this case gracefully in a pub/sub model. Two ideas could be
pursued to resolve it (similar to what Kafka does):

1) Have a subscription option that indicates whether to start the
subscription from the latest sequence id or the oldest sequence id.

2) Let the subscriber manage its own consumed pointer and pass it back when
subscribing, to tell the hub server where to start delivery. Such a
subscriber could be a special subscriber distinguished by a subscription
option.

Option 2) would bring several benefits:

a) It eliminates the storage and access of subscription metadata.
b) It provides a mechanism to rewind the subscription and replay already
consumed messages.
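
A rough Python sketch of how the two ideas could fit together. This is not
the Hedwig API: `TopicLog`, `StartFrom`, `start_position` and `deliver_after`
are hypothetical names, and the topic is just an in-memory list.

```python
from enum import Enum


class StartFrom(Enum):
    """Hypothetical subscription option for idea 1)."""
    LATEST = "latest"
    OLDEST = "oldest"


class TopicLog:
    """In-memory stand-in for a topic; sequence ids start at 1."""

    def __init__(self):
        self._messages = []

    def publish(self, body):
        self._messages.append(body)
        return len(self._messages)  # sequence id of the new message

    def start_position(self, start_from=StartFrom.LATEST, consumed_ptr=None):
        """Return the seq id after which delivery should start."""
        if consumed_ptr is not None:  # idea 2): the subscriber supplies its pointer
            return consumed_ptr
        if start_from is StartFrom.OLDEST:  # idea 1): from the beginning
            return 0
        return len(self._messages)  # idea 1): only messages published from now on

    def deliver_after(self, ptr):
        """Return (seq_id, body) pairs with seq_id greater than ptr."""
        return [(seq, body)
                for seq, body in enumerate(self._messages, start=1)
                if seq > ptr]


log = TopicLog()
for body in ["msg1", "msg2", "msg3"]:
    log.publish(body)

late = log.deliver_after(log.start_position(StartFrom.LATEST))    # nothing yet
full = log.deliver_after(log.start_position(StartFrom.OLDEST))    # msg1..msg3
replay = log.deliver_after(log.start_position(consumed_ptr=1))    # msg2, msg3
```

In this sketch the server keeps no per-subscriber state at all, which is
benefit a), and passing a smaller `consumed_ptr` replays old messages, which
is benefit b).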

As for the garbage collection concern you mentioned, about how long to keep
the messages: we already have messageBound to limit the length of a topic,
so we don't need to worry about it.
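
As an illustration only, the effect of such a bound can be sketched in Python
as below. This is not how Hedwig implements messageBound: `BoundedTopic` and
its `message_bound` parameter are hypothetical names.

```python
from collections import deque


class BoundedTopic:
    """Keeps at most `message_bound` of the most recent messages."""

    def __init__(self, message_bound):
        # A deque with maxlen drops the oldest entry when a new one overflows it.
        self._messages = deque(maxlen=message_bound)
        self._last_seq_id = 0

    def publish(self, body):
        self._last_seq_id += 1
        self._messages.append((self._last_seq_id, body))
        return self._last_seq_id

    def retained(self):
        """Return the (seq_id, body) pairs still held by the topic."""
        return list(self._messages)


topic = BoundedTopic(message_bound=2)
for body in ["msg1", "msg2", "msg3"]:
    topic.publish(body)

retained = topic.retained()  # msg1 has been dropped; msg2 and msg3 remain
```

A subscriber that starts from the oldest sequence id would then start from
the oldest message still retained, not necessarily from seqId 1.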

For your second question, it might be nice to have queue semantics in
Hedwig, since a JMS implementation needs them. But implementing queue
semantics is a totally different story from pub/sub.


On Wed, Feb 20, 2013 at 6:58 PM, Jiannan Wang <> wrote:

> Hi guys,
>    Under the current Hedwig semantics, a subscriber is not aware of
> messages published before it subscribes to the topic. So in the following
> example, subscriber A can only receive messages after seqId 2.
> ---------------------------------
> Topic T: msg1 msg2 msg3 msg4 ...
>                      | <- subscriber A subscribe the topic
> ---------------------------------
>    This semantic is very reasonable, but the Hedwig client needs to handle
> this corner case: when a new topic is about to be created, the topic is
> lazily created by the first request (generally a PUB or SUB), so the client
> side must coordinate between publisher and subscriber to make sure the
> first SUB is handled before the first PUB at the very beginning (consider
> that a subscriber may have a very bad network connection, which causes the
> SUB to fail, and the user does not want to miss any messages). In summary,
> it requires special work if a subscriber would like to receive all the
> messages since the topic was created, and I think this requirement is very
> common.
>    Handling this problem on the client side is one choice, but I think
> maybe we can simply resolve it on the server side if Hedwig can support
> queue semantics (so that we can also extend the Hedwig JMS provider to
> support JMS queues in BOOKKEEPER-312). As far as I know, the major concern
> with queue semantics is how long to keep the messages; however:
>    1. It is the user's responsibility to know about the features and
> impact of queue semantics.
>    2. On the other hand, we can add a parameter to limit the queue length.
>    In short, here are the two problems I would like to discuss:
>    1. How to gracefully resolve the above issue on the server side under
> the current semantics.
>    2. Whether or not to introduce queue semantics into Hedwig.
> Thanks,
> Jiannan

