bookkeeper-user mailing list archives

From Jiannan Wang <jian...@yahoo-inc.com>
Subject Re: [Discussion] [Hedwig] Add queue semantic support for Hedwig
Date Sat, 23 Feb 2013 10:15:31 GMT
Hi Sijie,
   Thanks for explaining the difference between the pub/sub model and the queue model so well. I did
confuse the two when there is only one subscriber on a topic; I just wanted to use queue semantics
to get around the problem :)

--------------------
Two ideas could be pursued to resolve it (similar to what Kafka did):
1) Add a subscription option indicating whether the subscription starts from the latest sequence id
or from the oldest sequence id.
2) Let the subscriber manage its own consumed pointer and pass it back when subscribing, to tell the
hub server where to start delivery. Such a subscriber could be a special kind of subscriber,
distinguished by a subscription option.

Idea 2) brings several benefits:
a) It eliminates the storage of and access to subscription metadata.
b) It provides a mechanism to rewind the subscription and replay already consumed messages.
--------------------
I looked at the ConsumerConfig class in Kafka's API but could not find a related option.
For idea 1), we would also need to change Hedwig's current message garbage collection behavior:
for a topic with no subscribers, keep the messages, subject to the messageBound limit. I am in favor
of this solution.
Idea 2) is cool, though it requires much larger changes than 1).
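The retention rule proposed for idea 1) can be sketched as a function that computes the first sequence id a topic must still keep. This is a toy model under stated assumptions: sequence ids start at 1, each subscriber is represented only by the last sequence id it has consumed, and `message_bound` stands in for Hedwig's messageBound. `first_retained_seq` is a hypothetical helper, not part of Hedwig.

```python
def first_retained_seq(last_seq: int, consumed: list[int], message_bound: int) -> int:
    """Return the lowest sequence id that must be kept (toy model, hypothetical).

    last_seq      -- highest sequence id published so far
    consumed      -- last consumed sequence id of every current subscriber
    message_bound -- maximum number of messages kept per topic
    """
    # messageBound always caps the topic length: keep at most the newest N messages.
    bound_floor = max(1, last_seq - message_bound + 1)
    if not consumed:
        # Proposed change: with no subscribers, keep the newest messages up to
        # messageBound rather than treating them as garbage, so a later
        # "start from oldest" subscriber can still read them.
        return bound_floor
    # With subscribers, messages consumed by every subscriber can be collected,
    # but the topic still never holds more than messageBound messages.
    return max(bound_floor, min(consumed) + 1)
```

For example, with 10 published messages, no subscribers, and a bound of 5, messages 6 to 10 are kept.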

I saw Flavio's reply to Yannick, which suggests using ZooKeeper to coordinate the actions of
publisher and subscriber. But that is a client-side solution; I would prefer solution 1) from Sijie's
proposal, which requires no special work on the client side.

Thanks,
Jiannan


From: Sijie Guo <guosijie@gmail.com>
Reply-To: bookkeeper-user@zookeeper.apache.org
Date: Thursday, February 21, 2013 4:50 PM
To: bookkeeper-dev@zookeeper.apache.org
Cc: bookkeeper-user@zookeeper.apache.org,
Hang Qi <hangqi@yahoo-inc.com>, Hongjian Chen <hongjian@yahoo-inc.com>,
Bizhu Qiu <qiubz@yahoo-inc.com>, Fangmin Lv <lvfm@yahoo-inc.com>,
Lin Shen <shenlin@yahoo-inc.com>
Subject: Re: [Discussion] [Hedwig] Add queue semantic support for Hedwig

Thanks, Jiannan, for raising the discussion of queue semantics. Some other people on the mailing
list have asked for queue semantics before.

Basically, in messaging, a topic (pub/sub) is quite different from a queue. In the pub/sub
model, when a publisher publishes a message, it goes to all interested consumers (subscribers);
a queue model, by contrast, implements load-balancing semantics: a single message is consumed
by (almost) exactly one consumer. This means a queue can have many consumers, with messages
load-balanced across the available consumers.
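The difference can be sketched in a few lines, assuming a toy in-memory model where each consumer has an inbox. Round-robin is only one assumed balancing policy for the queue case; a real queue implementation could assign messages differently.

```python
from itertools import cycle


def deliver_topic(messages, consumers):
    """Pub/sub: every message goes to every interested subscriber."""
    inbox = {c: [] for c in consumers}
    for msg in messages:
        for c in consumers:
            inbox[c].append(msg)
    return inbox


def deliver_queue(messages, consumers):
    """Queue: each message goes to exactly one consumer (round-robin here)."""
    inbox = {c: [] for c in consumers}
    next_consumer = cycle(consumers)
    for msg in messages:
        inbox[next(next_consumer)].append(msg)
    return inbox
```

With two consumers, the topic gives both of them all four messages `m1..m4`, while the queue gives `A` m1 and m3 and `B` m2 and m4. With a single consumer both functions return the same result, which is why the two models are easy to confuse in that case.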

If the application requires all consumers to see the same view of the published messages, a topic
is the better fit. If it does not matter which consumer receives and consumes a given message, a
queue is better. But the two concepts become similar when there is only one consumer, which can
make it confusing to choose between a queue and a topic.

Your case is still a pub/sub application, so your first question is how to handle it gracefully
within the pub/sub model. Two ideas could be pursued to resolve it (similar to what Kafka did):

1) Add a subscription option indicating whether the subscription starts from the latest sequence id
or from the oldest sequence id.

2) Let the subscriber manage its own consumed pointer and pass it back when subscribing, to tell the
hub server where to start delivery. Such a subscriber could be a special kind of subscriber,
distinguished by a subscription option.
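Idea 1) can be illustrated with a toy, in-memory topic log. `StartPosition`, `ToyTopic`, and their methods are hypothetical names chosen for this sketch, not Hedwig or Kafka APIs. Sequence ids start at 1 and the toy log is never trimmed, so "oldest" here simply means seq id 1.

```python
from enum import Enum


class StartPosition(Enum):
    # Hypothetical subscription option; not an existing Hedwig flag.
    LATEST = "latest"   # current behavior: only messages published after the SUB
    OLDEST = "oldest"   # deliver from the oldest retained message


class ToyTopic:
    """In-memory topic log; sequence ids start at 1 and nothing is trimmed."""

    def __init__(self):
        self.log = []  # list of (seq_id, payload)

    def publish(self, payload):
        seq_id = len(self.log) + 1
        self.log.append((seq_id, payload))
        return seq_id

    def subscribe(self, start=StartPosition.LATEST):
        """Return the initial consumed pointer for a new subscription."""
        if start is StartPosition.OLDEST:
            return 0              # nothing consumed yet: deliver the whole log
        return len(self.log)      # treat everything already published as consumed

    def deliver(self, consumed_ptr):
        """Return payloads with a sequence id greater than consumed_ptr."""
        return [payload for seq_id, payload in self.log if seq_id > consumed_ptr]
```

This mirrors the example in Jiannan's original message: a subscriber that joins after msg2 sees only msg3 with the default, while the `OLDEST` option also delivers msg1 and msg2.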

Idea 2) brings several benefits:

a) It eliminates the storage of and access to subscription metadata.
b) It provides a mechanism to rewind the subscription and replay already consumed messages.
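Idea 2) and its two benefits can be sketched the same way, assuming a hub that stores messages but keeps no per-subscriber state. `StatelessHub.subscribe_from` and `PointerSubscriber` are hypothetical names; where a real client would persist its pointer is left open.

```python
class StatelessHub:
    """Toy hub that stores messages but no subscription metadata (benefit a)."""

    def __init__(self):
        self.log = []  # payloads; seq id = list index + 1

    def publish(self, payload):
        self.log.append(payload)
        return len(self.log)

    def subscribe_from(self, consumed_ptr):
        """Return (seq_id, payload) pairs for every message after consumed_ptr."""
        return list(enumerate(self.log[consumed_ptr:], start=consumed_ptr + 1))


class PointerSubscriber:
    """Client that owns its consumed pointer instead of the hub."""

    def __init__(self, hub, consumed_ptr=0):
        self.hub = hub
        # Only an attribute here; a real client would need to store it somewhere durable.
        self.consumed_ptr = consumed_ptr

    def poll(self):
        received = []
        for seq_id, payload in self.hub.subscribe_from(self.consumed_ptr):
            received.append(payload)
            self.consumed_ptr = seq_id  # advance after handling each message
        return received

    def rewind(self, seq_id):
        """Benefit b): move the pointer back to replay consumed messages."""
        self.consumed_ptr = seq_id
```

After consuming m1..m3, a call to `rewind(1)` followed by `poll()` replays m2 and m3 without any change on the hub side.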

As for the garbage collection issue you mentioned (how long to keep the messages), we already
have messageBound to limit the length of a topic, so we don't need to worry about it.

As for your second question, it would be nice to have queue semantics in Hedwig, since the JMS
implementation needs them. But implementing queue semantics is a totally different story from
pub/sub.

-Sijie


On Wed, Feb 20, 2013 at 6:58 PM, Jiannan Wang <jiannan@yahoo-inc.com>
wrote:
Hi guys,
   Under the current Hedwig semantics, a subscriber is not aware of messages published before it
subscribes to the topic. So in the following example, subscriber A only receives messages after
seqId 2.
---------------------------------
Topic T: msg1 msg2 msg3 msg4 ...
                  | <- subscriber A subscribes to the topic
---------------------------------

   This semantic is very reasonable, but the Hedwig client needs to handle the following corner
case: a new topic is about to be created, and since a topic is created lazily by its first request
(generally a PUB or SUB), the client side must coordinate between publisher and subscriber so that
the first SUB is handled before the first PUB at the very beginning. (Consider a subscriber with a
very bad network connection whose SUB fails, while the user does not want to miss any messages.)
In summary, special work is required whenever a subscriber wants to receive all messages published
since the topic was created, and I think this requirement is very common.
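The corner case and the client-side coordination it forces can be reproduced with a toy, in-memory model. `LazyHub` and `run` are hypothetical names, not Hedwig classes; the model assumes a topic is created by its first request and that a subscriber only sees messages published after its SUB is handled. The `threading.Event` stands in for whatever client-side mechanism tells the publisher that the SUB has succeeded.

```python
import threading


class LazyHub:
    """Toy hub: a topic is created by its first PUB or SUB, and a subscriber
    only receives messages published after its SUB (current semantics)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.topics = {}     # topic -> list of payloads
        self.sub_start = {}  # (topic, subscriber) -> number of messages at SUB time

    def _topic(self, topic):
        return self.topics.setdefault(topic, [])  # lazy topic creation

    def subscribe(self, topic, subscriber):
        with self.lock:
            self.sub_start[(topic, subscriber)] = len(self._topic(topic))

    def publish(self, topic, payload):
        with self.lock:
            self._topic(topic).append(payload)

    def received(self, topic, subscriber):
        with self.lock:
            return self.topics[topic][self.sub_start[(topic, subscriber)]:]


def run(coordinate):
    """coordinate=False forces the bad interleaving (first PUB before SUB);
    coordinate=True makes the publisher wait until the SUB has been handled."""
    hub = LazyHub()
    sub_done = threading.Event()
    first_pub_done = threading.Event()

    def subscriber():
        if not coordinate:
            first_pub_done.wait()  # simulate a SUB delayed by a bad network
        hub.subscribe("T", "A")
        sub_done.set()

    def publisher():
        if coordinate:
            sub_done.wait()        # client-side coordination: PUB only after SUB
        hub.publish("T", "msg1")
        first_pub_done.set()
        sub_done.wait()            # later messages always follow the SUB
        hub.publish("T", "msg2")
        hub.publish("T", "msg3")

    threads = [threading.Thread(target=subscriber), threading.Thread(target=publisher)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return hub.received("T", "A")
```

Without coordination, `run(False)` returns only msg2 and msg3, so msg1 is lost; `run(True)` returns all three messages.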

   Handling this problem on the client side is one option, but I think we could simply resolve
it on the server side if Hedwig supported queue semantics (which would also let us extend the
Hedwig JMS provider to support JMS queues in BOOKKEEPER-312). As far as I know, the major concern
with queue semantics is how long to keep the messages. However:
   1. It is the user's responsibility to understand the feature and the impact of queue semantics.
   2. We can also add a parameter to limit the queue length.

   In short, here are the two problems I would like to discuss:
   1. How to gracefully resolve the above issue on the server side under the current semantics.
   2. Whether or not to introduce queue semantics into Hedwig.

Thanks,
Jiannan

