hadoop-zookeeper-user mailing list archives

From Benjamin Reed <br...@yahoo-inc.com>
Subject Re: Question on production readiness, deployment, data of BookKeeper / Hedwig
Date Thu, 07 Oct 2010 16:15:11 GMT
  hi amit,

sorry for the late response. this week has been crunch time for a lot of 
different things.

here are your answers:

production

1. it is still in the prototype phase. we are evaluating different aspects, 
but there is still some work to do to make it production ready. we also 
need to get an engineering team to sign up to stand behind it.

2. it's a generic pub/sub message bus. in some sense it is really a 
datacenter solution with extensions for multi-datacenter operation, so 
it is perfectly reasonable to use it in a single-datacenter setting.

3. yeah, we have removed the hw.bash script. it had some hardcoded 
assumptions and was a swiss army knife on steroids. we have been 
breaking it up into simpler scripts.

4. session expiry really represents a fundamental connectivity problem, 
so both bk and hedwig restart the component that gets the expired 
session error.
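
A rough sketch of the restart-on-expiry idea, written in Python for brevity (BookKeeper and Hedwig themselves are Java). Component, Supervisor and SessionExpired are hypothetical names, not ZooKeeper, BookKeeper or Hedwig APIs; the only point illustrated is that expiry is handled by replacing the whole component rather than trying to patch up the old session.

```python
from typing import Callable

# Hypothetical names throughout; this is not the ZooKeeper, BookKeeper or Hedwig API.


class SessionExpired(Exception):
    """Stand-in for the error a component sees when its ZooKeeper session expires."""


class Component:
    """A hub or bookie that owns exactly one ZooKeeper session."""

    def __init__(self, name: str, generation: int) -> None:
        self.name = name
        self.generation = generation  # how many restarts preceded this instance
        self.expired = False  # set when the session is lost

    def handle(self, request: str) -> str:
        if self.expired:
            # Ephemeral nodes and watches tied to the old session are gone.
            raise SessionExpired(f"{self.name} generation {self.generation}")
        return f"{self.name}#{self.generation} handled {request}"


class Supervisor:
    """Treats session expiry as fatal for the component and restarts it."""

    def __init__(self, factory: Callable[[int], Component]) -> None:
        self.factory = factory
        self.restarts = 0
        self.component = factory(0)

    def call(self, request: str) -> str:
        try:
            return self.component.handle(request)
        except SessionExpired:
            # Don't try to repair the old session: discard the component and
            # build a fresh one, which would re-register under a new session.
            self.restarts += 1
            self.component = self.factory(self.restarts)
            return self.component.handle(request)


if __name__ == "__main__":
    sup = Supervisor(lambda gen: Component("hub", gen))
    print(sup.call("publish"))  # hub#0 handled publish
    sup.component.expired = True  # simulate the session expiring
    print(sup.call("publish"))  # hub#1 handled publish
```

Restarting wholesale keeps the recovery logic trivial: a fresh component re-creates whatever session-scoped state it needs instead of reasoning about what the old session lost.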

data

1. yes.

2. once all subscribers have consumed a message there is a background 
process that cleans it up.

3. yes, there is a replication factor, and we ensure replication on 
writes. there is also a recovery tool to recover bookies that fail. we 
don't have to worry about conflicts because there is only a single 
writer for a given ledger; because of this we do not need to do quorum 
reads.
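
A toy in-memory model of the two data answers above, under stated assumptions: the Bookie and Ledger classes, the ack_quorum parameter (assumed here to mean "a write succeeds only if at least this many live replicas store it") and the trim() low-water-mark rule are hypothetical illustrations, not the BookKeeper API or its actual garbage-collection logic. It shows why a single writer per ledger removes the need for quorum reads, and how entries can be cleaned up once the slowest subscriber has consumed them.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model; none of these names are the BookKeeper API.


@dataclass
class Bookie:
    """A storage node holding entry id -> payload for one ledger."""
    name: str
    up: bool = True
    entries: dict[int, bytes] = field(default_factory=dict)


class Ledger:
    """Append-only log with exactly one writer, replicated on every write."""

    def __init__(self, bookies: list[Bookie], ack_quorum: int) -> None:
        self.bookies = bookies
        self.ack_quorum = ack_quorum  # assumed: replicas that must store an entry
        self.next_id = 0  # only the single writer advances this, so ids never clash

    def add_entry(self, payload: bytes) -> int:
        live = [b for b in self.bookies if b.up]
        if len(live) < self.ack_quorum:
            raise IOError(f"only {len(live)} live bookies, need {self.ack_quorum}")
        entry_id = self.next_id
        for b in live:  # replicate before acknowledging the write
            b.entries[entry_id] = payload
        self.next_id += 1
        return entry_id

    def read_entry(self, entry_id: int) -> bytes:
        # No quorum read: with a single writer, every replica that holds this id
        # holds the same bytes, so the first live bookie that has it is enough.
        for b in self.bookies:
            if b.up and entry_id in b.entries:
                return b.entries[entry_id]
        raise KeyError(entry_id)

    def trim(self, consumed: dict[str, int]) -> int:
        """Background cleanup: delete entries that every subscriber has consumed.

        `consumed` maps subscriber name -> last entry id it has consumed.
        Returns the number of replica copies deleted.
        """
        if not consumed:
            return 0
        low_water = min(consumed.values())  # position of the slowest subscriber
        removed = 0
        for b in self.bookies:
            for entry_id in [i for i in b.entries if i <= low_water]:
                del b.entries[entry_id]
                removed += 1
        return removed


if __name__ == "__main__":
    bookies = [Bookie("b1"), Bookie("b2"), Bookie("b3")]
    ledger = Ledger(bookies, ack_quorum=2)
    for msg in (b"m0", b"m1", b"m2"):
        ledger.add_entry(msg)
    bookies[0].up = False  # one replica fails
    print(ledger.read_entry(1))  # b'm1', served by b2
    print(ledger.trim({"sub-a": 1, "sub-b": 0}))  # 3: entry 0 removed from each replica
```

Because ids are assigned by one writer, two replicas can never disagree about what entry N contains; they can only differ in whether they have it, which is what the recovery tool addresses.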

documentation

yes, this is something we need to work on. i'll see if i can push out 
some of our hello world applications. we'd also like to put a JMS API on 
top so that the API is more familiar (and documented :). i don't want to 
delay the answers to your other questions, so let me answer that 
HedwigSubscriber is the class for clients. the other classes are 
internal. (for cross-datacenter updates, hubs use a special kind of 
subscription.)

ben

On 10/05/2010 10:32 PM, amit jaiswal wrote:
> Hi,
>
> In the Hedwig talk (http://vimeo.com/13282102), it was mentioned that the primary
> use case for Hedwig comes from the distributed key-value store PNUTS in Yahoo!,
> but it was also said that the work is new.
>
> Could you please tell me about the following:
>
> Production readiness / Deployment
> 1. How production-ready are Hedwig / BookKeeper? Are they being used
> anywhere (like in PNUTS)?
> 2. Is Hedwig designed to be used as a generic message bus or only for
> multi-datacenter operations?
> 3. Hedwig installation and deployment is done through a script, hw.bash, but that
> is difficult to use, especially in a production environment. Are there any other
> packages available that can simplify the deployment of Hedwig?
> 4. How does BK/Hedwig handle zookeeper session expiry?
>
> Data Deletion, Handling data loss, Quorum
> 1. Does BookKeeper support deletion of old log entries that have been consumed?
> 2. How does Hedwig handle the case when all subscribers have consumed all the
> messages? In the talk, it was said that a subscriber can come back after hours,
> days or weeks. Is there any data retention / expiration policy for the data that
> is published?
> 3. How does Hedwig handle data loss? There is a replication factor, and a write
> operation must be accepted by a majority of the bookies, but how are data conflicts
> handled? Is there any possibility of data conflict at all? Is the
> replication only for recovery? When the hub is reading data from bookies, does
> it read from all the bookies to satisfy a quorum read?
>
> Code
> What is the difference between PubSubServer, HedwigSubscriber and
> HedwigHubSubscriber? Is there any HelloWorld program that simply illustrates how
> to instantiate a Hedwig client and publish/consume messages? (The HedwigBenchmark
> class is helpful, but I was looking for something like API documentation.)
>
> -regards
> Amit

