bookkeeper-commits mailing list archives

From iv...@apache.org
Subject svn commit: r1644097 [2/2] - /bookkeeper/site/trunk/content/docs/master/
Date Tue, 09 Dec 2014 16:00:53 GMT
Added: bookkeeper/site/trunk/content/docs/master/hedwigConsole.textile
URL: http://svn.apache.org/viewvc/bookkeeper/site/trunk/content/docs/master/hedwigConsole.textile?rev=1644097&view=auto
==============================================================================
--- bookkeeper/site/trunk/content/docs/master/hedwigConsole.textile (added)
+++ bookkeeper/site/trunk/content/docs/master/hedwigConsole.textile Tue Dec  9 16:00:52 2014
@@ -0,0 +1,187 @@
+Title:        Hedwig Console
+Notice: Licensed under the Apache License, Version 2.0 (the "License");
+        you may not use this file except in compliance with the License. You may
+        obtain a copy of the License at "http://www.apache.org/licenses/LICENSE-2.0":http://www.apache.org/licenses/LICENSE-2.0.
+        .
+        .        
+        Unless required by applicable law or agreed to in writing,
+        software distributed under the License is distributed on an "AS IS"
+        BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+        implied. See the License for the specific language governing permissions
+        and limitations under the License.
+        .
+        .
+
+h1. Hedwig Console
+
+Apache Hedwig provides a console client, which allows users and administrators to interact with a hedwig cluster. 
+
+h2. Connecting to a hedwig cluster
+
+The Hedwig console client ships with the hedwig server package.
+
+p. To start the console client:
+
+ @hedwig-server/bin/hedwig console@
+
+p. By default, the console client connects to the hub server on localhost. To connect to a different hub server, override the following environment variables.
+
+| @HEDWIG_CONSOLE_SERVER_CONF@ | Path of a hub server configuration file. Override it to make the hedwig console client connect to the correct zookeeper cluster. |
+| @HEDWIG_CONSOLE_CLIENT_CONF@ | Path of a hedwig client configuration file. Override it to make the hedwig console client communicate with the correct hub servers. |
+
+p. Once connected, you should see something like:
+
+<pre>
+Connecting to zookeeper/bookkeeper using HedwigAdmin
+
+Connecting to default hub server localhost/127.0.0.1:4080
+Welcome to Hedwig!
+JLine support is enabled
+JLine history support is enabled
+[hedwig: (standalone) 16] 
+</pre>
+
+p. From the shell, type __help__ to get a list of commands that can be executed from the client:
+
+<pre>
+[hedwig: (standalone) 16] help
+HedwigConsole [options] [command] [args]
+
+Available commands:
+        pub
+        sub
+        closesub
+        unsub
+        rmsub
+        consume
+        consumeto
+        pubsub
+        show
+        describe
+        readtopic
+        set
+        history
+        redo
+        help
+        quit
+        exit
+
+Finished 0.0020 s.
+</pre>
+
+p. For detailed usage of a command, type __help {command}__ in the shell. For example:
+
+<pre>
+[hedwig: (standalone) 17] help pub
+pub: Publish a message to a topic in Hedwig
+usage: pub {topic} {message}
+
+  {topic}   : topic name.
+              any printable string without spaces.
+  {message} : message body.
+              remaining arguments are used as message body to publish.
+
+Finished 0.0 s.
+</pre>
+
+h2. Commands
+
+The commands available in the hedwig console fall into three groups: __interactive commands__, __admin commands__ and __utility commands__.
+
+h3. Interactive Commands
+
+p. Interactive commands are used by users to communicate with a hedwig cluster. They are __pub__, __sub__, __closesub__, __unsub__, __consume__ and __consumeto__.
+
+p. These commands are simple and have the same semantics as the API provided by the hedwig client.
+
+h3. Admin Commands
+
+p. Admin commands are used by administrators to operate or debug a hedwig cluster. They are __show__, __describe__, __pubsub__ and __readtopic__.
+
+p. __show__ is used to list all available hub servers or topics in the cluster.
+
+p. Use __show__ to list hub servers and see how many are alive in the cluster.
+
+<pre>
+[hedwig: (standalone) 27] show hubs
+Available Hub Servers:
+        192.168.1.102:4080:9876 :       0
+Finished 0.0040 s.
+</pre>
+
+p. You can also use __show__ to list all topics. If the cluster has a lot of topics, this command will take a long time to run.
+
+<pre>
+[hedwig: (standalone) 28] show topics
+Topic List:
+[mytopic]
+Finished 0.0020 s.
+</pre>
+
+p. To see the details of a topic, run __describe__. This shows the metadata of a topic, including the topic owner, persistence info and subscription info.
+
+<pre>
+[hedwig: (standalone) 43] describe topic mytopic
+===== Topic Information : mytopic =====
+
+Owner : 192.168.1.102:4080:9876
+
+>>> Persistence Info <<<
+Ledger 3 [ 1 ~ 9 ]
+
+>>> Subscription Info <<<
+Subscriber mysub : consumeSeqId: local:0
+
+Finished 0.011 s.
+</pre>
+
+p. When you run the __describe__ command, keep in mind that it reads the metadata directly from __ZooKeeper__, so the subscription info might not be completely up to date, because hub servers update the subscription metadata lazily.
+
+p. The __readtopic__ command is useful to see which messages have not been consumed by the client.
+
+<pre>
+[hedwig: (standalone) 46] readtopic mytopic
+
+>>>>> Ledger 3 [ 1 ~ 9] <<<<<
+
+---------- MSGID=LOCAL(1) ----------
+MsgId:     LOCAL(1)
+SrcRegion: standalone
+Message:
+
+hello
+
+---------- MSGID=LOCAL(2) ----------
+MsgId:     LOCAL(2)
+SrcRegion: standalone
+Message:
+
+hello 2
+
+---------- MSGID=LOCAL(3) ----------
+MsgId:     LOCAL(3)
+SrcRegion: standalone
+Message:
+
+hello 3
+
+...
+</pre>
+
+p. __pubsub__ is another useful command for administrators. It can be used to test the availability and functionality of a cluster. It generates a temporary subscriber id with the current timestamp, subscribes to the given topic using the generated subscriber id, publishes a message to the given topic and tests whether the subscriber receives the message.
+
+<pre>
+[hedwig: (standalone) 48] pubsub testtopic testsub- 10 test message for availability
+Starting PUBSUB test ...
+Sub topic testtopic, subscriber id testsub--1338126964504
+Pub topic testtopic : test message for availability-1338126964504
+Received message : test message for availability-1338126964504
+PUBSUB SUCCESS. TIME: 377 MS
+Finished 0.388 s.
+</pre>
+
+h3. Utility Commands
+
+p. Utility commands are __help__, __history__, __redo__, __quit__ and __exit__.
+
+p. __quit__ and __exit__ are used to exit the console, while __history__ and __redo__ are used to manage the history of commands executed in the shell.

Added: bookkeeper/site/trunk/content/docs/master/hedwigDesign.textile
URL: http://svn.apache.org/viewvc/bookkeeper/site/trunk/content/docs/master/hedwigDesign.textile?rev=1644097&view=auto
==============================================================================
--- bookkeeper/site/trunk/content/docs/master/hedwigDesign.textile (added)
+++ bookkeeper/site/trunk/content/docs/master/hedwigDesign.textile Tue Dec  9 16:00:52 2014
@@ -0,0 +1,72 @@
+Notice: Licensed under the Apache License, Version 2.0 (the "License");
+        you may not use this file except in compliance with the License. You may
+        obtain a copy of the License at "http://www.apache.org/licenses/LICENSE-2.0":http://www.apache.org/licenses/LICENSE-2.0.
+        .        
+        Unless required by applicable law or agreed to in writing,
+        software distributed under the License is distributed on an "AS IS"
+        BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+        implied. See the License for the specific language governing permissions
+        and limitations under the License.
+        .
+
+h1. Style
+
+We have provided an Eclipse Formatter file @formatter.xml@ with all the formatting conventions currently used in the project. Highlights include no tabs, 4-space indentation, and 120-char width. Please respect this so as to reduce the amount of formatting-related noise produced in commits.
+
+h1. Static Analysis
+
+We would like to use static analysis tools PMD and FindBugs to maintain code quality. However, we have not yet arrived at a consensus on what rules to adhere to, and what to ignore.
+
+h1. Netty Notes
+
+The asynchronous network IO infrastructure that Hedwig uses is "Netty":http://www.jboss.org/netty. Here are some notes on Netty's concurrency architecture and its filter pipeline design.
+
+h2. Concurrency Architecture
+
+After calling @ServerBootstrap.bind()@, Netty starts a boss thread (@NioServerSocketPipelineSink.Boss@) that just accepts new connections and registers them with one of the workers from the @NioWorker@ pool in round-robin fashion (pool size defaults to CPU count). Each worker runs its own select loop over just the set of keys that have been registered with it. Workers start lazily on demand and run only so long as there are interested fd's/keys. All selected events are handled in the same thread and sent up the pipeline attached to the channel (this association is established by the boss as soon as a new connection is accepted).
+
+All workers, and the boss, run via the executor thread pool; hence, the executor must support at least two simultaneous threads.
+
+h2. Handler Pipeline
+
+A pipeline implements the intercepting filter pattern. A pipeline is a sequence of handlers. Whenever a packet is read from the wire, it travels up the stream, stopping at each handler that can handle upstream events. Vice-versa for writes. Between each filter, control flows back through the centralized pipeline, and a linked list of contexts keeps track of where we are in the pipeline (one context object per handler).
+
+
+h1. Pseudocode
+
+This summarizes the control flow through the system.
+
+h2. publish
+
+Need to document
+
+h2. subscribe
+
+Need to document
+
+h1. ReadAhead Cache
+
+The delivery manager class is responsible for pushing published messages from the hubs to the subscribers. The most common case is that all subscribers are connected and either caught up, or close to the tail end of the topic. In this case, we don't want the delivery manager to be polling bookkeeper for any newly arrived messages on the topic; new messages should just be pushed to the delivery manager. However, there is also the uncommon case when a subscriber is behind, and messages must be pulled from Bookkeeper.
+
+Since all publishes go through the hub, it is possible to cache the recently published messages in the hub, and then the delivery manager won't have to make the trip to bookkeeper to get the messages but instead get them from local process memory.
+
+These ideas of push, pull, and caching are unified in the following way:
+
+* A hub has a cache of messages.
+* When the delivery manager wants to deliver a message, it asks the cache for it. There are 3 cases:
+** The message is available in the cache, in which case it is given to the delivery manager.
+** The message is not present in the cache and the seq-id of the message is beyond the last message published on that topic (this happens if the subscriber is totally caught up for that topic). In this case, a stub is put in the cache in order to notify the delivery manager when that message does happen to be published.
+** The message is not in the cache but has been published to the topic. In this case, a stub is put in the cache, and a read is issued to bookkeeper.
+* Whenever a message is published, it is cached. If there is a stub already in the cache for that message, the delivery manager is notified.
+* Whenever a message is read from bookkeeper, it is cached. There must be a stub for that message (since reads to bookkeeper are issued only after putting a stub), so the delivery manager is notified.
+* The cache does readahead, i.e., if a message requested by the delivery manager is not in the cache, a stub is established not only for that message, but also for the next n messages where n is configurable (default 10). On a cache hit, we look ahead n/2 messages, and if that message is not present, we establish another n/2 stubs. In short, we always ensure that the next n stubs are always established.
+* Over time, the cache will grow in size. There are 2 pruning mechanisms:
+** Once all subscribers have consumed up to a particular seq-id, they notify the cache, and all messages up to that seq-id are pruned from the cache.
+** If the above pruning is not working (e.g., because some subscribers are down), the cache will eventually hit its size limit, which is configurable (default, half of maximum jvm heap size). At this point, messages are just pruned in FIFO order. We use the size of the blobs in the message for estimating the cache size. The assumption is that that size will dominate over fixed, object-level size overheads.
+* Stubs are not purged because according to the above simplification, they are of 0 size.
+
+h1. Scalability Bottlenecks Down the Road
+
+* Currently each topic subscription is served on a different channel. The number of channels will become a bottleneck as the number of subscriptions grows. We should switch to an architecture in which multiple topic subscriptions between the same (client, hub) pair are served on the same channel. We can have commands to start and stop subscriptions sent all the way to the server (right now these are local).
+* Publishes for a topic are serialized through a hub, to get ordering guarantees. Currently, all subscriptions to that topic are served from the same hub. If we start having a large number of subscribers to heavy-volume topics, the outbound bandwidth at the hub, or the CPU at that hub, might become the bottleneck. In that case, we can set up other regions through which the messages are routed (this hierarchical scheme reduces bandwidth requirements at any single node). It should be possible to do this entirely through configuration.
+
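The ReadAhead Cache rules in hedwigDesign.textile above (stubs, notification on publish or on a bookkeeper read, and readahead of n messages) can be sketched roughly as follows. This is a minimal single-threaded illustration, not Hedwig's implementation: the class @ReadAheadSketch@, its method names, and the two lists that record "would-be" bookkeeper reads and delivery notifications are all hypothetical, and pruning is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, single-threaded sketch of the stub-based cache (not Hedwig code). */
class ReadAheadSketch {
    // seq-id -> message; a key mapped to null is a stub awaiting its message.
    private final Map<Long, String> cache = new HashMap<>();
    private final int readAhead;   // "n" in the text
    private long lastPublished;    // highest seq-id published on the topic so far

    final List<Long> bookkeeperReads = new ArrayList<>(); // reads that would be issued
    final List<String> notified = new ArrayList<>();      // messages handed to delivery

    ReadAheadSketch(int readAhead, long lastPublished) {
        this.readAhead = readAhead;
        this.lastPublished = lastPublished;
    }

    /** Returns the message on a cache hit, or null once stubs are in place. */
    String request(long seqId) {
        String cached = cache.get(seqId);
        if (cached != null) {
            // Hit: look ahead n/2; if nothing is there, add another n/2 stubs.
            long probe = seqId + readAhead / 2;
            if (!cache.containsKey(probe)) {
                establishStubs(probe, readAhead / 2);
            }
            return cached;
        }
        // Miss: stub the requested message and the next n messages.
        establishStubs(seqId, readAhead + 1);
        return null;
    }

    private void establishStubs(long from, int count) {
        for (long id = from; id < from + count; id++) {
            if (!cache.containsKey(id)) {
                cache.put(id, null);
                if (id <= lastPublished) {
                    bookkeeperReads.add(id); // already published: read it back
                }
            }
        }
    }

    /** A publish is cached; an existing stub triggers a notification. */
    void onPublish(String message) { fill(++lastPublished, message); }

    /** A completed bookkeeper read is cached and always finds its stub. */
    void onBookkeeperRead(long seqId, String message) { fill(seqId, message); }

    private void fill(long seqId, String message) {
        boolean hadStub = cache.containsKey(seqId) && cache.get(seqId) == null;
        cache.put(seqId, message);
        if (hadStub) {
            notified.add(message);
        }
    }

    public static void main(String[] args) {
        ReadAheadSketch sketch = new ReadAheadSketch(2, 5);
        sketch.request(4);                 // subscriber is behind: miss
        sketch.onBookkeeperRead(4, "m4");  // read completes, stub is filled
        sketch.onPublish("m6");            // new publish lands on a readahead stub
        System.out.println("reads=" + sketch.bookkeeperReads + " notified=" + sketch.notified);
    }
}
```

Representing a stub as a key mapped to null keeps the sketch short; a real cache would more likely use a dedicated stub object so that stubs can carry callbacks.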

Added: bookkeeper/site/trunk/content/docs/master/hedwigJMX.textile
URL: http://svn.apache.org/viewvc/bookkeeper/site/trunk/content/docs/master/hedwigJMX.textile?rev=1644097&view=auto
==============================================================================
--- bookkeeper/site/trunk/content/docs/master/hedwigJMX.textile (added)
+++ bookkeeper/site/trunk/content/docs/master/hedwigJMX.textile Tue Dec  9 16:00:52 2014
@@ -0,0 +1,32 @@
+Title:        Hedwig JMX
+Notice: Licensed under the Apache License, Version 2.0 (the "License");
+        you may not use this file except in compliance with the License. You may
+        obtain a copy of the License at "http://www.apache.org/licenses/LICENSE-2.0":http://www.apache.org/licenses/LICENSE-2.0.
+        .
+        .        
+        Unless required by applicable law or agreed to in writing,
+        software distributed under the License is distributed on an "AS IS"
+        BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+        implied. See the License for the specific language governing permissions
+        and limitations under the License.
+        .
+        .
+
+h1. JMX
+
+Apache Hedwig has extensive support for JMX, which allows viewing and managing a hedwig cluster.
+
+This document assumes that you have basic knowledge of JMX. See the "Sun JMX Technology":http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/ page to get started with JMX.
+
+See the "JMX Management Guide":http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html for details on setting up local and remote management of VM instances. By default the included __hedwig__ script supports only local management - review the linked document to enable support for remote management (beyond the scope of this document).
+
+__Hub Server__ is a JMX manageable server, which registers the proper MBeans during initialization to support JMX monitoring and management of the instance.
+
+h1. Hub Server MBean Reference
+
+This table details JMX for a hub server.
+
+| _.MBean | _.MBean Object Name | _.Description |
+| PubSubServer | PubSubServer | Represents a hub server. It is the root MBean for a hub server and includes statistics such as the number of packets sent/received/redirected and statistics for pub/sub/unsub/consume operations. |
+| NettyHandlers | NettyHandler | Provides statistics for netty handlers. Currently it just returns the number of subscription channels established to a hub server. |
+| ReadAheadCache | ReadAheadCache | Provides read ahead cache statistics. |
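
For readers less familiar with how a server "registers MBeans during initialization", the following is a generic sketch of the standard JMX pattern using only the JDK's @javax.management@ API. It does not reproduce Hedwig's MBeans: the interface @HubStatsMBean@, the attribute @NumPublishes@ and the object name @example.hedwig:type=PubSubServer@ are assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxSketch {
    /** Hypothetical management interface; Hedwig's real attributes may differ. */
    public interface HubStatsMBean {
        long getNumPublishes();
    }

    /** Standard MBean: the class name plus "MBean" names its interface. */
    public static class HubStats implements HubStatsMBean {
        private final AtomicLong publishes = new AtomicLong();

        void recordPublish() { publishes.incrementAndGet(); }

        @Override
        public long getNumPublishes() { return publishes.get(); }
    }

    /** Registers the bean, reads one attribute back through JMX, then unregisters. */
    static long registerAndRead(HubStats stats, String name) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName objectName = new ObjectName(name);
            server.registerMBean(stats, objectName);
            try {
                // The getter getNumPublishes() is exposed as attribute "NumPublishes".
                return (Long) server.getAttribute(objectName, "NumPublishes");
            } finally {
                server.unregisterMBean(objectName);
            }
        } catch (Exception e) {
            throw new IllegalStateException("JMX registration failed", e);
        }
    }

    public static void main(String[] args) {
        HubStats stats = new HubStats();
        stats.recordPublish();
        System.out.println(registerAndRead(stats, "example.hedwig:type=PubSubServer"));
    }
}
```

While a bean like this is registered, a local JMX client such as jconsole can browse it under the chosen object name.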

Added: bookkeeper/site/trunk/content/docs/master/hedwigMessageFilter.textile
URL: http://svn.apache.org/viewvc/bookkeeper/site/trunk/content/docs/master/hedwigMessageFilter.textile?rev=1644097&view=auto
==============================================================================
--- bookkeeper/site/trunk/content/docs/master/hedwigMessageFilter.textile (added)
+++ bookkeeper/site/trunk/content/docs/master/hedwigMessageFilter.textile Tue Dec  9 16:00:52 2014
@@ -0,0 +1,76 @@
+Title:        Hedwig Message Filter
+Notice: Licensed under the Apache License, Version 2.0 (the "License");
+        you may not use this file except in compliance with the License. You may
+        obtain a copy of the License at "http://www.apache.org/licenses/LICENSE-2.0":http://www.apache.org/licenses/LICENSE-2.0.
+        .
+        .
+        Unless required by applicable law or agreed to in writing,
+        software distributed under the License is distributed on an "AS IS"
+        BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+        implied. See the License for the specific language governing permissions
+        and limitations under the License.
+        .
+        .
+
+h1. Message Filter
+
+Apache Hedwig provides an efficient mechanism for supporting application-defined __message filtering__.
+
+h2. Message
+
+Most message-oriented middleware (MOM) products treat messages as lightweight entities that consist of a header and a payload. The header contains fields used for message routing and identification; the payload contains the application data being sent.
+
+Hedwig messages follow a similar template, being composed of the following parts:
+
+* @Header@ - All messages support both system defined fields and application defined property values. Properties provide an efficient mechanism for supporting application-defined message filtering.
+* @Body@ - Hedwig considers the message body as an opaque binary blob.
+* @SrcRegion@ - Indicates where the message comes from.
+* @MessageSeqId@ - The unique message sequence id assigned by Hedwig.
+
+h3. Message Header Properties
+
+A __Message__ object contains a built-in facility for supporting application-defined property values. In effect, this provides a mechanism for adding application-specific header fields to a message.
+
+By using properties and __message filters__, an application can have Hedwig select, or filter, messages on its behalf using application-specific criteria.
+
+Property names must be a __String__ and must not be null, while property values are binary blobs. The flexibility of binary blobs allows applications to define their own serialize/deserialize functions, allowing structured data to be stored in the message header.
+
+h2. Message Filter
+
+A __Message Filter__ allows an application to specify, via header properties, the messages it is interested in. Only messages that pass the __Message Filter__ specified by a subscriber are delivered to that subscriber.
+
+A message filter can run either on the __server side__ or on the __client side__. In both cases, a __Message Filter__ implementation needs to implement the following two methods:
+
+* @setSubscriptionPreferences(topic, subscriberId, preferences)@: The __subscription preferences__ of the subscriber are passed to the message filter when it is attached to its subscription, either on the server side or on the client side.
+* @testMessage(message)@: Used to test whether a particular message passes the filter or not.
+
+The __subscription preferences__ are used to specify the messages that the user is interested in. The __message filter__ uses the __subscription preferences__ to decide which messages are passed to the user.
+
+Take a book store (using topic __BookStore__) as an example:
+
+# User A may only care about History books. He subscribes to __BookStore__ with his custom preferences: type="History".
+# User B may only care about Romance books. He subscribes to __BookStore__ with his custom preferences: type="Romance".
+# A new book arrives at the book store; a message is sent to __BookStore__ with type="History" in its header.
+# The message is then delivered to __BookStore__'s subscribers.
+# Subscriber A filters the message by checking messages' header to accept those messages whose type is "History".
+# Subscriber B filters out the message, as the type does not match its preferences.
+
+h3. Client Message Filter.
+
+A __ClientMessageFilter__ runs on the client side. Each subscriber can write its own filter and pass it as a parameter when starting delivery ( __startDelivery(topic, subscriberId, messageHandler, messageFilter)__ ).
+
+h3. Server Message Filter.
+
+A __ServerMessageFilter__ runs on the server side (a hub server). A hub server instantiates a server message filter, by means of reflection, using the message filter class specified in the subscription preferences provided by the subscriber. Since __ServerMessageFilter__s run on the hub server, filtered-out messages are never delivered to the client, reducing unnecessary network traffic. Hedwig uses an implementation of __ServerMessageFilter__ to filter unnecessary message deliveries between regions.
+
+Since hub servers use reflection to instantiate a __ServerMessageFilter__, an implementation of __ServerMessageFilter__ needs to implement two additional methods:
+
+* @initialize(conf)@: Initialize the message filter before filtering messages.
+* @uninitialize()@: Uninitialize the message filter to release resources used by the message filter.
+
+For the hub server to load the message filter, the implementation class must be in the server's classpath at startup.
+
+h3. Which message filter should be used?
+
+It depends on application requirements. Using a __ServerMessageFilter__ reduces network traffic by filtering unnecessary messages, but it competes for resources on the hub server (CPU, memory, etc). Conversely, __ClientMessageFilter__s have the advantage of inducing no extra load on the hub server, but at the price of higher network utilization. A filter can be installed both at the server side and on the client; Hedwig does not restrict this.
+
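The book store example in hedwigMessageFilter.textile above can be sketched as a filter with the two methods named there, @setSubscriptionPreferences@ and @testMessage@. This is only an illustration under simplifying assumptions: the nested @Message@ class, the use of plain @String@ and @Map<String, byte[]>@ parameters, and the @type@ property key are hypothetical stand-ins, not Hedwig's actual types or signatures.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class BookTypeFilterSketch {
    /** Simplified stand-in for a message: header properties plus an opaque body. */
    static final class Message {
        final Map<String, byte[]> properties = new HashMap<>();
        byte[] body = new byte[0];
    }

    private byte[] wantedType; // copied from the subscription preferences

    /** Called once when the filter is attached to a subscription. */
    void setSubscriptionPreferences(String topic, String subscriberId,
                                    Map<String, byte[]> preferences) {
        wantedType = preferences.get("type");
    }

    /** Returns true if the message should be delivered to the subscriber. */
    boolean testMessage(Message message) {
        if (wantedType == null) {
            return true; // no preference stated: let everything through
        }
        // Property values are binary blobs, so compare them byte for byte.
        return Arrays.equals(wantedType, message.properties.get("type"));
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, byte[]> prefsA = new HashMap<>();
        prefsA.put("type", utf8("History"));
        BookTypeFilterSketch filterA = new BookTypeFilterSketch();
        filterA.setSubscriptionPreferences("BookStore", "userA", prefsA);

        Message newBook = new Message();
        newBook.properties.put("type", utf8("History"));
        System.out.println("deliver to A: " + filterA.testMessage(newBook));
    }
}
```

The same logic could run on either side; on the server it would additionally need the @initialize(conf)@ and @uninitialize()@ methods described above.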

Added: bookkeeper/site/trunk/content/docs/master/hedwigMetadata.textile
URL: http://svn.apache.org/viewvc/bookkeeper/site/trunk/content/docs/master/hedwigMetadata.textile?rev=1644097&view=auto
==============================================================================
--- bookkeeper/site/trunk/content/docs/master/hedwigMetadata.textile (added)
+++ bookkeeper/site/trunk/content/docs/master/hedwigMetadata.textile Tue Dec  9 16:00:52 2014
@@ -0,0 +1,123 @@
+Title:        Hedwig Metadata Management
+Notice: Licensed under the Apache License, Version 2.0 (the "License");
+        you may not use this file except in compliance with the License. You may
+        obtain a copy of the License at "http://www.apache.org/licenses/LICENSE-2.0":http://www.apache.org/licenses/LICENSE-2.0.
+        .
+        .
+        Unless required by applicable law or agreed to in writing,
+        software distributed under the License is distributed on an "AS IS"
+        BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+        implied. See the License for the specific language governing permissions
+        and limitations under the License.
+        .
+        .
+
+h1. Metadata Management
+
+There are two classes of metadata that need to be managed in Hedwig. The first is the __list of available hubs__, which is used to track server availability (ZooKeeper is designed naturally for this). The second comprises the data structures that track __topic states__ and __subscription states__. This second class can be handled by any key/value store that provides a __CAS (Compare And Set)__ operation. The metadata in this class are:
+
+* @Topic Ownership@: tracks which hub server is assigned to serve requests for a specific topic.
+* @Topic Persistence Info@: records what __bookkeeper ledgers__ are used to store messages for a specific topic and their message id ranges.
+* @Subscription Data@: records the preferences and subscription state for a specific subscription (topic, subscriber).
+
+Each kind of metadata is handled by a specific metadata manager. They are __TopicOwnershipManager__, __TopicPersistenceManager__ and __SubscriptionDataManager__.
+
+h2. Topic Ownership Management
+
+There are two ways to manage topic ownership. The first leverages ZooKeeper's ephemeral znodes to record the topic's owner info as a child ephemeral znode under its topic znode. When a hub server that owns a specific topic crashes, the ephemeral znode that signifies topic ownership is deleted due to the loss of the zookeeper session. Other hubs can then be assigned ownership of the topic. The second leverages the __CAS__ operation provided by key/value stores to do leader election. __CAS__ doesn't require the underlying key/value store to provide functionality similar to ZooKeeper's ephemeral nodes. With __CAS__ it is possible to guarantee that only one hub server gains ownership of a specific topic, which makes it a more scalable and generic solution.
+
+The implementation of a __TopicOwnershipManager__ is required to implement the following methods:
+
+<pre><code>
+
+public void readOwnerInfo(ByteString topic, Callback<Versioned<HubInfo>> callback, Object ctx);
+
+public void writeOwnerInfo(ByteString topic, HubInfo owner, Version version,
+                           Callback<Version> callback, Object ctx);
+
+public void deleteOwnerInfo(ByteString topic, Version version,
+                            Callback<Void> callback, Object ctx);
+
+</code></pre>
+
+* @readOwnerInfo@: Read the owner info from the underlying key/value store. The implementation should take the responsibility of deserializing the metadata into a __HubInfo__ object identifying a hub server. Also, its current __version__ needs to be returned for future updates. If there is no owner info found for a topic, a null value is returned.
+
+* @writeOwnerInfo@: Write the owner info into the underlying key/value store with the given __version__. If the current __version__ in the underlying key/value store doesn't equal the provided __version__, the write should be rejected with __BadVersionException__. The new __version__ should be returned on a successful write. __NoTopicOwnerInfoException__ is returned if no owner info is found for a topic.
+
+* @deleteOwnerInfo@: Delete the owner info from the key/value store with the given __version__. The owner info should be removed if the current __version__ in the key/value store equals the provided __version__. Otherwise, the deletion should be rejected with __BadVersionException__. __NoTopicOwnerInfoException__ is returned if no owner info is found for the topic.
+
+h2. Topic Persistence Info Management
+
+Similar to __TopicOwnershipManager__, an implementation of __TopicPersistenceManager__ is required to implement the READ/WRITE/DELETE interfaces below:
+
+<pre><code>
+public void readTopicPersistenceInfo(ByteString topic,
+                                     Callback<Versioned<LedgerRanges>> callback, Object ctx);
+
+public void writeTopicPersistenceInfo(ByteString topic, LedgerRanges ranges, Version version,
+                                      Callback<Version> callback, Object ctx);
+
+public void deleteTopicPersistenceInfo(ByteString topic, Version version,
+                                       Callback<Void> callback, Object ctx);
+</code></pre>
+
+* @readTopicPersistenceInfo@: Read the persistence info from the underlying key/value store. The implementation should take the responsibility of deserializing the metadata into a __LedgerRanges__ object that includes the ledgers used to store messages. Also, its current __version__ needs to be returned for future updates. If there is no persistence info found for a topic, a null value is returned.
+
+* @writeTopicPersistenceInfo@: Write the persistence info into the underlying key/value store with the given __version__. If the current __version__ in the underlying key/value store doesn't equal the provided __version__, the write should be rejected with __BadVersionException__. The new __version__ should be returned on a successful write. __NoTopicPersistenceInfoException__ is returned if no persistence info is found for a topic.
+
+* @deleteTopicPersistenceInfo@: Delete the persistence info from the key/value store with the given __version__. The persistence info should be removed if the current __version__ in the key/value store equals the provided __version__. Otherwise, the deletion should be rejected with __BadVersionException__. __NoTopicPersistenceInfoException__ is returned if no persistence info is found for a topic.
+
+h2. Subscription Data Management
+
+__SubscriptionDataManager__ has READ/CREATE/WRITE/DELETE interfaces similar to those of the other managers. In addition, the implementation needs to implement the __READ SUBSCRIPTIONS__ interface, which fetches all the subscriptions for a given topic.
+
+<pre><code>
+public void createSubscriptionData(ByteString topic, ByteString subscriberId, SubscriptionData data,
+                                   Callback<Version> callback, Object ctx);
+
+public boolean isPartialUpdateSupported();
+
+public void updateSubscriptionData(ByteString topic, ByteString subscriberId, SubscriptionData dataToUpdate, 
+                                   Version version, Callback<Version> callback, Object ctx);
+
+public void replaceSubscriptionData(ByteString topic, ByteString subscriberId, SubscriptionData dataToReplace,
+                                    Version version, Callback<Version> callback, Object ctx);
+
+public void deleteSubscriptionData(ByteString topic, ByteString subscriberId, Version version,
+                                   Callback<Void> callback, Object ctx);
+
+public void readSubscriptionData(ByteString topic, ByteString subscriberId,
+                                 Callback<Versioned<SubscriptionData>> callback, Object ctx);
+
+public void readSubscriptions(ByteString topic, Callback<Map<ByteString, Versioned<SubscriptionData>>> cb,
+                              Object ctx);
+</code></pre>
+
+h3. Create/Update Subscriptions
+
+The metadata for a subscription has two parts: preferences and subscription state. __SubscriptionPreferences__ tracks all the preferences for a subscriber (e.g., an application could store its customized preferences for message filtering), while __SubscriptionState__ is used internally to track the message consumption state for a given subscriber. These two kinds of metadata are quite different: __SubscriptionPreferences__ is not updated
+frequently, while __SubscriptionState__ is updated frequently as messages are consumed. If the underlying key/value store supports independent field update for a given key (subscription), __SubscriptionPreferences__ and __SubscriptionState__ could be stored as two different fields for a given subscription. In this case __isPartialUpdateSupported__ should return true. Otherwise, __isPartialUpdateSupported__ should return false and the implementation should serialize/deserialize __SubscriptionData__ as an opaque blob.
+
+* @createSubscriptionData@: Create a subscription entry for a given topic. The initial __version__ is returned on successful creation. __SubscriptionStateExistsException__ is returned if the subscription entry already exists.
+
+* @updateSubscriptionData/replaceSubscriptionData@: Update/replace the subscription data in the underlying key/value store with the given __version__. If the current __version__ in the underlying key/value store does not equal the provided __version__, the update should be rejected with __BadVersionException__. The new __version__ should be returned for a successful write. __NoSubscriptionStateException__ is returned if no subscription entry is found for a subscription (topic, subscriber).
+
+h3. Read Subscriptions
+
+* @readSubscriptionData@: Read the subscription data from the underlying key/value store. The implementation should take the responsibility of deserializing the metadata into a __SubscriptionData__ object including its preferences and subscription state. Also, its current __version__ needs to be returned for future updates. If there is no subscription data found for a subscription, a null value is returned.
+
+* @readSubscriptions@: Read all the subscription data from the key/value store for a given topic. The implementation is responsible for managing all subscriptions for a topic so that they can be accessed efficiently. An empty map is returned if there are no subscriptions found for a given topic.
+
+h3. Delete Subscription
+
+* @deleteSubscriptionData@: Delete the subscription data from the key/value store with given __version__ for a specific subscription (topic, subscriber). The subscription info should be removed if current __version__ in key/value store equals the provided __version__. Otherwise, the deletion should be rejected with __BadVersionException__. __NoSubscriptionStateException__ is returned if no subscription data is found for a subscription (topic, subscriber).
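+
+The version checks described above can be illustrated with a minimal in-memory sketch. It is not the Hedwig implementation: the class @VersionedSubscriptionStore@, its synchronous methods, the use of @String@ for the stored data and a @long@ counter for the version are all assumptions made for illustration (the interface shown earlier is asynchronous and uses @ByteString@, @SubscriptionData@, @Version@ and callbacks). The exception names mirror those mentioned in this section but are redefined here as unchecked exceptions so that the example is self-contained.
+
```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a versioned key/value store (not the Hedwig API). */
class VersionedSubscriptionStore {
    /** The caller's version does not match the stored version. */
    static class BadVersionException extends RuntimeException {}
    /** No entry exists for the (topic, subscriber) pair. */
    static class NoSubscriptionStateException extends RuntimeException {}
    /** An entry already exists for the (topic, subscriber) pair. */
    static class SubscriptionStateExistsException extends RuntimeException {}

    /** Immutable value plus the version it was written with. */
    private static final class Entry {
        final String data;
        final long version;
        Entry(String data, long version) { this.data = data; this.version = version; }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Simplification: assumes topics and subscriber ids do not contain '/'.
    private static String key(String topic, String subscriberId) {
        return topic + "/" + subscriberId;
    }

    /** Creates the entry and returns its initial version. */
    synchronized long create(String topic, String subscriberId, String data) {
        String k = key(topic, subscriberId);
        if (entries.containsKey(k)) throw new SubscriptionStateExistsException();
        entries.put(k, new Entry(data, 0L));
        return 0L;
    }

    /** Replaces the data only if expectedVersion matches; returns the new version. */
    synchronized long replace(String topic, String subscriberId, String data, long expectedVersion) {
        String k = key(topic, subscriberId);
        Entry current = entries.get(k);
        if (current == null) throw new NoSubscriptionStateException();
        if (current.version != expectedVersion) throw new BadVersionException();
        long next = current.version + 1;
        entries.put(k, new Entry(data, next));
        return next;
    }

    /** Deletes the entry only if expectedVersion matches. */
    synchronized void delete(String topic, String subscriberId, long expectedVersion) {
        String k = key(topic, subscriberId);
        Entry current = entries.get(k);
        if (current == null) throw new NoSubscriptionStateException();
        if (current.version != expectedVersion) throw new BadVersionException();
        entries.remove(k);
    }

    /** Returns the stored data, or null if there is no entry. */
    synchronized String read(String topic, String subscriberId) {
        Entry current = entries.get(key(topic, subscriberId));
        return current == null ? null : current.data;
    }
}
```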
+
+h1. How to choose a key/value store for Hedwig
+
+Based on the interface, a key/value store needs to meet several requirements before it can be used for Hedwig:
+
+* @CAS@: The ability to perform strict conditional updates, e.g. conditioned on a specific version (ZooKeeper) or on the same content (HBase).
+* @Optimized for Writes@: The metadata access pattern for Hedwig is an initial read followed by continuous updates.
+* @Optimized for retrieving all subscriptions for a topic@: Either hierarchical structures to maintain such relationships (ZooKeeper), or ordered key/value storage to cluster the subscription for a topic together, would provide efficient subscription data management.
+
+__ZooKeeper__ is the default implementation for Hedwig metadata management. It holds data in memory and provides a filesystem-like namespace, meeting the above requirements. __ZooKeeper__ is suitable for most Hedwig use cases. However, if your application needs to manage millions of topics/subscriptions, a more scalable solution would be __HBase__, which also meets the above requirements.

Added: bookkeeper/site/trunk/content/docs/master/hedwigParams.textile
URL: http://svn.apache.org/viewvc/bookkeeper/site/trunk/content/docs/master/hedwigParams.textile?rev=1644097&view=auto
==============================================================================
--- bookkeeper/site/trunk/content/docs/master/hedwigParams.textile (added)
+++ bookkeeper/site/trunk/content/docs/master/hedwigParams.textile Tue Dec  9 16:00:52 2014
@@ -0,0 +1,92 @@
+Notice: Licensed under the Apache License, Version 2.0 (the "License");
+        you may not use this file except in compliance with the License. You may
+        obtain a copy of the License at "http://www.apache.org/licenses/LICENSE-2.0":http://www.apache.org/licenses/LICENSE-2.0.
+        .        
+        Unless required by applicable law or agreed to in writing,
+        software distributed under the License is distributed on an "AS IS"
+        BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+        implied. See the License for the specific language governing permissions
+        and limitations under the License.
+        .
+        
+h1. Hedwig configuration parameters        
+        
+This page contains detailed information about configuration parameters used for Hubs, Regions, ZooKeeper, and BookKeeper.
+        
+h2. Hedwig server configuration parameters
+
+Please also refer to the configuration file that comes with the distribution: _hedwig-server/conf/hw_server.conf_.  
+
+h3. Region related parameters
+
+| @region@ | Region identifier. Default is "standalone". |
+| @regions@ | List of region identifiers, space separated. Default is empty. |
+| @inter_region_ssl_enabled (deprecated)@ | Enables SSL across regions. Default is false. *Since this parameter has been deprecated, use __ssl_enabled__ in _hedwig-server/conf/hw_region_client.conf_ to enable SSL across regions instead.* |
+| @retry_remote_subscribe_thread_run_interval@ | This parameter is used to determine how often we run a thread to retry those failed remote subscriptions in asynchronous mode (in milliseconds). Default is 2 minutes. |
+
+h3. Hub server parameters
+
+| @standalone@ | Sets the hub server to run in standalone mode (no regions). Default is false. |
+| @server_port@ | Sets the server port that receives client connections. Default is 4080. |
+| @ssl_enabled@ | Enables SSL. Default is false. |
+| @ssl_server_port@ | Sets the server port for SSL connections. Default is 9876. | 
+| @password@ | Password used for the pkcs12 certificate. Default is the empty string. |
+| @cert_name@ | Sets the name of the SSL certificate if available as a resource. Default is the null string. |
+| @cert_path@ | Sets the path to the SSL certificate if it is available as a file. Default is the null string. |
+
+h3. Read-ahead cache parameters
+
+| @readahead_enabled@ | Enables read-ahead. Enabled by default. | 
+| @readahead_count@ | Number of messages to read ahead. Default is 10. |
+| @readahead_size@ | Maximum number of bytes to read during a scan. Default is 4 megabytes. |
+
+bq. Upon a range scan request for a given topic, two hints are provided as to when scanning should stop: the number of messages scanned and the total size of messages scanned. Scanning stops whenever one of these limits is exceeded.
+
+| @cache_size@ | Sets the size of the read-ahead cache. Default is the smaller of 2G and half the heap size. | 
+| @cache_entry_ttl@ | Sets the TTL for cache entries. Each time a new entry is added to the cache, expired cache entries are discarded. If the value is zero or negative, cache entries are not evicted until the cache is full or the messages have been consumed. Default is 0. |
+| @scan_backoff_ms@ | The backoff time (in milliseconds) before retrying scans after failures. Default is 1s (1000 ms). |
+| @num_readahead_cache_threads@ | Sets the number of threads to be used for the read-ahead mechanism. Default is the number of cores as returned with a call to <code>Runtime.getRuntime().availableProcessors()</code>.|
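+
+The two stop hints described in the range-scan note above can be sketched as follows. This is only an illustration, not Hedwig's scanner: the class @RangeScanSketch@ and the parameters @maxMessages@ and @maxBytes@ are hypothetical names, and the sketch assumes that the message that crosses the size limit is still included in the result.
+
```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a range scan bounded by a message count and a byte budget. */
class RangeScanSketch {
    /**
     * Collects messages in order and stops as soon as either limit has been hit:
     * the number of messages scanned or the total size of the messages scanned.
     */
    static List<byte[]> scan(List<byte[]> messages, int maxMessages, long maxBytes) {
        List<byte[]> scanned = new ArrayList<>();
        long scannedBytes = 0;
        for (byte[] message : messages) {
            // Check both hints before reading the next message.
            if (scanned.size() >= maxMessages || scannedBytes >= maxBytes) {
                break;
            }
            scanned.add(message);
            scannedBytes += message.length;
        }
        return scanned;
    }
}
```
+
+With the defaults listed above, such a scan would be called with a count of 10 and a size of 4 megabytes.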
+
+h3. Publish and subscription parameters 
+
+| @max_message_size@ | Sets the maximum message size. Default is 1.2 megabytes. |
+| @default_message_window_size@ | This parameter is used for setting the default maximum number of messages that can be delivered to a subscriber without being consumed. We pause delivery to a subscriber when reaching the window size. Default is unlimited (0). |
+| @consume_interval@ | Sets the number of messages consumed before persisting information about consumed messages. A value greater than one avoids persisting information about consumed messages upon every consumed message. Default is 50.|
+| @retention_secs@ | The interval after which a topic is released. If this parameter is greater than zero, a task is scheduled to release an owned topic. Default is 0 (never released). |
+| @messages_consumed_thread_run_interval@ | Time interval (in milliseconds) at which the messages-consumed timer task runs to delete consumed ledgers in BookKeeper. Default is 1 minute (60,000 ms). |
+
+
+h3. ZooKeeper parameters
+ 
+| @zk_host@ | Sets the ZooKeeper list of servers. Default is localhost:2181. |
+| @zk_timeout@ | Sets the ZooKeeper session timeout. Default is 2s. |
+
+h3. BookKeeper parameters
+
+| @bk_ensemble_size@ | Sets the ensemble size. Default is 3. |
+| @bk_write_quorum_size@ | Sets the write quorum size. Default is 2. |
+| @bk_ack_quorum_size@ | Sets the ack quorum size. Default is 2. |
+
+bq. Note that the ack quorum size must be less than or equal to the write quorum size.
+
+| @max_entries_per_ledger@ | Maximum number of entries before we roll a ledger. Default is unlimited (0). |
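+
+The constraint in the note above can be checked before a configuration is used. The sketch below is hypothetical (the class @QuorumSettings@ is not part of Hedwig or BookKeeper). Besides the documented rule that the ack quorum must not exceed the write quorum, it also assumes that the write quorum must not exceed the ensemble size, which this page does not state.
+
```java
/** Hypothetical holder that validates the three BookKeeper quorum parameters. */
class QuorumSettings {
    final int ensembleSize;
    final int writeQuorumSize;
    final int ackQuorumSize;

    QuorumSettings(int ensembleSize, int writeQuorumSize, int ackQuorumSize) {
        // Documented rule: ack quorum <= write quorum.
        if (ackQuorumSize > writeQuorumSize) {
            throw new IllegalArgumentException("bk_ack_quorum_size must be <= bk_write_quorum_size");
        }
        // Assumption (not stated on this page): write quorum <= ensemble size.
        if (writeQuorumSize > ensembleSize) {
            throw new IllegalArgumentException("bk_write_quorum_size must be <= bk_ensemble_size");
        }
        this.ensembleSize = ensembleSize;
        this.writeQuorumSize = writeQuorumSize;
        this.ackQuorumSize = ackQuorumSize;
    }
}
```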
+
+h3. Metadata parameters
+
+| @zk_prefix@ | Sets the ZooKeeper path prefix. Default is _/hedwig_. |
+| @metadata_manager_based_topic_manager_enabled@ | Enables the use of a metadata manager for topic management. Default is false. |
+| @metadata_manager_factory_class@ | Sets the default factory for the metadata manager. Default is null. |
+
+h2. Region manager configuration parameters
+
+Please also refer to the configuration file that comes with the distribution: _hedwig-server/conf/hw_region_client.conf_.
+
+| @ssl_enabled@ | This parameter is a boolean flag indicating if communication with the server should be done via SSL for encryption. The Hedwig server hubs also need to be SSL enabled for this to work. Default value is false. |
+| @max_message_size@ | Sets the maximum message size in bytes. The default value is 2 MB (2097152). |
+| @max_server_redirects@ | Sets the maximum number of redirects we permit before signaling an error. Default value is 2. |
+| @auto_send_consume_message_enabled@ | A flag indicating whether the client library should automatically send consume messages to the server. Default value is true. |
+| @consumed_messages_buffer_size@ | Sets the number of messages we buffer before sending a consume message to the server. Default value is 5. |
+| @max_outstanding_messages@ | Supports client-side throttling by setting the maximum number of outstanding messages. Default value is 10. |
+| @server_ack_response_timeout@ | Sets the timeout (in milliseconds) before we error out any existing requests. Default value is 30s (30,000). |
+        

Added: bookkeeper/site/trunk/content/docs/master/hedwigUser.textile
URL: http://svn.apache.org/viewvc/bookkeeper/site/trunk/content/docs/master/hedwigUser.textile?rev=1644097&view=auto
==============================================================================
--- bookkeeper/site/trunk/content/docs/master/hedwigUser.textile (added)
+++ bookkeeper/site/trunk/content/docs/master/hedwigUser.textile Tue Dec  9 16:00:52 2014
@@ -0,0 +1,63 @@
+Notice: Licensed under the Apache License, Version 2.0 (the "License");
+        you may not use this file except in compliance with the License. You may
+        obtain a copy of the License at "http://www.apache.org/licenses/LICENSE-2.0":http://www.apache.org/licenses/LICENSE-2.0.
+        .        
+        Unless required by applicable law or agreed to in writing,
+        software distributed under the License is distributed on an "AS IS"
+        BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+        implied. See the License for the specific language governing permissions
+        and limitations under the License.
+        .
+
+h1. Design
+
+In Hedwig, clients publish messages associated with a topic, and they subscribe to a topic to receive all messages published with that topic. Clients are associated with (publish to and subscribe from) a Hedwig _instance_ (also referred to as a _region_), which consists of a number of servers called _hubs_. The hubs partition up topic ownership among themselves, and all publishes and subscribes to a topic must be done to its owning hub. When a client doesn't know the owning hub, it tries a default hub, which may redirect the client.
+
+Running a Hedwig instance requires a ZooKeeper server and at least three BookKeeper servers.
+
+An instance is designed to run within a datacenter. For wide-area messaging across datacenters, specify in the server configuration the set of default servers for each of the other instances. Dissemination among instances currently takes place over an all-to-all topology. Local subscriptions cause the hub to subscribe to all other regions on this topic, so that the local region receives all updates to it. Future work includes allowing the user to overlay alternative topologies.
+
+Because all messages on a topic go through a single hub per region, all messages within a region are ordered. This means that, for a given topic, messages are delivered in the same order to all subscribers within a region, and messages from any particular region are delivered in the same order to all subscribers globally, but messages from different regions may be delivered in different orders to different regions. Providing global ordering is prohibitively expensive in the wide area. However, in Hedwig clients such as PNUTS, the lack of global ordering is not a problem, as PNUTS serializes all updates to a table row at a single designated master for that row.
+
+Topics are independent; Hedwig provides no ordering across different topics.
+
+Version vectors are associated with each topic and serve as the identifiers for each message. Vectors consist of one component per region. A component value is the region's local sequence number on the topic, and is incremented each time a hub persists a message (published either locally or remotely) to BookKeeper.
+
+TODO: More on how version vectors are to be used, and on maintaining vector-maxes.
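+
+As a rough illustration of the data structure described above, the following sketch models a version vector with one component per region. It is not Hedwig's implementation: the class @VersionVector@ and its methods are hypothetical, regions are identified by plain strings, and the component-wise maximum in @mergeMax@ is only one plausible reading of "vector-maxes", which this document leaves open.
+
```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical version vector: one sequence-number component per region. */
class VersionVector {
    private final Map<String, Long> components = new TreeMap<>();

    /** Returns the component for a region, or 0 if the region has no component yet. */
    long get(String region) {
        return components.getOrDefault(region, 0L);
    }

    /** Increments a region's component, e.g. when its hub persists a message. */
    long increment(String region) {
        return components.merge(region, 1L, Long::sum);
    }

    /** Assumed merge rule: keep the component-wise maximum of both vectors. */
    void mergeMax(VersionVector other) {
        for (Map.Entry<String, Long> e : other.components.entrySet()) {
            components.merge(e.getKey(), e.getValue(), Long::max);
        }
    }

    @Override
    public String toString() {
        return components.toString();
    }
}
```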
+
+h1. Entry Points
+
+The main class for running the server is @org.apache.hedwig.server.netty.PubSubServer@. It takes a single argument, which is a "Commons Configuration":http://commons.apache.org/configuration/ file. Currently, for configuration, the source is the documentation. See @org.apache.hedwig.server.conf.ServerConfiguration@ for server configuration parameters.
+
+The client is a library intended to be consumed by user applications. It takes a Commons Configuration object, for which the source/documentation is in @org.apache.hedwig.client.conf.ClientConfiguration@.
+
+h1. Deployment
+
+h2. Limits
+
+Because the current implementation uses a single socket per subscription, Hedwig requires a high @ulimit@ on the number of open file descriptors. Non-root users can only use up to the limit specified in @/etc/security/limits.conf@; to raise this to 1024^2, as root, modify the "nofile" line in @/etc/security/limits.conf@ on all hubs.
+
+h2. Running Servers
+
+Hedwig requires BookKeeper to run. For BookKeeper setup instructions see "BookKeeper Getting Started":./bookkeeperStarted.html.
+
+To start a Hedwig hub server:
+
+@hedwig-server/bin/hedwig server@
+
+Hedwig takes its configuration from hedwig-server/conf/hw_server.conf by default. To change location of the conf file, modify the HEDWIG_SERVER_CONF environment variable.
+
+h1. Debugging
+
+You can attach an Eclipse debugger (or any debugger) to a Java process running on a remote host, as long as it has been started with the appropriate JVM flags. (See the Building Hedwig document to set up your Eclipse environment.) To launch something using @bin/hedwig@ with debugger attachment enabled, prefix the command with @HEDWIG_EXTRA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,address=5000@, e.g.:
+
+@HEDWIG_EXTRA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,address=5000 hedwig-server/bin/hedwig server@
+
+h1. Logging
+
+Hedwig uses "slf4j":http://www.slf4j.org for logging, with the log4j bindings enabled by default. To enable logging from Hedwig, create a log4j.properties file and point the environment variable HEDWIG_LOG_CONF to the file. The path to the log4j.properties file must be absolute.
+
+@export HEDWIG_LOG_CONF=/tmp/log4j.properties@
+@hedwig-server/bin/hedwig server@
+
+

Added: bookkeeper/site/trunk/content/docs/master/index.textile
URL: http://svn.apache.org/viewvc/bookkeeper/site/trunk/content/docs/master/index.textile?rev=1644097&view=auto
==============================================================================
--- bookkeeper/site/trunk/content/docs/master/index.textile (added)
+++ bookkeeper/site/trunk/content/docs/master/index.textile Tue Dec  9 16:00:52 2014
@@ -0,0 +1,52 @@
+Title:     BookKeeper Documentation
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+             http://www.apache.org/licenses/LICENSE-2.0
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+
+h1. Apache BookKeeper documentation
+
+* "Overview":./bookkeeperOverview.html
+* "Getting started":./bookkeeperStarted.html
+* "Programmer's Guide":./bookkeeperProgrammer.html
+* "Bookie Server Configuration Parameters":./bookieConfigParams.html
+* "BookKeeper Configuration Parameters":./bookkeeperConfigParams.html
+* "BookKeeper Internals":./bookkeeperInternals.html
+* "Bookie Recovery":./bookieRecovery.html
+* "Using BookKeeper stream library":./bookkeeperStream.html
+* "BookKeeper Metadata Management":./bookkeeperMetadata.html
+
+h2. BookKeeper Admin & Ops
+
+* "Admin Guide":./bookkeeperConfig.html
+* "BookKeeper JMX":./bookkeeperJMX.html
+
+h1. Apache Hedwig documentation
+
+* "Building Hedwig, or how to set up Hedwig":./hedwigBuild.html
+* "User's Guide, or how to program against the Hedwig API and how to run it":./hedwigUser.html
+* "Developer's Guide, or Hedwig internals and hacking details":./hedwigDesign.html
+* "Configuration parameters":./hedwigParams.html
+* "Message Filtering":./hedwigMessageFilter.html
+* "Hedwig Metadata Management":./hedwigMetadata.html
+
+h2. Hedwig Admin & Ops
+
+* "Hedwig Console":./hedwigConsole.html
+* "Hedwig JMX":./hedwigJMX.html
+
+h1. Metastore documentation
+
+* "Metastore Interface":./metastore.html

Added: bookkeeper/site/trunk/content/docs/master/metastore.textile
URL: http://svn.apache.org/viewvc/bookkeeper/site/trunk/content/docs/master/metastore.textile?rev=1644097&view=auto
==============================================================================
--- bookkeeper/site/trunk/content/docs/master/metastore.textile (added)
+++ bookkeeper/site/trunk/content/docs/master/metastore.textile Tue Dec  9 16:00:52 2014
@@ -0,0 +1,47 @@
+Title:        Metastore Interface
+Notice: Licensed under the Apache License, Version 2.0 (the "License");
+        you may not use this file except in compliance with the License. You may
+        obtain a copy of the License at "http://www.apache.org/licenses/LICENSE-2.0":http://www.apache.org/licenses/LICENSE-2.0.
+        .
+        .
+        Unless required by applicable law or agreed to in writing,
+        software distributed under the License is distributed on an "AS IS"
+        BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+        implied. See the License for the specific language governing permissions
+        and limitations under the License.
+        .
+        .
+
+h1. Metastore Interface
+
+Although Apache BookKeeper provides "LedgerManager":./bookkeeperMetadata.html and "Hedwig Metadata Managers":./hedwigMetadata.html so that users can plug in different metadata stores for both BookKeeper and Hedwig, implementing a correct and efficient manager requires detailed knowledge of both projects and is quite difficult. The __MetaStore__ interface extracts the commonality of the metadata storage interfaces, so that users can focus on adapting the underlying storage itself without having to worry about the detailed logic of BookKeeper and Hedwig.
+
+h2. MetaStore
+
+The __MetaStore__ interface provides users with access to the __MetastoreTable__s used for BookKeeper and Hedwig metadata management. Two kinds of table are defined in a __MetaStore__: __MetastoreTable__, which provides basic __PUT__, __GET__, __REMOVE__ and __SCAN__ operations and does not assume any ordering guarantees from the underlying storage; and __MetastoreScannableTable__, which is derived from __MetastoreTable__ but *does* assume that data is stored in key order in the underlying storage.
+
+* @getName@: Return the name of the __MetaStore__.
+* @getVersion@: Return current __MetaStore__ plugin version.
+* @init@: Initialize the __MetaStore__ library with the given configuration and its version.
+* @close@: Close the __MetaStore__, freeing all resources, e.g. releasing all open connections and occupied memory.
+* @createTable@: Create a table instance to access the data stored in it. A table name is given to locate the table. A __MetastoreTable__ object is returned.
+* @createScannableTable@: Similar to __createTable__, but returns a __MetastoreScannableTable__ rather than a __MetastoreTable__ object. If the underlying table is not an ordered table, __MetastoreException__ should be thrown.
+
+h2. MetaStore Table
+
+__MetastoreTable__ is the basic unit in a __MetaStore__ and is used to handle different types of metadata, e.g. one __MetastoreTable__ is used to store metadata for ledgers, while another __MetastoreTable__ is used to store metadata for topic persistence info. The interface for a __MetastoreTable__ is quite simple:
+
+* @get@: Retrieve an entry by a given __key__. On success, __OK__ and the entry's current version in metadata storage are returned. __NoKey__ is returned for a non-existent key. If __fields__ are specified, return only the specified fields for the key.
+* @put@: Put the given __value__ associated with __key__ with the given __version__. The value is only updated when the given __version__ equals the current version in metadata storage. A new __version__ should be returned when updated successfully. __NoKey__ is returned for a non-existent key; __BadVersion__ is returned when an update is attempted with a __version__ which does not match the one in the metadata store.
+* @remove@: Remove the given __value__ associated with __key__. The value is only removed when the given __version__ equals its current version in metadata storage. __NoKey__ is returned for a non-existent key; __BadVersion__ is returned when a remove is attempted with a __version__ which does not match.
+* @openCursor@: Open a __cursor__ to iterate over all the entries of a table. The returned cursor does not need to guarantee any ordering or transactional semantics.
+
+h2. MetaStore Scannable Table
+
+__MetastoreScannableTable__ is identical to a __MetastoreTable__ except that it provides an additional interface to iterate over entries in the table in key order.
+
+* @openCursor@: Open a __cursor__ to iterate over all the entries of a table between the key range of __firstKey__ and __lastKey__.
+
+h2. How to organize your metadata
+
+Some metadata in Hedwig and BookKeeper does not need to be stored in the order of the ledger id or the topic: topic ownership and topic persistence info. You could use a kind of hash table to store them. Subscription state and ledger metadata, however, must be stored in key order due to the current logic in Hedwig/BookKeeper.
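+
+The difference between the two kinds of storage can be sketched with standard Java collections. This is a hypothetical illustration, not the __MetaStore__ API: the class @MetadataLayoutSketch@, the string keys and values, the @\u0000@ key separator, and the inclusive bounds of @scan@ are all assumptions. Topic ownership lives in an unordered hash map, while subscription state lives in a sorted map so that a key-range scan returns all subscriptions of a topic together.
+
```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical layout: unordered table for ownership, ordered table for subscriptions. */
class MetadataLayoutSketch {
    // Assumed separator; topics and subscriber ids must not contain it.
    static final char SEP = '\u0000';

    // No ordering needed: point lookups by topic only.
    final Map<String, String> topicOwner = new HashMap<>();

    // Key order needed: keys are topic + SEP + subscriberId.
    final TreeMap<String, String> subscriptionState = new TreeMap<>();

    void putSubscription(String topic, String subscriberId, String state) {
        subscriptionState.put(topic + SEP + subscriberId, state);
    }

    /** Ordered scan over the key range [firstKey, lastKey] (bounds assumed inclusive). */
    SortedMap<String, String> scan(String firstKey, String lastKey) {
        return subscriptionState.subMap(firstKey, true, lastKey, true);
    }

    /** All subscriptions of one topic: every key that starts with topic + SEP. */
    SortedMap<String, String> subscriptionsFor(String topic) {
        String from = topic + SEP;              // smallest possible key for the topic
        String to = topic + (char) (SEP + 1);   // first key past the topic's prefix
        return subscriptionState.subMap(from, true, to, false);
    }
}
```
+
+Because keys of the same topic are adjacent in the sorted map, @subscriptionsFor@ touches only that topic's entries instead of iterating over the whole table.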


