distributedlog-dev mailing list archives

From si...@apache.org
Subject [01/12] incubator-distributedlog git commit: Release 0.4.0-incubating
Date Wed, 26 Apr 2017 18:41:25 GMT
Repository: incubator-distributedlog
Updated Branches:
  refs/heads/master 945c14a99 -> 3469fc878


http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/user_guide/implementation/storage.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/user_guide/implementation/storage.rst b/website/docs/0.4.0-incubating/user_guide/implementation/storage.rst
new file mode 100644
index 0000000..1fa3de0
--- /dev/null
+++ b/website/docs/0.4.0-incubating/user_guide/implementation/storage.rst
@@ -0,0 +1,326 @@
+---
+layout: default
+
+# Sub-level navigation
+sub-nav-group: user-guide
+sub-nav-parent: implementation
+sub-nav-pos: 1
+sub-nav-title: Storage
+
+---
+
+.. contents:: Storage
+
+Storage
+=======
+
+This section describes some implementation details of the storage layer.
+
+Ensemble Placement Policy
+-------------------------
+
+`EnsemblePlacementPolicy` encapsulates the algorithm that the BookKeeper client uses to select a number of bookies from the cluster as an ensemble for storing data. The algorithm is typically based on the data input as well as the network topology properties.
+
+By default, BookKeeper offers a `RackawareEnsemblePlacementPolicy` for placing the data across racks within a datacenter, and a `RegionAwareEnsemblePlacementPolicy` for placing the data across multiple datacenters.
+
+How does EnsemblePlacementPolicy work?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The `EnsemblePlacementPolicy` interface is described below.
+
+::
+
+    public interface EnsemblePlacementPolicy {
+
+        /**
+         * Initialize the policy.
+         *
+         * @param conf client configuration
+         * @param optionalDnsResolver dns resolver
+         * @param hashedWheelTimer timer
+         * @param featureProvider feature provider
+         * @param statsLogger stats logger
+         * @param alertStatsLogger stats logger for alerts
+         */
+        public EnsemblePlacementPolicy initialize(ClientConfiguration conf,
+                                                  Optional<DNSToSwitchMapping> optionalDnsResolver,
+                                                  HashedWheelTimer hashedWheelTimer,
+                                                  FeatureProvider featureProvider,
+                                                  StatsLogger statsLogger,
+                                                  AlertStatsLogger alertStatsLogger);
+
+        /**
+         * Uninitialize the policy
+         */
+        public void uninitalize();
+
+        /**
+         * A consistent view of the cluster (what bookies are available as writable, what bookies are available as
+         * readonly) is updated when any changes happen in the cluster.
+         *
+         * @param writableBookies
+         *          All the bookies in the cluster available for write/read.
+         * @param readOnlyBookies
+         *          All the bookies in the cluster available for readonly.
+         * @return the dead bookies during this cluster change.
+         */
+        public Set<BookieSocketAddress> onClusterChanged(Set<BookieSocketAddress> writableBookies,
+                                                         Set<BookieSocketAddress> readOnlyBookies);
+
+        /**
+         * Choose <i>numBookies</i> bookies for ensemble. If the count is more than the number of available
+         * nodes, {@link BKNotEnoughBookiesException} is thrown.
+         *
+         * @param ensembleSize
+         *          Ensemble Size
+         * @param writeQuorumSize
+         *          Write Quorum Size
+         * @param excludeBookies
+         *          Bookies that should not be considered as targets.
+         * @return list of bookies chosen as targets.
+         * @throws BKNotEnoughBookiesException if not enough bookies available.
+         */
+        public ArrayList<BookieSocketAddress> newEnsemble(int ensembleSize, int writeQuorumSize, int ackQuorumSize,
+                                                          Set<BookieSocketAddress> excludeBookies) throws BKNotEnoughBookiesException;
+
+        /**
+         * Choose a new bookie to replace <i>bookieToReplace</i>. If no bookie available in the cluster,
+         * {@link BKNotEnoughBookiesException} is thrown.
+         *
+         * @param bookieToReplace
+         *          bookie to replace
+         * @param excludeBookies
+         *          bookies that should not be considered as candidate.
+         * @return the bookie chosen as target.
+         * @throws BKNotEnoughBookiesException
+         */
+        public BookieSocketAddress replaceBookie(int ensembleSize, int writeQuorumSize, int ackQuorumSize,
+                                                 Collection<BookieSocketAddress> currentEnsemble, BookieSocketAddress bookieToReplace,
+                                                 Set<BookieSocketAddress> excludeBookies) throws BKNotEnoughBookiesException;
+
+        /**
+         * Reorder the read sequence of a given write quorum <i>writeSet</i>.
+         *
+         * @param ensemble
+         *          Ensemble to read entries.
+         * @param writeSet
+         *          Write quorum to read entries.
+         * @param bookieFailureHistory
+         *          Observed failures on the bookies
+         * @return read sequence of bookies
+         */
+        public List<Integer> reorderReadSequence(ArrayList<BookieSocketAddress> ensemble,
+                                                 List<Integer> writeSet, Map<BookieSocketAddress, Long> bookieFailureHistory);
+
+
+        /**
+         * Reorder the read last add confirmed sequence of a given write quorum <i>writeSet</i>.
+         *
+         * @param ensemble
+         *          Ensemble to read entries.
+         * @param writeSet
+         *          Write quorum to read entries.
+         * @param bookieFailureHistory
+         *          Observed failures on the bookies
+         * @return read sequence of bookies
+         */
+        public List<Integer> reorderReadLACSequence(ArrayList<BookieSocketAddress> ensemble,
+                                                    List<Integer> writeSet, Map<BookieSocketAddress, Long> bookieFailureHistory);
+    }
+
+The methods in this interface cover three parts - 1) initialization and uninitialization; 2) how to choose bookies to place data; and 3) how to choose bookies to do speculative reads.
+
+Initialization and uninitialization
+___________________________________
+
+The ensemble placement policy is constructed via JVM reflection while the BookKeeper client is being constructed. After the `EnsemblePlacementPolicy` is constructed, the BookKeeper client calls `#initialize` to initialize the placement policy.
+
+The `#initialize` method takes a few resources from BookKeeper for instantiating itself. These resources include:
+
+1. `ClientConfiguration` : The client configuration used for constructing the BookKeeper client. The implementation of the placement policy can obtain its settings from this configuration.
+2. `DNSToSwitchMapping`: The DNS resolver for the ensemble policy to build the network topology of the bookie cluster. It is optional.
+3. `HashedWheelTimer`: A hashed wheel timer that can be used for timing-related work. For example, a stabilized network topology can use it to delay network topology changes, reducing the impact of flapping bookie registrations due to ZooKeeper session expiry.
+4. `FeatureProvider`: A feature provider that the policy can use for enabling or disabling its offered features. For example, a region-aware placement policy can offer features to disable placing data to a specific region at runtime.
+5. `StatsLogger`: A stats logger for exposing stats.
+6. `AlertStatsLogger`: An alert stats logger for exposing critical stats that need to be alerted on.
+
+There is a single ensemble placement policy instance per BookKeeper client. `#uninitalize` is called on the instance when the BookKeeper client is closed. The implementation of a placement policy is responsible for releasing all the resources allocated during `#initialize`.
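+
+For illustration, the sketch below approximates how the BookKeeper client drives this lifecycle. It is a simplified sketch, not the actual client code: the variable names and the calls used for construction (`getEnsemblePlacementPolicy`, `ReflectionUtils.newInstance`) are assumptions here.
+
+::
+
+    // Sketch only: approximate reflection-based construction and the
+    // initialize/uninitalize lifecycle of an EnsemblePlacementPolicy.
+    Class<? extends EnsemblePlacementPolicy> policyClass = conf.getEnsemblePlacementPolicy();
+    EnsemblePlacementPolicy placementPolicy = ReflectionUtils.newInstance(policyClass);
+    placementPolicy.initialize(conf, Optional.<DNSToSwitchMapping>of(dnsResolver),
+                               hashedWheelTimer, featureProvider, statsLogger, alertStatsLogger);
+    try {
+        // the client then calls onClusterChanged/newEnsemble/replaceBookie on it
+    } finally {
+        // release every resource allocated in #initialize
+        placementPolicy.uninitalize();
+    }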
+
+How to choose bookies to place
+______________________________
+
+The BookKeeper client discovers the list of bookies from ZooKeeper via `BookieWatcher` - whenever there are bookie changes, the ensemble placement policy is notified with the new list of bookies via `onClusterChanged(writableBookies, readOnlyBookies)`. The implementation of the ensemble placement policy reacts to those changes and builds a new network topology. Subsequent operations like `newEnsemble` or `replaceBookie` hence operate on the new network topology.
+
+newEnsemble(ensembleSize, writeQuorumSize, ackQuorumSize, excludeBookies)
+    Choose `ensembleSize` bookies for the ensemble. If the count is more than the number of available nodes,
+    `BKNotEnoughBookiesException` is thrown.
+
+replaceBookie(ensembleSize, writeQuorumSize, ackQuorumSize, currentEnsemble, bookieToReplace, excludeBookies)
+    Choose a new bookie to replace `bookieToReplace`. If no bookie is available in the cluster,
+    `BKNotEnoughBookiesException` is thrown.
+
+
+Both `RackAware` and `RegionAware` placement policies are `TopologyAware` policies. They build a `NetworkTopology` in response to bookie changes, use it for ensemble placement, and ensure rack/region coverage for write quorums - a write quorum should be covered by at least two racks or regions.
+
+Network Topology
+^^^^^^^^^^^^^^^^
+
+The network topology presents a cluster of bookies as a tree hierarchy. For example, a bookie cluster may consist of many data centers (aka regions) filled with racks of machines. In this tree structure, leaves represent bookies and inner nodes represent the switches/routers that manage traffic in/out of regions or racks.
+
+For example, suppose there are 3 bookies in region `A`: `bk1`, `bk2` and `bk3`. Their network locations are `/region-a/rack-1/bk1`, `/region-a/rack-1/bk2` and `/region-a/rack-2/bk3`, so the network topology will look like below:
+
+::
+
+              root
+               |
+           region-a
+             /  \
+        rack-1  rack-2
+         /  \       \
+       bk1  bk2     bk3
+
+In another example, there are 4 bookies spanning two regions `A` and `B`: `bk1`, `bk2`, `bk3` and `bk4`. Their network locations are `/region-a/rack-1/bk1`, `/region-a/rack-1/bk2`, `/region-b/rack-2/bk3` and `/region-b/rack-2/bk4`. The network topology will look like below:
+
+::
+
+                    root
+                    /  \
+             region-a  region-b
+                |         |
+              rack-1    rack-2
+               / \       / \
+             bk1  bk2  bk3  bk4
+
+The network location of each bookie is resolved by a `DNSResolver` (the interface is described below). The `DNSResolver` resolves a list of DNS names or IP addresses into a list of network locations. The network location that is returned must be a network path of the form `/region/rack`, where `/` is the root and `region` is the region id representing the data center where `rack` is located. The network topology of the bookie cluster determines the number of components in the network path.
+
+::
+
+    /**
+     * An interface that must be implemented to allow pluggable
+     * DNS-name/IP-address to RackID resolvers.
+     *
+     */
+    @Beta
+    public interface DNSToSwitchMapping {
+        /**
+         * Resolves a list of DNS-names/IP-addresses and returns back a list of
+         * switch information (network paths). One-to-one correspondence must be
+         * maintained between the elements in the lists.
+         * Consider an element in the argument list - x.y.com. The switch information
+         * that is returned must be a network path of the form /foo/rack,
+         * where / is the root, and 'foo' is the switch where 'rack' is connected.
+         * Note the hostname/ip-address is not part of the returned path.
+         * The network topology of the cluster would determine the number of
+         * components in the network path.
+         * <p/>
+         *
+         * If a name cannot be resolved to a rack, the implementation
+         * should return {@link NetworkTopology#DEFAULT_RACK}. This
+         * is what the bundled implementations do, though it is not a formal requirement
+         *
+         * @param names the list of hosts to resolve (can be empty)
+         * @return list of resolved network paths.
+         * If <i>names</i> is empty, the returned list is also empty
+         */
+        public List<String> resolve(List<String> names);
+
+        /**
+         * Reload all of the cached mappings.
+         *
+         * If there is a cache, this method will clear it, so that future accesses
+         * will get a chance to see the new data.
+         */
+        public void reloadCachedMappings();
+    }
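+
+As a concrete illustration, a minimal custom resolver might map hosts to `/region/rack` paths from a static table. The class below is a hypothetical sketch (the class name and the table contents are made up for this example), not one of the bundled implementations:
+
+::
+
+    // Hypothetical sketch: a table-based resolver returning /region/rack paths.
+    public class TableBasedDNSResolver implements DNSToSwitchMapping {
+
+        private final Map<String, String> hostToLocation = new HashMap<String, String>();
+
+        public TableBasedDNSResolver() {
+            hostToLocation.put("bk1", "/region-a/rack-1");
+            hostToLocation.put("bk2", "/region-a/rack-1");
+            hostToLocation.put("bk3", "/region-a/rack-2");
+        }
+
+        @Override
+        public List<String> resolve(List<String> names) {
+            List<String> locations = new ArrayList<String>(names.size());
+            for (String name : names) {
+                String location = hostToLocation.get(name);
+                // fall back to the default rack when a name can't be resolved
+                locations.add(location != null ? location : NetworkTopology.DEFAULT_RACK);
+            }
+            return locations;
+        }
+
+        @Override
+        public void reloadCachedMappings() {
+            // nothing cached in this static example
+        }
+    }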
+
+By default, the network topology responds to bookie changes immediately. That means if a bookie's znode appears in or disappears from ZooKeeper, the network topology adds or removes the bookie immediately. This introduces instability when a bookie's ZooKeeper registration is flapping. To address this, there is a `StabilizeNetworkTopology`, which delays removing bookies from the network topology when they disappear from ZooKeeper. It can be enabled by setting the following option.
+
+::
+
+    # enable the stabilized network topology by setting it to a positive value.
+    bkc.networkTopologyStabilizePeriodSeconds=10
+
+
+RackAware and RegionAware
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The `RackAware` placement policy chooses bookies from different racks in the built network topology. It guarantees that a write quorum covers at least two racks.
+
+The `RegionAware` placement policy is a hierarchical placement policy: it chooses an equal number of bookies from each region, and within each region it uses the `RackAware` placement policy to choose bookies from racks. For example, if there are 3 regions - `region-a`, `region-b` and `region-c` - and an application wants to allocate a 15-bookie ensemble, the policy first figures out that there are 3 regions and that it should allocate 5 bookies from each region; then, for each region, it uses the `RackAware` placement policy to choose those 5 bookies.
+
+How to choose bookies to do speculative reads?
+______________________________________________
+
+`reorderReadSequence` and `reorderReadLACSequence` are two methods exposed by the placement policy to help the client determine a better read sequence according to the network topology and the bookie failure history.
+
+With the `RackAware` placement policy, reads are tried in the following sequence:
+
+- writable bookies that didn't experience failures before
+- writable bookies that experienced failures before
+- readonly bookies
+- bookies that have already disappeared from the network topology
+
+With the `RegionAware` placement policy, reads are tried in a similar sequence to the `RackAware` placement policy. There is a slight difference in how writable bookies are tried: after trying every 2 bookies from the local region, it tries a bookie from a remote region. Hence it can achieve low latency even when there are network issues within the local region.
+
+How to enable different EnsemblePlacementPolicy?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Users can configure different ensemble placement policies by setting the following options in the DistributedLog configuration files.
+
+::
+
+    # enable rack-aware ensemble placement policy
+    bkc.ensemblePlacementPolicy=org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy
+    # enable region-aware ensemble placement policy
+    bkc.ensemblePlacementPolicy=org.apache.bookkeeper.client.RegionAwareEnsemblePlacementPolicy
+
+The network topology of bookies used by either `RackawareEnsemblePlacementPolicy` or `RegionAwareEnsemblePlacementPolicy` is built via a `DNSResolver`. The default `DNSResolver` is a script-based DNS resolver. It reads the configuration parameters, executes any defined script, handles errors and resolves domain names to network locations. The script is configured via the following setting in the DistributedLog configuration.
+
+::
+
+    bkc.networkTopologyScriptFileName=/path/to/dns/resolver/script
+
+Alternatively, the `DNSResolver` can be configured via the following setting and loaded via reflection. `DNSResolverForRacks` is a good example to check out for customizing a DNS resolver based on your network environment.
+
+::
+
+    bkEnsemblePlacementDnsResolverClass=org.apache.distributedlog.net.DNSResolverForRacks
+

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/user_guide/implementation/writeproxy.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/user_guide/implementation/writeproxy.rst b/website/docs/0.4.0-incubating/user_guide/implementation/writeproxy.rst
new file mode 100644
index 0000000..bc9feca
--- /dev/null
+++ b/website/docs/0.4.0-incubating/user_guide/implementation/writeproxy.rst
@@ -0,0 +1,5 @@
+---
+layout: default
+---
+
+.. contents:: Write Proxy

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/user_guide/main.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/user_guide/main.rst b/website/docs/0.4.0-incubating/user_guide/main.rst
new file mode 100644
index 0000000..37eacbc
--- /dev/null
+++ b/website/docs/0.4.0-incubating/user_guide/main.rst
@@ -0,0 +1,13 @@
+---
+title: "User Guide"
+layout: guide
+
+# Sub-level navigation
+sub-nav-group: user-guide
+sub-nav-id: user-guide
+sub-nav-parent: _root_
+sub-nav-group-title: User Guide
+sub-nav-pos: 1
+sub-nav-title: User Guide
+
+---

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/user_guide/references/features.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/user_guide/references/features.rst b/website/docs/0.4.0-incubating/user_guide/references/features.rst
new file mode 100644
index 0000000..c92ae3f
--- /dev/null
+++ b/website/docs/0.4.0-incubating/user_guide/references/features.rst
@@ -0,0 +1,42 @@
+---
+layout: default
+
+# Sub-level navigation
+sub-nav-group: user-guide
+sub-nav-parent: references
+sub-nav-pos: 3
+sub-nav-title: Available Features
+---
+
+.. contents:: Available Features
+
+Features
+========
+
+BookKeeper Features
+-------------------
+
+*<scope>* is the scope value of the FeatureProvider passed to the BookKeeperClient builder. In the DistributedLog write proxy, the *<scope>* is 'bkc'. (A sketch of checking a feature's availability follows the list below.)
+
+- *<scope>.repp_disable_durability_enforcement*: Feature to disable durability enforcement in the region-aware data placement policy. It applies to the global replicated log only. If the availability value is larger than zero, the region-aware data placement policy will *NOT* enforce region-wise durability. For example, say a *Log* is writing to regions A, B and C with write quorum size *15* and ack quorum size *9*. If the availability value of this feature is zero, the *9* acknowledgements must come from bookies in at least two regions. If the availability value is larger than zero, the enforcement is *disabled* and the write can be acknowledged after receiving *9* acknowledgements from any regions. By default the availability is zero. Turn this feature on to tolerate multiple region failures.
+
+- *<scope>.disable_ensemble_change*: Feature to disable ensemble changes on DistributedLog writers. If the availability value of this feature is larger than zero, ensemble changes are disabled on writers. It can be used for tolerating ZooKeeper outages.
+
+- *<scope>.<region>.disallow_bookie_placement*: Feature to disallow choosing a bookie replacement from a given *region* when changing the ensemble. It applies to the global replicated log only. If the availability value is larger than zero, the writer (write proxy) stops choosing bookies from *<region>* when changing the ensemble. It is useful for blacking out a region dynamically.
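+
+The sketch below shows how such a feature might be consulted at runtime through BookKeeper's `FeatureProvider` API. It is a sketch only: `SettableFeatureProvider` is used here as a simple in-memory provider, while real deployments usually wire in a dynamic feature provider.
+
+::
+
+    // Sketch: consulting a feature's availability via the FeatureProvider API.
+    FeatureProvider root = new SettableFeatureProvider("", 0);
+    FeatureProvider bkcScope = root.scope("bkc");
+
+    Feature disableDurability = bkcScope.getFeature("repp_disable_durability_enforcement");
+    if (disableDurability.isAvailable()) {
+        // availability > 0: region-wise durability enforcement is disabled
+    }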
+
+DistributedLog Features
+-----------------------
+
+*<scope>* is the scope value of the FeatureProvider passed to the DistributedLogNamespace builder. In the DistributedLog write proxy, the *<scope>* is 'dl'.
+
+- *<scope>.disable_logsegment_rolling*: Feature to disable log segment rolling. If the availability value is larger than zero, the writer (write proxy) stops rolling to new log segments and keeps writing to the current log segments. It is a useful feature for tolerating ZooKeeper outages.
+
+- *<scope>.disable_write_limit*: Feature to disable write limiting. If the availability value is larger than zero, the writer (write proxy) disables write limiting. It is used to control write limiting dynamically.
+
+Write Proxy Features
+--------------------
+
+- *region_stop_accept_new_stream*: Feature to disable accepting new streams in the current region. It applies to the global replicated log only. If the availability value is larger than zero, the write proxies stop accepting new streams and throw a region-unavailable exception to clients. Clients thereby know that this region has stopped accepting new streams and are forced to send requests to other regions. It is a feature used for ownership failover between regions.
+- *service_rate_limit_disabled*: Feature to disable service rate limiting. If the availability value is larger than zero, the write proxies disable rate limiting.
+- *service_checksum_disabled*: Feature to disable service request checksum validation. If the availability value is larger than zero, the write proxies disable request checksum validation.

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/user_guide/references/main.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/user_guide/references/main.rst b/website/docs/0.4.0-incubating/user_guide/references/main.rst
new file mode 100644
index 0000000..b0e62eb
--- /dev/null
+++ b/website/docs/0.4.0-incubating/user_guide/references/main.rst
@@ -0,0 +1,28 @@
+---
+layout: default
+
+# Top navigation
+top-nav-group: user-guide
+top-nav-pos: 9
+top-nav-title: References
+
+# Sub-level navigation
+sub-nav-group: user-guide
+sub-nav-parent: user-guide
+sub-nav-id: references
+sub-nav-pos: 9
+sub-nav-title: References
+---
+
+References
+===========
+
+This page keeps references on the configuration settings, metrics and features that are exposed in DistributedLog.
+
+- `Metrics`_
+
+.. _Metrics: ./metrics
+
+- `Available Features`_
+
+.. _Available Features: ./features

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/user_guide/references/metrics.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/user_guide/references/metrics.rst b/website/docs/0.4.0-incubating/user_guide/references/metrics.rst
new file mode 100644
index 0000000..9f3c9a3
--- /dev/null
+++ b/website/docs/0.4.0-incubating/user_guide/references/metrics.rst
@@ -0,0 +1,492 @@
+---
+layout: default
+
+# Sub-level navigation
+sub-nav-group: user-guide
+sub-nav-parent: references
+sub-nav-pos: 2
+sub-nav-title: Metrics
+
+---
+
+.. contents:: Metrics
+
+Metrics
+=======
+
+This section lists the metrics exposed by the main classes.
+
+({scope} refers to the current scope value of the StatsLogger passed in.)
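+
+As a sketch of how these scopes compose, a stats logger scope prefixes every metric registered under it. The snippet below is illustrative only; `NullStatsLogger` stands in for a concrete stats provider's logger.
+
+::
+
+    // Sketch: how {scope} composes with metric names.
+    StatsLogger root = NullStatsLogger.INSTANCE;               // stand-in for a real provider
+    StatsLogger watcherScope = root.scope("watcher");          // metrics land under {scope}/watcher
+    Counter expired = watcherScope.scope("state").getCounter("Expired");
+    expired.inc();                                             // counted as {scope}/watcher/state/Expired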
+
+MonitoredFuturePool
+-------------------
+
+**{scope}/tasks_pending**
+
+Gauge. How many tasks are pending in this future pool? If this value becomes high, it means that
+the future pool execution rate can't keep up with the submission rate. That would cause high
+*task_pending_time*, hence affecting the callers that use this future pool.
+It could also cause heavy JVM GC if this pool keeps building up.
+
+**{scope}/task_pending_time**
+
+OpStats. It measures the characteristics of the time that tasks spend waiting to be executed.
+It becomes high when either *tasks_pending* is building up or *task_execution_time* is high, blocking
+other tasks from executing.
+
+**{scope}/task_execution_time**
+
+OpStats. It measures the characteristics of the time that tasks spend on execution. If it becomes high,
+it blocks other tasks from executing when there aren't enough threads in this executor, hence causing
+high *task_pending_time* and impacting user-end latency.
+
+**{scope}/task_enqueue_time**
+
+OpStats. The time that tasks spend on submission. The submission time also impacts user-end latency.
+
+MonitoredScheduledThreadPoolExecutor
+------------------------------------
+
+**{scope}/pending_tasks**
+
+Gauge. How many tasks are pending in this thread pool executor? If this value becomes high, it means that
+the thread pool executor execution rate can't keep up with the submission rate. That would cause high
+*task_pending_time*, hence affecting the callers that use this executor. It could also cause heavy JVM GC
+if the queue keeps building up.
+
+**{scope}/completed_tasks**
+
+Gauge. How many tasks are completed in this thread pool executor?
+
+**{scope}/total_tasks**
+
+Gauge. How many tasks are submitted to this thread pool executor?
+
+**{scope}/task_pending_time**
+
+OpStats. It measures the characteristics of the time that tasks spend waiting to be executed.
+It becomes high when either *pending_tasks* is building up or *task_execution_time* is high, blocking
+other tasks from executing.
+
+**{scope}/task_execution_time**
+
+OpStats. It measures the characteristics of the time that tasks spend on execution. If it becomes high,
+it blocks other tasks from executing when there aren't enough threads in this executor, hence causing
+high *task_pending_time* and impacting user-end latency.
+
+OrderedScheduler
+----------------
+
+OrderedScheduler is a thread-pool-based *ScheduledExecutorService*. It is composed of multiple
+MonitoredScheduledThreadPoolExecutor_ instances. Each MonitoredScheduledThreadPoolExecutor_ is wrapped into a
+MonitoredFuturePool_. So both aggregated stats and per-executor stats are exposed.
+
+Aggregated Stats
+~~~~~~~~~~~~~~~~
+
+**{scope}/task_pending_time**
+
+OpStats. It measures the characteristics of the time that tasks spend waiting to be executed.
+It becomes high when either *pending_tasks* is building up or *task_execution_time* is high, blocking
+other tasks from executing.
+
+**{scope}/task_execution_time**
+
+OpStats. It measures the characteristics of the time that tasks spend on execution. If it becomes high,
+it blocks other tasks from executing when there aren't enough threads in this executor, hence causing
+high *task_pending_time* and impacting user-end latency.
+
+**{scope}/futurepool/tasks_pending**
+
+Gauge. How many tasks are pending in this future pool? If this value becomes high, it means that
+the future pool execution rate can't keep up with the submission rate. That would cause high
+*task_pending_time*, hence affecting the callers that use this future pool.
+It could also cause heavy JVM GC if this pool keeps building up.
+
+**{scope}/futurepool/task_pending_time**
+
+OpStats. It measures the characteristics of the time that tasks spend waiting to be executed.
+It becomes high when either *tasks_pending* is building up or *task_execution_time* is high, blocking
+other tasks from executing.
+
+**{scope}/futurepool/task_execution_time**
+
+OpStats. It measures the characteristics of the time that tasks spend on execution. If it becomes high,
+it blocks other tasks from executing when there aren't enough threads in this executor, hence causing
+high *task_pending_time* and impacting user-end latency.
+
+**{scope}/futurepool/task_enqueue_time**
+
+OpStats. The time that tasks spend on submission. The submission time also impacts user-end latency.
+
+Per Executor Stats
+~~~~~~~~~~~~~~~~~~
+
+Stats about individual executors are exposed under *{scope}/{name}-executor-{id}-0*. *{name}* is the scheduler
+name and *{id}* is the index of the executor in the pool. The corresponding stats of its futurepool are exposed
+under *{scope}/{name}-executor-{id}-0/futurepool*. See MonitoredScheduledThreadPoolExecutor_ and MonitoredFuturePool_
+for more details.
+
+ZooKeeperClient
+---------------
+
+Operation Stats
+~~~~~~~~~~~~~~~
+
+All operation stats are exposed under {scope}/zk. The stats are **latency** *OpStats*
+on ZooKeeper operations.
+
+**{scope}/zk/{op}**
+
+Latency stats on operations.
+These operations are *create_client*, *get_data*, *set_data*, *delete*, *get_children*, *multi*, *get_acl*, *set_acl* and *sync*.
+
+Watched Event Stats
+~~~~~~~~~~~~~~~~~~~
+
+All stats on ZooKeeper watched events are exposed under {scope}/watcher. The stats are *Counter* metrics
+about the watched events that this client received:
+
+**{scope}/watcher/state/{keeper_state}**
+
+The number of `KeeperState` changes that this client received. The states are *Disconnected*, *SyncConnected*,
+*AuthFailed*, *ConnectedReadOnly*, *SaslAuthenticated* and *Expired*. Monitoring metrics like *SyncConnected*
+or *Expired* helps in understanding the health of this ZooKeeper client.
+
+**{scope}/watcher/events/{event}**
+
+The number of `Watcher.Event` notifications received by this client. Those events are *None*, *NodeCreated*,
+*NodeDeleted*, *NodeDataChanged* and *NodeChildrenChanged*.
+
+Watcher Manager Stats
+~~~~~~~~~~~~~~~~~~~~~
+
+The ZooKeeperClient provides a watcher manager to manage watchers for applications. It tracks the mapping between
+paths and watchers, which provides the ability to remove watchers. The stats are *Gauge* metrics about the number
+of watchers managed by this ZooKeeper client.
+
+**{scope}/watcher_manager/total_watches**
+
+The total number of watches managed by this watcher manager. If it keeps growing, it usually means that
+watchers are leaking (resources aren't closed properly), which will eventually cause OOM.
+
+**{scope}/watcher_manager/num_child_watches**
+
+The total number of paths watched by this watcher manager.
+
+BookKeeperClient
+----------------
+
+TODO: add bookkeeper stats here
+
+DistributedReentrantLock
+------------------------
+
+All stats related to locks are exposed under {scope}/lock.
+
+**{scope}/acquire**
+
+OpStats. It measures the characteristics of the time spent acquiring locks.
+
+**{scope}/release**
+
+OpStats. It measures the characteristics of the time spent releasing locks.
+
+**{scope}/reacquire**
+
+OpStats. The lock expires when the underlying ZooKeeper session expires. The
+reentrant lock attempts to re-acquire the lock automatically upon session expiry.
+This metric measures the characteristics of the time spent re-acquiring locks.
+
+**{scope}/internalTryRetries**
+
+Counter. The number of retries that locks spend on re-creating internal locks. Typically,
+a new internal lock is created when the session expires.
+
+**{scope}/acquireTimeouts**
+
+Counter. The number of timeouts that callers experienced when acquiring locks.
+
+**{scope}/tryAcquire**
+
+OpStats. It measures the characteristics of the time that each internal lock spends on
+acquiring.
+
+**{scope}/tryTimeouts**
+
+Counter. The number of timeouts experienced by internal locks while trying to acquire.
+
+**{scope}/unlock**
+
+OpStats. It measures the characteristics of the time that the caller spends on unlocking
+internal locks.
+
+BKLogHandler
+------------
+
+The log handler is a base class for managing log segments, so all the metrics in this class are
+related to log segment retrieval and exposed under {scope}/logsegments. They are all `OpStats` in
+the format of `{scope}/logsegments/{op}`. Those operations are:
+
+* force_get_list: force fetching the list of log segments.
+* get_list: get the list of log segments; it might just retrieve it from the local log segment cache.
+* get_filtered_list: get the filtered list of log segments.
+* get_full_list: get the full list of log segments.
+* get_inprogress_segment: time between when an inprogress log segment was created and when the handler read it.
+* get_completed_segment: time between when a log segment was completed and when the handler read it.
+* negative_get_inprogress_segment: records the negative values of `get_inprogress_segment`.
+* negative_get_completed_segment: records the negative values of `get_completed_segment`.
+* recover_last_entry: recovering the last entry from a log segment.
+* recover_scanned_entries: the number of entries scanned during recovery.
+
+See BKLogWriteHandler_ for write handlers.
+
+See BKLogReadHandler_ for read handlers.
+
+BKLogReadHandler
+----------------
+
+The core logic in the log read handler is the readahead worker. Most of the readahead stats are exposed under
+{scope}/readahead_worker.
+
+**{scope}/readahead_worker/wait**
+
+Counter. The number of times the readahead worker has waited. If this keeps increasing, it usually means
+the readahead cache keeps getting full because the reader is slowing down.
+
+**{scope}/readahead_worker/repositions**
+
+Counter. The number of repositions that the readahead worker encounters. A reposition means that the readahead
+worker finds that it isn't advancing to a new log segment and forces re-positioning.
+
+**{scope}/readahead_worker/entry_piggy_back_hits**
+
+Counter. It increases when the last add confirmed is advanced because of the piggy-backed LAC.
+
+**{scope}/readahead_worker/entry_piggy_back_misses**
+
+Counter. It increases when the last add confirmed isn't advanced by a read entry because the entry doesn't
+piggy-back a newer LAC.
+
+**{scope}/readahead_worker/read_entries**
+
+OpStats. Stats on number of entries read per readahead read batch.
+
+**{scope}/readahead_worker/read_lac_counter**
+
+Counter. Stats on the number of readLastConfirmed operations.
+
+**{scope}/readahead_worker/read_lac_and_entry_counter**
+
+Counter. Stats on the number of readLastConfirmedAndEntry operations.
+
+**{scope}/readahead_worker/cache_full**
+
+Counter. It increases each time the readahead worker finds the cache has become full. If it keeps increasing,
+it means the reader is slowing down.
+
+**{scope}/readahead_worker/resume**
+
+OpStats. Stats on the readahead worker resuming reading from the wait state.
+
+**{scope}/readahead_worker/long_poll_interruption**
+
+OpStats. Stats on the number of interruptions to long polls. The interruptions are usually
+due to receiving ZooKeeper notifications.
+
+**{scope}/readahead_worker/notification_execution**
+
+OpStats. Stats on executions over the notifications received from zookeeper.
+
+**{scope}/readahead_worker/metadata_reinitialization**
+
+OpStats. Stats on metadata reinitialization after receiving notifications of log segment updates.
+
+**{scope}/readahead_worker/idle_reader_warn**
+
+Counter. It increases each time the readahead worker detects itself becoming idle.
+
+BKLogWriteHandler
+-----------------
+
+Log write handlers are responsible for log segment creation/deletion. All the metrics are exposed under
+{scope}/segments.
+
+**{scope}/segments/open**
+
+OpStats. Latency characteristics on starting a new log segment.
+
+**{scope}/segments/close**
+
+OpStats. Latency characteristics on completing an inprogress log segment.
+
+**{scope}/segments/recover**
+
+OpStats. Latency characteristics on recovering a log segment.
+
+**{scope}/segments/delete**
+
+OpStats. Latency characteristics on deleting a log segment.
+
+BKAsyncLogWriter
+----------------
+
+**{scope}/log_writer/write**
+
+OpStats. Latency characteristics of the time that write operations spend.
+
+**{scope}/log_writer/write/queued**
+
+OpStats. Latency characteristics of the time that write operations spend in the queue.
+If `{scope}/log_writer/write` latency is high, it might be because the write operations are pending
+in the queue for a long time due to log segment rolling.
+
+**{scope}/log_writer/bulk_write**
+
+OpStats. Latency characteristics of the time that bulk_write operations spend.
+
+**{scope}/log_writer/bulk_write/queued**
+
+OpStats. Latency characteristics of the time that bulk_write operations spend in the queue.
+If `{scope}/log_writer/bulk_write` latency is high, it might be because the write operations are pending
+in the queue for a long time due to log segment rolling.
+
+**{scope}/log_writer/get_writer**
+
+OpStats. The time spent on getting the writer. It could spike when log segment rolling happens
+while getting the writer. It is a good stat to look into when latency is caused by queuing time.
+
+**{scope}/log_writer/pending_request_dispatch**
+
+Counter. The number of queued operations that are dispatched after a log segment is rolled. It is
+a metric measuring how many operations have been queued because of log segment rolling.
+
+BKAsyncLogReader
+----------------
+
+**{scope}/async_reader/future_set**
+
+OpStats. Time spent on satisfying futures of read requests. If it is high, it means that the caller
+takes time processing the results of read requests. The side effect is blocking subsequent reads.
+
+**{scope}/async_reader/schedule**
+
+OpStats. Time spent on scheduling next reads.
+
+**{scope}/async_reader/background_read**
+
+OpStats. Time spent on background reads.
+
+**{scope}/async_reader/read_next_exec**
+
+OpStats. Time spent on executing `reader#readNext()`.
+
+**{scope}/async_reader/time_between_read_next**
+
+OpStats. Time spent between two consecutive `reader#readNext()` calls. If it is high, it means that
+the caller is slowing down in calling `reader#readNext()`.
+
+**{scope}/async_reader/delay_until_promise_satisfied**
+
+OpStats. Total latency for the read requests.
+
+**{scope}/async_reader/idle_reader_error**
+
+Counter. The number of idle reader errors.
+
+BKDistributedLogManager
+-----------------------
+
+Future Pools
+~~~~~~~~~~~~
+
+The stats about the future pools used by writers are exposed under {scope}/writer_future_pool,
+while the stats about the future pools used by readers are exposed under {scope}/reader_future_pool.
+See MonitoredFuturePool_ for detailed stats.
+
+Distributed Locks
+~~~~~~~~~~~~~~~~~
+
+The stats about the locks used by writers are exposed under {scope}/lock while those used by readers
+are exposed under {scope}/read_lock/lock. See DistributedReentrantLock_ for detailed stats.
+
+Log Handlers
+~~~~~~~~~~~~
+
+**{scope}/logsegments**
+
+All basic stats of log handlers are exposed under {scope}/logsegments. See BKLogHandler_ for detailed stats.
+
+**{scope}/segments**
+
+The stats about write log handlers are exposed under {scope}/segments. See BKLogWriteHandler_ for detailed stats.
+
+**{scope}/readahead_worker**
+
+The stats about read log handlers are exposed under {scope}/readahead_worker.
+See BKLogReadHandler_ for detailed stats.
+
+Writers
+~~~~~~~
+
+All writer related metrics are exposed under {scope}/log_writer. See BKAsyncLogWriter_ for detailed stats.
+
+Readers
+~~~~~~~
+
+All reader related metrics are exposed under {scope}/async_reader. See BKAsyncLogReader_ for detailed stats.
+
+BKDistributedLogNamespace
+-------------------------
+
+ZooKeeper Clients
+~~~~~~~~~~~~~~~~~
+
+There are various ZooKeeper clients created per namespace for different purposes. They are:
+
+**{scope}/dlzk_factory_writer_shared**
+
+Stats about the ZooKeeper client shared by all DL writers.
+
+**{scope}/dlzk_factory_reader_shared**
+
+Stats about the ZooKeeper client shared by all DL readers.
+
+**{scope}/bkzk_factory_writer_shared**
+
+Stats about the ZooKeeper client used by the BookKeeper client that is shared by all DL writers.
+
+**{scope}/bkzk_factory_reader_shared**
+
+Stats about the ZooKeeper client used by the BookKeeper client that is shared by all DL readers.
+
+See ZooKeeperClient_ for detailed ZooKeeper stats.
+
+BookKeeper Clients
+~~~~~~~~~~~~~~~~~~
+
+All the BookKeeper client related stats are exposed directly under the current {scope}. See BookKeeperClient_
+for detailed stats.
+
+Utils
+~~~~~
+
+**{scope}/factory/thread_pool**
+
+Stats about the ordered scheduler used by this namespace. See OrderedScheduler_ for detailed stats.
+
+**{scope}/factory/readahead_thread_pool**
+
+Stats about the readahead thread pool executor used by this namespace. See MonitoredScheduledThreadPoolExecutor_
+for detailed stats.
+
+**{scope}/writeLimiter**
+
+Stats about the global write limiter used by this namespace.
+
+DistributedLogManager
+~~~~~~~~~~~~~~~~~~~~~
+
+All the core stats about readers and writers are exposed under the current {scope} via BKDistributedLogManager_.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/latest
----------------------------------------------------------------------
diff --git a/website/docs/latest b/website/docs/latest
new file mode 120000
index 0000000..92a7f82
--- /dev/null
+++ b/website/docs/latest
@@ -0,0 +1 @@
+../../docs
\ No newline at end of file


