pulsar-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [pulsar] ivankelly commented on a change in pull request #3844: [Doc]Fix typo and language issues in the faq.md file
Date Wed, 27 Mar 2019 11:08:45 GMT
ivankelly commented on a change in pull request #3844: [Doc]Fix typo and language issues in
the faq.md file
URL: https://github.com/apache/pulsar/pull/3844#discussion_r269498254
 
 

 ##########
 File path: faq.md
 ##########
 @@ -2,108 +2,107 @@
 - Getting Started
 - Concepts and Design
 - Usage and Configuration
+- Advanced Questions
 
 ---
 
 ## Getting Started
 
 ### What is the minimum requirements for Apache Pulsar ?
-You need 3 kind of clusters: bookie, broker, zookeeper. But if not have enough resource,
it's ok to run them on same machine.
-
+You need three kinds of clusters: bookie, broker, and zookeeper. If you do not have enough
resources, you can also run the three clusters on the same machine.
 ---
-
 ## Concepts and Design
 
 ### Is ack tied to subscription?
 Yes, ack is tied to a particular subscription.
 
-### Where should I look into to tweak load balancing ?
-There are few parameters to look at :
-1. The topic assignments to brokers are done in terms of “bundles”, that is in group
of topic
-2. Topics are matched to bundles by hashing on the name
-3. Effectively, a bundle is a hash-range where topics falls into
-4. Initially the default is to have 4 “bundles” for a namespace
-5. When the traffic increases on a given bundle, it will be split in 2 and reassigned to
a different broker
-6. There are some adjustable thresholds that can be used to control when the split happens,
based on number of topics/partitions, messages in/out, bytes in/out, etc..
-7. It’s also possible to specify a higher number of bundles when creating a namepsac
-8. There are the load-manager threshold that control when a broker should offload some of
the bundles to other brokers
+### Where do I look into to tweak load balancing ?
+There are a few parameters to look at :
+1. The topic assignments to brokers are done in terms of “bundles”, that is in group
of topic.
+2. Topics are matched to bundles by hashing on the name.
+3. A bundle is a hash-range where topics fall into.
+4. The default is to have four “bundles” for a namespace.
+5. When the traffic increases on a given bundle, it  is split in two and reassigned to a
different broker.
+6. There are some adjustable thresholds that can be used to control when the split happens,
based on the number of topics/partitions, messages in/out, bytes in/out, and so on.
+7. It’s also possible to specify a higher number of bundles when creating a namespace.
+8. The load-manager threshold controls when a broker should offload some of the bundles to
other brokers.
 
-### What is the lifecycle of subscription?
-Once it’s created, it retains all messages published after that (minus explicit TTL). Subscriptions
can be dropped by explicitly unsubscribing (in `Consumer` API) or through the REST/CLI .
+### What is the life-cycle of subscription?
+When a subscription is created, it retains all messages published after that (minus explicit
TTL). You can drop subscriptions by explicitly unsubscribing (in `Consumer` API) or through
the REST/CLI .
 
 ### What is a bundle?
-In Pulsar, "namespaces" are the administrative unit: you can configure most options on a
namespace and they will be applied on the topics contained on the namespace. It gives the
convenience of doing settings and operations on a group of topics rather than doing it once
per topic.
+In Pulsar, "namespace" is the administrative unit: you can configure most options on a namespace,
and the configuration is applied on the topics contained on the namespace. It is convenient
to configure settings and operations on a group of topics rather than doing it once per topic.
 
-In general, the pattern is to use a namespace for each user application. So a single user/tenant,
can create multiple namespaces to manage its own applications.
+In general, the pattern is to use a namespace for each user application. So a single user/tenant
can create multiple namespaces to manage its own applications.
 
-When it comes to topics, we need a way to assign topics to brokers, control the load and
move them if a broker becomes overloaded. Rather that doing this operations per each single
topic (ownership, load-monitoring, assigning), we do it in bundles, or "groups of topics".
+Concerning topics, topics are assigned to brokers, control the load and move them if a broker
becomes overloaded. Rather than doing these operations per each single topic (ownership, load-monitoring,
assigning), we do it in bundles, or "groups of topics".
 
-In practical words, the number of bundles determines "into how many brokers can I spread
the topics for a given namespace".
+The number of bundles determines the number of brokers you can spread the topics into for
a given namespace.
 
-From the client API or implementation, there's no concept of bundles, clients will lookup
the topics that want to publish/consumer individually.
+From the perspective of client API or implementation, there is no concept of bundles. Clients
look up the topics that they want to publish or consume individually.
 
-On the broker side, the namespace is broken down into multiple bundles, and each bundle can
be assigned to a different broker. Effectively, bundles are the "unit of assignment" for topics
into brokers and this is what the load-manager uses to track the traffic and decide where
to place "bundles" and whether to offload them to other brokers.
+On the broker side, the namespace is broken down into multiple bundles, and each bundle is
assigned to a different broker. Effectively, bundle is the "unit of assignment" for topics
into brokers and this is what the load-manager uses to track the traffic and decide where
to place "bundles" and whether to offload them to other brokers.
 
 A bundle is represented by a hash-range. The 32-bit hash space is initially divided equally
into the requested bundles. Topics are matched to a bundle by hashing on the topic name.
 
-Default number of bundles is configured in `broker.conf`: `defaultNumberOfNamespaceBundles=4`
+The default number of bundles is configured in `broker.conf`: `defaultNumberOfNamespaceBundles=4`
 
-When the traffic increases on a given bundle, it will be split in 2 and reassigned to a different
broker.
+When the traffic increases on a given bundle, it is split into two and reassigned to a different
broker.
 
-Enable auto-split: `loadBalancerAutoBundleSplitEnable=true` trigger unload and reassignment
after splitting: `loadBalancerAutoUnloadSplitsEnable=true`.
+If you want to enable auto-split, set the parameter as `loadBalancerAutoBundleSplitEnable=true`.
If you want to trigger unload and reassignment after splitting, set the parameter as `loadBalancerAutoUnloadSplitsEnable=true`.
 
-If is expected to have a high traffic on a particular namespace, it's a good practice to
specify a higher number of bundles when creating the namespace: `bin/pulsar-admin namespaces
create $NS --bundles 64`. This will avoid the initial auto-adjustment phase.
+If you want to have a high traffic on a particular namespace, it's a good practice to specify
a higher number of bundles when creating the namespace: `bin/pulsar-admin namespaces create
$NS --bundles 64`. This avoids the initial auto-adjustment phase.
 
-All the thresholds for the auto-splitting can be configured in `broker.conf`, eg: number
of topics/partitions, messages in/out, bytes in/out, etc...
+You can configure all the thresholds for auto-splitting in the `broker.conf` file. For example,
you can configure the number of topics and partitions, messages in/out, bytes in/out, and
so on.
 
-### How the design deals with isolation between tenants, which concepts enable that and up
to what extent, how huge difference can exist between tenants so that impact on each other
is noticeable via degraded latency.
-The isolation between tenants (and topics of same tenant) happens at many different points.
I'll start from the bottom up.
+### How does the design deal with isolation between tenants? Which concepts enable that and
up to what extent? How huge differences can exist between tenants so that impact on each other
is noticeable via degraded latency?
+The isolation between tenants (and topics of the same tenant) happens at many different points.
I'll start from the bottom up.
 
 #### Storage
-You're probably familiar with BookKeeper, but of the main strength is that each bookie can
efficiently serve many different ledger (segments of topic data). We tested with 100s of thousand
per single node.
+You're probably familiar with BookKeeper. The main strength is that each bookie can efficiently
serve many different ledgers (segments of topic data). We have tested with 100s of thousand
per single node. 
 
-This is because there is a single journal (on its own device) where all the write operations
gets appended and then the entries are periodically flushed in background on the storage device.
+This is because there is a single journal (on its own device) where all the write operations
get appended and then the entries are periodically flushed in background on the storage device.
 
-This gives isolation between writes and reads in a bookie. You can read as fast as you can,
maxing out the IO on the storage device, but your write throughput and latency are going to
be unaffected.
+This gives isolation between writes and reads in a bookie. You can read as fast as you can,
maxing out the IO on the storage device, and your write throughput and latency are unaffected.
 
 #### Broker
 Everything in the broker happens asynchronously. The amount of memory that is used is also
capped per broker.
 
-Whenever the broker is marked as overloaded, traffic can be quickly shifted (manually or
without intervention) to less loaded brokers. LoadManager component in brokers is dedicated
to that.
+Whenever a broker is marked as overloaded, traffic is quickly shifted (manually or without
intervention) to less loaded brokers. The LoadManager component in brokers is dedicated to
that.
 
 There are several points of flow control:
-- On the producer side, there are limits on the in-flight message for broker bookies, that
will slow down users trying to publish faster that the system can absorb
-- On the consumer side, it's possible to throttle the delivery to a certain rate
+- On the producer side, there are limits on the in-flight message for broker bookies, which
controls users' speed in publishing messages, so that the speed is not faster than the system
can absorb. 
+- On the consumer side, it's possible to throttle the delivery to a certain rate.
 
 #### Quotas
-Can configure different storage quotas for different tenants/namespaces and take different
actions when the quotas are filled up (block producer, give exception, drop older messages).
+You can configure different storage quotas for different tenants/namespaces, and take different
actions(block producer, give exception, drop older messages) when the quotas are filled up.
 
 #### Broker level isolation
-There is the option to isolate certain tenants/namespaces to a particular set of broker.
Typically the reason for using that was to experiment with different configurations, debugging
and quickly react to unexpected situations.
+There is an option to isolate certain tenants/namespaces to a particular set of broker. Typically,
you use the option when you are to experiment with different configurations, debug or quickly
react to unexpected situations.
 
-For example, a particular user might be triggering a bad behavior in the broker that can
impact performance for other tenants.
+For example, a particular user might be triggering a bad behavior in the broker that impacts
performance for other tenants.
 
-In this case, the particular user can be "isolated" a subset of brokers that will not serve
any other traffic, until a proper fix that correctly handles the condition can be deployed.
+In this case, the particular user is "isolated" to a subset of brokers that do not serve
any other traffic, until a proper fix that correctly handles the condition is deployed.
 
 Review comment:
   Again, "can be" is correct as it's describing a hypothetical scenario

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
