pulsar-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [pulsar] ivankelly commented on a change in pull request #3844: [Doc]Fix typo and language issues in the faq.md file
Date Wed, 27 Mar 2019 11:08:46 GMT
ivankelly commented on a change in pull request #3844: [Doc]Fix typo and language issues in the faq.md file
URL: https://github.com/apache/pulsar/pull/3844#discussion_r269504328
 
 

 ##########
 File path: faq.md
 ##########
 @@ -112,162 +111,136 @@ It’s a component that was introduced recently. Essentially it’s a stateless
 Yes, you can split a given bundle manually.
 
 ### Is the producer kafka wrapper thread-safe?
-The producer wrapper should be thread-safe.
+The producer wrapper is thread-safe.
 
 ### Can I just remove a subscription?
-Yes, you can use the cli tool `bin/pulsar-admin persistent unsubscribe $TOPIC -s $SUBSCRIPTION`.
+Yes, you can remove a subscription by using the CLI tool `bin/pulsar-admin persistent unsubscribe $TOPIC -s $SUBSCRIPTION`.
 
-### How are subscription modes set? Can I create new subscriptions over the WebSocket API?
-Yes, you can set most of the producer/consumer configuration option in websocket, by passing them as HTTP query parameters like:
+### How do I set subscription modes? Can I create new subscriptions over the WebSocket API?
+Yes, you can set most of the producer/consumer configuration options over WebSocket, by passing them as HTTP query parameters as follows:
 `ws://localhost:8080/ws/consumer/persistent/sample/standalone/ns1/my-topic/my-sub?subscriptionType=Shared`
 
-see [the doc](http://pulsar.apache.org/docs/latest/clients/WebSocket/#RunningtheWebSocketservice-1fhsvp).
+You can create new subscriptions over the WebSocket API. For details, see [Pulsar's WebSocket API](http://pulsar.apache.org/docs/latest/clients/WebSocket/#RunningtheWebSocketservice-1fhsvp).
 
 ### Is there any sort of order of operations or best practices on the upgrade procedure for a geo-replicated Pulsar cluster?
-In general, updating the Pulsar brokers is an easy operation, since the brokers don't have local state. The typical rollout is a rolling upgrade, either doing 1 broker at a time or some percentage of them in parallel.
+In general, it is easy to update the Pulsar brokers, since the brokers don't have local state. The typical rollout is a rolling upgrade, either updating one broker at a time or some percentage of brokers in parallel.
 
-There are not complicated requirements to upgrade geo-replicated clusters, since we take particular care in ensuring backward and forward compatibility.
+There are no complicated requirements to upgrade geo-replicated clusters, since we ensure backward and forward compatibility.
 
-Both the client and the brokers are reporting their own protocol version and they're able to disable newer features if the other side doesn't support them yet.
+Both clients and brokers report their own protocol versions, and they can disable newer features if the other side does not support them yet.
 
-Additionally, when making metadata breaking format changes (if the need arises), we make sure to spread the changes along at least 2 releases.
+Additionally, when making breaking metadata format changes (if the need arises), we make sure to spread the changes across at least two releases.
 
-This is to always allow the possibility to downgrade a running cluster to a previous version, in case any server problem is identified in production.
+So, one release recognizes the new format, and the next release actually starts using it.
 
-So, one release will understand the new format while the next one will actually start using it.
+This makes it possible to downgrade a running cluster to a previous version, in case any server problem is identified in production.
 
-### Since Pulsar has configurable retention per namespace, can I set a "forever" value, ie., always keep all data in the namespaces?
-So, retention applies to "consumed" messages. Ones, for which the consumer has already acknowledged the processing. By default, retention is 0, so it means data is deleted as soon as all consumers acknowledge. You can set retention to delay the retention.
+### Since Pulsar has configurable retention per namespace, can I set a "forever" value, and keep all data in the namespaces forever?
+Retention applies to "consumed" messages, for which the consumer has already acknowledged the processing. By default, retention is set to 0, which means data is deleted as soon as all consumers acknowledge. You can set a retention period to delay the deletion.
 
-That also means, that data is kept forever, by default, if the consumers are not acknowledging.
+It also means that data is kept forever, by default, if the consumers are not acknowledging.
 
-There is no currently "infinite" retention, other than setting to very high value.
-
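As a sketch only (the namespace name and values are illustrative, assuming the Pulsar Java admin client; the same can be done with `bin/pulsar-admin namespaces set-retention`), retention for a namespace could be raised to a very high value to approximate "forever":

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class RetentionSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            // Keep acknowledged messages for 7 days or up to 10 GB, whichever limit
            // is reached first; use a very large value to approximate "forever".
            admin.namespaces().setRetention("sample/standalone/ns1",
                    new RetentionPolicies(7 * 24 * 60 /* minutes */, 10 * 1024 /* MB */));
        }
    }
}
```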
-### How can a consumer "replay" a topic from the beginning, ie., where can I set an offset for the consumer?
+### How can a consumer "replay" a topic from the beginning? Where can I set an offset for the consumer?
 1. Use admin API (or CLI tool):
     - Reset to a specific point in time (3h ago)
     - Reset to a message id
 2. You can use the client API `seek` (see the sketch below).
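A minimal Java sketch of option 2 (topic and subscription names are placeholders, assuming a Pulsar 2.x client); option 1 corresponds to the `bin/pulsar-admin persistent reset-cursor` command:

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;

public class ReplaySketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://sample/standalone/ns1/my-topic")
                .subscriptionName("my-sub")
                .subscribe();

        // Rewind the subscription to the first available message so the
        // consumer replays the topic from the beginning.
        consumer.seek(MessageId.earliest);

        consumer.close();
        client.close();
    }
}
```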
 
-### When create a consumer, does this affect other consumers ?
-The key is that you should use different subscriptions for each consumer. Each subscription is completely independent from others.
-
-### The default when creating a consumer, is it to "tail" from "now" on the topic, or from the "last acknowledged" or something else?
-So when you spin up a consumer, it will try to subscribe to the topic, if the subscription doesn't exist, a new one will be created, and it will be positioned at the end of the topic ("now").
-
-Once you reconnect, the subscription will still be there and it will be positioned on the last acknowledged messages from the previous session.
-
-### I want some produce lock, i.e., to pessimistically or optimistically lock a specified topic so only one producer can write at a time and all further producers know they have to reprocess data before trying again to write a topic.
-To ensure only one producer is connected, you just need to use the same "producerName", the broker will ensure that no 2 producers with same name are publishing on a given topic.
-
-### I tested the performance using PerformanceProducer between two server node with 10,000Mbits NIC(and I tested tcp throughput can be larger than 1GB/s). I saw that the max msg throughput is around 1000,000 msg/s when using little msg_size(such as 64/128Bytes), when I increased the msg_size to 1028 or larger , then the msg/s will decreased sharply to 150,000msg/s, and both has max throughput around 1600Mbit/s, which is far from 1GB/s. And I'm curious that the throughput between producer and broker why can't excess 1600Mbit/s ? It seems that the Producer executor only use one thread, is this the reason?Then I start two producer client jvm, the throughput increased not much, just about little beyond 1600Mbit/s. Any other reasons?
-Most probably, when increasing the payload size, you're reaching the disk max write rate on a single bookie.
-
-There are few tricks that can be used to increase throughput (other than just partitioning)
-
-1. Enable striping in BK, by setting ensemble to bigger than write quorum. E.g. e=5 w=2 a=2. Write 2 copies of each message but stripe them across 5 bookies
-
-2. If there are already multiple topics/partitions, you can try to configure the bookies with multiple journals (e.g. 4). This should increase the throughput when the journal is on SSDs, since the controller has multiple IO queues and can efficiently sustain multiple threads each doing sequential writes
-
-- Option (1) you just configure it on a given pulsar namespace, look at "namespaces set-persistence" command
-
-- Option (2) needs to be configured in bookies
-
-### Is there any work on a Mesos Framework for Pulsar/Bookkeeper this point? Would this be useful?
-We don’t have anything ready available for Mesos/DCOS though there should be nothing preventing it
-
-It would surely be useful.
-
+### When creating a consumer, is the default set to "tail" from "now" on the topic, or from the "last acknowledged" or something else?
+When you spin up a consumer, it tries to subscribe to the topic. If the subscription doesn't exist, a new one is created, and it is positioned at the end of the topic ("now").
 
-### Is there an HDFS like interface?
-Not for Pulsar.There was some work in BK / DistributedLog community to have it but not at the messaging layer.
+Once you reconnect, the subscription is still there and it is positioned on the last acknowledged messages from the previous session.
 
-### Where can I find information about `receiveAsync` parameters? In particular, is there a timeout as in `receive`?
-There’s no other info about `receiveAsync()`. The method doesn’t take any parameters. Currently there’s no timeout on it. You can always set a timeout on the `CompletableFuture` itself, but the problem is how to cancel the future and avoid “getting” the message.
+### I want some produce lock, i.e., to pessimistically or optimistically lock a specified topic, so only one producer can write at a time and all further producers know that they have to reprocess data before trying again to write to the topic.
+To ensure only one producer is connected, you just need to use the same "producerName". The broker ensures that no two producers with the same name are publishing on a given topic.
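A minimal sketch of the same-producerName approach (names are placeholders, assuming the Pulsar 2.x Java client); a second producer created with the same name fails while the first one is connected:

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class SingleWriterSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // The broker rejects any other producer that tries to connect to this
        // topic with the same producerName while this one is still active.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://sample/standalone/ns1/my-topic")
                .producerName("single-writer")
                .create();

        producer.send("only one writer at a time".getBytes());

        producer.close();
        client.close();
    }
}
```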
 
-What’s your use case for timeout on the `receiveAsync()`? Could that be achieved more easily by using the `MessageListener`?
+### Is there any work on a Mesos Framework for Pulsar/Bookkeeper at this point? Would this be useful?
+We don’t have anything available for Mesos/DCOS yet, though nothing prevents it.
+It would surely be useful.
 
-### Why do we choose to use bookkeeper to store consumer offset instead of zookeeper? I mean what's the benefits?
-ZooKeeper is a “consensus” system that while it exposes a key/value interface is not meant to support a large volume of writes per second.
+### Where can I find information about `receiveAsync` parameters? In particular, is there a timeout as in `receive`?
+There’s no other information about the `receiveAsync()` method. The method doesn’t take any parameters. Currently there’s no timeout on it. You can always set a timeout on the `CompletableFuture` instance, but the problem is how to cancel the future and avoid “getting” the message.
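A sketch of the `CompletableFuture` workaround mentioned above (names are placeholders, assuming the Pulsar 2.x Java client), including the caveat that the future may still complete after the timeout:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;

public class ReceiveAsyncTimeoutSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://sample/standalone/ns1/my-topic")
                .subscriptionName("my-sub")
                .subscribe();

        CompletableFuture<Message<byte[]>> future = consumer.receiveAsync();
        try {
            // The timeout lives on the future, not on receiveAsync() itself.
            Message<byte[]> msg = future.get(30, TimeUnit.SECONDS);
            consumer.acknowledge(msg);
        } catch (TimeoutException e) {
            // Caveat: the future may still complete later and "get" a message
            // that then sits unacknowledged until the ack timeout kicks in.
        }

        consumer.close();
        client.close();
    }
}
```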
 
-ZK is not an “horizontally scalable” system, because every node receive every transaction and keeps the whole data set. Effectively, ZK is based on a single “log” that is replicated consistently across the participants.
+### I'm facing some issues using `.receiveAsync` that seem to be related to `UnAckedMessageTracker` and `PartitionedConsumerImpl`. We are consuming messages with `receiveAsync`, doing instant `acknowledgeAsync` when a message is received, and after that the process delays its next execution. In such a scenario we are consuming a lot more messages (repeated) than the number of messages produced. We are using partitioned topics with setAckTimeout 30 seconds, and I believe this issue could be related to `PartitionedConsumerImpl` because the same test in a non-partitioned topic does not generate any repeated message.
+PartitionedConsumer is composed of a set of regular consumers, one per partition. To have a single `receive()` abstraction, messages from all partitions are pushed into a shared queue.

 
-The max throughput we have observed on a well configured ZK on good hardware was around ~10K writes/s. If you want to do more than that, you would have to shard it..
+The thing is that the unacked message tracker works at the partition level. So when the timeout happens, it’s able to request redelivery for the messages and clear them from the queue. But if messages were already pushed into the shared queue, the “clearing” part will not happen.
 
-To store consumers cursor positions, we need to write potentially a large number of updates per second. Typically we persist the cursor every 1 second, though the rate is configurable and if you want to reduce the amount of potential duplicates, you can increase the persistent frequency.
-
-With BookKeeper it’s very efficient to have a large throughput across a huge number of different “logs”. In our case, we use 1 log per cursor, and it becomes feasible to persist every single cursor update.
-
-### I'm facing some issue using `.receiveAsync` that it seems to be related with `UnAckedMessageTracker` and `PartitionedConsumerImpl`. We are consuming messages with `receiveAsync`, doing instant `acknowledgeAsync` when message is received, after that the process will delay the next execution of itself. In such scenario we are consuming a lot more messages (repeated) than the num of messages produced. We are using Partitioned topics with setAckTimeout 30 seconds and I believe this issue could be related with `PartitionedConsumerImpl` because the same test in a non-partitioned topic does not generate any repeated message.
-PartitionedConsumer is composed of a set of regular consumers, one per partition. To have a single `receive()` abstraction, messages from all partitions are then pushed into a shared queue.
-
-The thing is that the unacked message tracker works at the partition level.So when the timeout happens, it’s able to request redelivery for the messages and clear them from the queue when that happens,
-but if the messages were already pushed into the shared queue, the “clearing” part will not happen.
-
-- the only quick workaround that I can think of is to increase the “ack-timeout” to a level in which timeout doesn’t occur in processing
-- another option would be to reduce the receiver queue size, so that less messages are sitting in the queue
+- a quick workaround is to increase the “ack-timeout” to a level in which timeout doesn’t occur in processing
 
 Review comment:
   ```suggestion
   - A quick workaround is to increase the “ack-timeout” to a level in which timeout doesn’t occur in processing.
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
