pulsar-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] sijie closed pull request #2275: [website] improve pulsar connector documentation
Date Thu, 02 Aug 2018 10:11:45 GMT
sijie closed pull request #2275:  [website] improve pulsar connector documentation
URL: https://github.com/apache/incubator-pulsar/pull/2275
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/site2/docs/deploy-bare-metal-multi-cluster.md b/site2/docs/deploy-bare-metal-multi-cluster.md
index 2b94aa79d8..e554ee885a 100644
--- a/site2/docs/deploy-bare-metal-multi-cluster.md
+++ b/site2/docs/deploy-bare-metal-multi-cluster.md
@@ -4,7 +4,15 @@ title: Deploying a multi-cluster on bare metal
 sidebar_label: Bare metal multi-cluster
 ---
 
-> Single-cluster Pulsar installations should be sufficient for all but the most ambitious use cases. If you're interested in experimenting with Pulsar or using it in a startup or on a single team, we recommend opting for a single cluster. For instructions on deploying a single cluster, see the guide [here](deploy-bare-metal.md).
+> ### Tips
+>
+> 1. Single-cluster Pulsar installations should be sufficient for all but the most ambitious use cases. If you're interested in experimenting with
+> Pulsar or using it in a startup or on a single team, we recommend opting for a single cluster. For instructions on deploying a single cluster,
+> see the guide [here](deploy-bare-metal.md).
+>
+> 2. If you want to use all builtin [Pulsar IO](io-overview.md) connectors in your Pulsar deployment, you need to download the `apache-pulsar-io-connectors`
+> package and make sure it is installed under the `connectors` directory in the pulsar directory on every broker node (or on every function-worker node if you
+> run a separate cluster of function workers for [Pulsar Functions](functions-overview.md)).
 
 A Pulsar *instance* consists of multiple Pulsar clusters working in unison. Clusters can be distributed across data centers or geographical regions and can replicate amongst themselves using [geo-replication](administration-geo.md). Deploying a multi-cluster Pulsar instance involves the following basic steps:
 
diff --git a/site2/docs/deploy-bare-metal.md b/site2/docs/deploy-bare-metal.md
index 0c9236b5b7..593a71bf36 100644
--- a/site2/docs/deploy-bare-metal.md
+++ b/site2/docs/deploy-bare-metal.md
@@ -4,7 +4,16 @@ title: Deploying a cluster on bare metal
 sidebar_label: Bare metal
 ---
 
-> Single-cluster Pulsar installations should be sufficient for all but the most ambitious use cases. If you're interested in experimenting with Pulsar or using it in a startup or on a single team, we recommend opting for a single cluster. If you do need to run a multi-cluster Pulsar instance, however, see the guide [here](deploy-bare-metal-multi-cluster.md).
+
+> ### Tips
+>
+> 1. Single-cluster Pulsar installations should be sufficient for all but the most ambitious use cases. If you're interested in experimenting with
+> Pulsar or using it in a startup or on a single team, we recommend opting for a single cluster. If you do need to run a multi-cluster Pulsar instance,
+> however, see the guide [here](deploy-bare-metal-multi-cluster.md).
+>
+> 2. If you want to use all builtin [Pulsar IO](io-overview.md) connectors in your Pulsar deployment, you need to download the `apache-pulsar-io-connectors`
+> package and make sure it is installed under the `connectors` directory in the pulsar directory on every broker node (or on every function-worker node if you
+> run a separate cluster of function workers for [Pulsar Functions](functions-overview.md)).
 
 Deploying a Pulsar cluster involves doing the following (in order):
 
diff --git a/site2/docs/deploy-dcos.md b/site2/docs/deploy-dcos.md
index 6d6f53f6f3..5f66939085 100644
--- a/site2/docs/deploy-dcos.md
+++ b/site2/docs/deploy-dcos.md
@@ -4,6 +4,11 @@ title: Deploying Pulsar on DC/OS
 sidebar_label: DC/OS
 ---
 
+> ### Tips
+>
+> If you want to enable all builtin [Pulsar IO](io-overview.md) connectors in your Pulsar deployment, you can use the `apachepulsar/pulsar-all` image instead of
+> the `apachepulsar/pulsar` image. The `apachepulsar/pulsar-all` image already bundles [all builtin connectors](io-overview.md#working-with-connectors).
+
 [DC/OS](https://dcos.io/) (the <strong>D</strong>ata<strong>C</strong>enter <strong>O</strong>perating <strong>S</strong>ystem) is a distributed operating system used for deploying and managing applications and systems on [Apache Mesos](http://mesos.apache.org/). DC/OS is an open-source tool created and maintained by [Mesosphere](https://mesosphere.com/).
 
 Apache Pulsar is available as a [Marathon Application Group](https://mesosphere.github.io/marathon/docs/application-groups.html), which runs multiple applications as manageable sets.
diff --git a/site2/docs/deploy-kubernetes.md b/site2/docs/deploy-kubernetes.md
index 769afa4ac7..037d28de07 100644
--- a/site2/docs/deploy-kubernetes.md
+++ b/site2/docs/deploy-kubernetes.md
@@ -4,6 +4,11 @@ title: Deploying Pulsar on Kubernetes
 sidebar_label: Kubernetes
 ---
 
+> ### Tips
+>
+> If you want to enable all builtin [Pulsar IO](io-overview.md) connectors in your Pulsar deployment, you can use the `apachepulsar/pulsar-all` image instead of
+> the `apachepulsar/pulsar` image. The `apachepulsar/pulsar-all` image already bundles [all builtin connectors](io-overview.md#working-with-connectors).
+
 Pulsar can be easily deployed in [Kubernetes](https://kubernetes.io/) clusters, either in managed clusters on [Google Kubernetes Engine](#pulsar-on-google-kubernetes-engine) or [Amazon Web Services](https://aws.amazon.com/) or in [custom clusters](#pulsar-on-a-custom-kubernetes-cluster).
 
 The deployment method shown in this guide relies on [YAML](http://yaml.org/) definitions for Kubernetes [resources](https://kubernetes.io/docs/reference/). The {@inject: github:`kubernetes`:/kubernetes} subdirectory of the [Pulsar package](pulsar:download_page_url) holds resource definitions for:
diff --git a/site2/docs/getting-started-standalone.md b/site2/docs/getting-started-standalone.md
index 36aa68f3eb..40904512d0 100644
--- a/site2/docs/getting-started-standalone.md
+++ b/site2/docs/getting-started-standalone.md
@@ -58,7 +58,51 @@ Directory | Contains
 `logs` | Logs created by the installation
 
 
+## Installing Builtin Connectors
 
+Since the `2.1.0-incubating` release, Pulsar ships a separate binary distribution containing all the `builtin` connectors.
+To enable those `builtin` connectors, you can download the connectors tarball release in one of the following ways:
+
+* by clicking the link below and downloading the release from an Apache mirror:
+
+  * <a href="pulsar:connector_release_url" download>Pulsar IO Connectors {{pulsar:version}} release</a>
+
+* from the Pulsar [downloads page](pulsar:download_page_url)
+* from the Pulsar [releases page](https://github.com/apache/incubator-pulsar/releases/latest)
+* using [wget](https://www.gnu.org/software/wget):
+
+  ```shell
+  $ wget pulsar:connector_release_url
+  ```
+
+Once the tarball is downloaded, untar the io-connectors package in the pulsar directory and copy the connectors directory to `connectors`
+in the pulsar directory:
+
+```bash
+$ tar xvfz /path/to/apache-pulsar-io-connectors-{{pulsar:version}}-bin.tar.gz
+
+# you will find a directory named `apache-pulsar-io-connectors-{{pulsar:version}}` in the pulsar directory
+# then copy the connectors
+
+$ cp -r apache-pulsar-io-connectors-{{pulsar:version}}/connectors connectors
+
+$ ls connectors
+pulsar-io-aerospike-{{pulsar:version}}.nar
+pulsar-io-cassandra-{{pulsar:version}}.nar
+pulsar-io-kafka-{{pulsar:version}}.nar
+pulsar-io-kinesis-{{pulsar:version}}.nar
+pulsar-io-rabbitmq-{{pulsar:version}}.nar
+pulsar-io-twitter-{{pulsar:version}}.nar
+...
+```
+
+> #### NOTES
+>
+> If you are running Pulsar in a bare metal cluster, you need to make sure the `connectors` tarball is untarred in every broker's pulsar directory
+> (or in every function-worker's pulsar directory if you are running a separate worker cluster for Pulsar Functions).
+> 
+> If you are [running Pulsar in Docker](getting-started-docker.md) or deploying Pulsar using a docker image (e.g. [K8S](deploy-kubernetes.md) or [DC/OS](deploy-dcos.md)),
+> you can use the `apachepulsar/pulsar-all` image instead of the `apachepulsar/pulsar` image. The `apachepulsar/pulsar-all` image already bundles [all builtin connectors](io-overview.md#working-with-connectors).
 
 ## Starting the cluster
 
diff --git a/site2/docs/io-aerospike.md b/site2/docs/io-aerospike.md
new file mode 100644
index 0000000000..b23e2e30c4
--- /dev/null
+++ b/site2/docs/io-aerospike.md
@@ -0,0 +1,21 @@
+---
+id: io-aerospike
+title: Aerospike Sink Connector
+sidebar_label: Aerospike Sink Connector
+---
+
+The Aerospike Sink connector is used to write messages to an Aerospike Cluster.
+
+## Sink Configuration Options
+
+The following configuration options are specific to the Aerospike Connector:
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| `seedHosts` | `true` | `null` | Comma-separated list of one or more Aerospike cluster hosts; each host can be specified as a valid IP address or hostname followed by an optional port number (default is 3000). |
+| `keyspace` | `true` | `null` | Aerospike namespace to use. |
+| `keySet` | `false` | `null` | Aerospike set name to use. |
+| `columnName` | `true` | `null` | Aerospike bin name to use. |
+| `maxConcurrentRequests` | `false` | `100` | Maximum number of concurrent Aerospike transactions that a Sink can open. |
+| `timeoutMs` | `false` | `100` | A single timeout value controls `socketTimeout` and `totalTimeout` for Aerospike transactions.  |
+| `retries` | `false` | `1` | Maximum number of retries before aborting a write transaction to Aerospike. |
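
As an illustration only, a sink configuration yaml for this connector might look like the following. The tenant, namespace, name, and host values are placeholders (not defaults), and the surrounding layout follows the generic sink-config shape used elsewhere in these docs:

```yaml
tenant: public
namespace: default
name: aerospike-test-sink
# Aerospike-specific config (keys from the table above; values are examples)
configs:
  seedHosts: "localhost:3000"
  keyspace: "pulsar_test_namespace"
  columnName: "pulsar_bin"
  maxConcurrentRequests: 100
  timeoutMs: 100
  retries: 1
```
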
diff --git a/site2/docs/io-cassandra.md b/site2/docs/io-cassandra.md
new file mode 100644
index 0000000000..358a8eee7c
--- /dev/null
+++ b/site2/docs/io-cassandra.md
@@ -0,0 +1,22 @@
+---
+id: io-cassandra
+title: Cassandra Sink Connector
+sidebar_label: Cassandra Sink Connector
+---
+
+The Cassandra Sink connector is used to write messages to a Cassandra Cluster.
+
+The tutorial [Connecting Pulsar with Apache Cassandra](io-quickstart.md) shows an example of how to use the Cassandra sink
+connector to write messages to a Cassandra table.
+
+## Sink Configuration Options
+
+All the Cassandra sink settings are listed below. All settings are required to run a Cassandra sink.
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `roots` | `null` | `true` | Cassandra contact points: a comma-separated list of one or more node addresses. |
+| `keyspace` | `null` | `true` | Cassandra Keyspace name. The keyspace should be created prior to creating the sink. |
+| `columnFamily` | `null` | `true` | Cassandra ColumnFamily name. The column family should be created prior to creating the sink. |
+| `keyname` | `null` | `true` | Key column name. The key column is used for storing Pulsar message keys. If a Pulsar message doesn't have any key associated, the message value will be used as the key. |
+| `columnName` | `null` | `true` | Value column name. The value column is used for storing Pulsar message values. |
diff --git a/site2/docs/io-connectors.md b/site2/docs/io-connectors.md
new file mode 100644
index 0000000000..5a7699898a
--- /dev/null
+++ b/site2/docs/io-connectors.md
@@ -0,0 +1,18 @@
+---
+id: io-connectors
+title: Builtin Connectors
+sidebar_label: Builtin Connectors
+---
+
+The Pulsar distribution includes a set of common connectors that have been packaged and tested with the rest of Apache Pulsar.
+These connectors import and export data from some of the most commonly used data systems. Using any of these connectors is
+as easy as writing a simple connector configuration and running the connector locally or submitting the connector to a
+Pulsar Functions cluster.
+
+- [Aerospike Sink Connector](io-aerospike.md)
+- [Cassandra Sink Connector](io-cassandra.md)
+- [Kafka Sink Connector](io-kafka.md#sink)
+- [Kafka Source Connector](io-kafka.md#source)
+- [Kinesis Sink Connector](io-kinesis.md#sink)
+- [RabbitMQ Source Connector](io-rabbitmq.md#source)
+- [Twitter Firehose Source Connector](io-twitter.md)
diff --git a/site2/docs/io-develop.md b/site2/docs/io-develop.md
new file mode 100644
index 0000000000..72a75b8c75
--- /dev/null
+++ b/site2/docs/io-develop.md
@@ -0,0 +1,195 @@
+---
+id: io-develop
+title: Develop Connectors
+sidebar_label: Developing Connectors
+---
+
+This guide describes how developers can write new connectors for Pulsar IO to move data
+between Pulsar and other systems.
+
+Pulsar IO connectors are specialized [Pulsar Functions](functions-overview.md), so writing
+a Pulsar IO connector is as simple as writing a Pulsar function. Pulsar IO connectors come
+in two flavors: {@inject: github:`Source`:/pulsar-io/core/src/main/java/org/apache/pulsar/io/core/Source.java},
+which imports data from another system, and {@inject: github:`Sink`:/pulsar-io/core/src/main/java/org/apache/pulsar/io/core/Sink.java},
+which exports data to another system. For example, [KinesisSink](io-kinesis.md) exports
+the messages of a Pulsar topic to a Kinesis stream, and [RabbitmqSource](io-rabbitmq.md) imports
+the messages of a RabbitMQ queue into a Pulsar topic.
+
+### Developing
+
+#### Develop a source connector
+
+To develop a source connector, you need to implement the {@inject: github:`Source`:/pulsar-io/core/src/main/java/org/apache/pulsar/io/core/Source.java}
+interface.
+
+First, you need to implement the {@inject: github:`open`:/pulsar-io/core/src/main/java/org/apache/pulsar/io/core/Source.java#L33} method. This method is called once when the source connector
+is initialized. In this method, you can retrieve all the connector-specific settings through
+the passed `config` parameter and initialize all the necessary resources. For example, a Kafka
+connector can create the Kafka client in this `open` method.
+
+Besides the passed-in `config` object, the Pulsar runtime also provides a `SourceContext` for the
+connector to access runtime resources for tasks like collecting metrics. The implementation can
+save the `SourceContext` for further use.
+
+```java
+    /**
+     * Open connector with configuration
+     *
+     * @param config initialization config
+     * @param sourceContext
+     * @throws Exception IO type exceptions when opening a connector
+     */
+    void open(final Map<String, Object> config, SourceContext sourceContext) throws Exception;
+```
+
+The main task for a Source implementor is to implement {@inject: github:`read`:/master/pulsar-io/core/src/main/java/org/apache/pulsar/io/core/Source.java#L41}
+method.
+
+```java
+    /**
+     * Reads the next message from source.
+     * If source does not have any new messages, this call should block.
+     * @return next message from source.  The return result should never be null
+     * @throws Exception
+     */
+    Record<T> read() throws Exception;
+```
+
+The implementation should block on this method if there is nothing to return. It should never return
+`null`. The returned {@inject: github:`Record`:/master/pulsar-functions/api-java/src/main/java/org/apache/pulsar/functions/api/Record.java#L28} should encapsulate the information that is needed by
+the Pulsar IO runtime.
+
+This information includes:
+
+- *Topic Name*: _Optional_. If the record originated from a Pulsar topic, this should be the Pulsar topic name.
+- *Key*: _Optional_. The key associated with the record, if it has one.
+- *Value*: _Required_. The actual data of the record.
+- *Partition Id*: _Optional_. If the record originated from a partitioned source,
+  return its partition id. The partition id is used as part of the unique identifier
+  by the Pulsar IO runtime to deduplicate messages and achieve an exactly-once processing guarantee.
+- *Record Sequence*: _Optional_. If the record originated from a sequential source,
+  return its record sequence. The record sequence is used as part of the unique identifier
+  by the Pulsar IO runtime to deduplicate messages and achieve an exactly-once processing guarantee.
+- *Properties*: _Optional_. If the record carries user-defined properties, return those properties.
+
+Additionally, the implementation of the record should provide two methods: `ack` and `fail`. These
+two methods are used by the Pulsar IO connector to acknowledge the records that it has finished
+processing and to fail the records that it could not process.
+
+{@inject: github:`KafkaSource`:/master/pulsar-io/kafka/src/main/java/org/apache/pulsar/io/kafka/KafkaAbstractSource.java} is a good example to follow.
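
A minimal, runnable sketch of this open/read flow, using simplified stand-in interfaces (`Record`, `Source`) rather than the real `org.apache.pulsar.io.core` API, and an in-memory queue in place of a real external system:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SourceSketch {
    // Stand-ins for the Pulsar IO interfaces described above.
    interface Record<T> {
        Optional<String> getKey();
        T getValue();
        default void ack() {}
        default void fail() {}
    }

    interface Source<T> {
        void open(Map<String, Object> config) throws Exception;
        // Blocks until a record is available; never returns null.
        Record<T> read() throws Exception;
    }

    // A toy source that "imports" strings from an in-memory queue,
    // standing in for an external system such as a Kafka topic.
    static class QueueSource implements Source<String> {
        private BlockingQueue<String> queue;

        @Override
        public void open(Map<String, Object> config) {
            // Retrieve connector-specific settings from `config` here;
            // a real connector would also create its client (e.g. a Kafka consumer).
            queue = new LinkedBlockingQueue<>();
        }

        @Override
        public Record<String> read() throws Exception {
            // Block while there is nothing to return.
            final String value = queue.take();
            return new Record<String>() {
                public Optional<String> getKey() { return Optional.empty(); }
                public String getValue() { return value; }
            };
        }

        // Test hook simulating the external system producing data.
        void inject(String s) { queue.add(s); }
    }

    public static void main(String[] args) throws Exception {
        QueueSource source = new QueueSource();
        source.open(Map.of());
        source.inject("hello");
        System.out.println(source.read().getValue()); // prints "hello"
    }
}
```

The same shape carries over to a real connector: `open` builds the client from `config`, and `read` blocks on the client and wraps each incoming message in a `Record`.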
+
+#### Develop a sink connector
+
+Developing a sink connector is as easy as developing a source connector. You just need to
+implement the {@inject: github:`Sink`:/master/pulsar-io/core/src/main/java/org/apache/pulsar/io/core/Sink.java} interface.
+
+Similarly, you first need to implement the {@inject: github:`open`:/master/pulsar-io/core/src/main/java/org/apache/pulsar/io/core/Sink.java#L36} method to initialize all the necessary resources
+before implementing the {@inject: github:`write`:/pulsar-io/core/src/main/java/org/apache/pulsar/io/core/Sink.java#L44} method.
+
+```java
+    /**
+     * Open connector with configuration
+     *
+     * @param config initialization config
+     * @param sinkContext
+     * @throws Exception IO type exceptions when opening a connector
+     */
+    void open(final Map<String, Object> config, SinkContext sinkContext) throws Exception;
+```
+
+The main task for a Sink implementor is to implement {@inject: github:`write`:/pulsar-io/core/src/main/java/org/apache/pulsar/io/core/Sink.java#L44} method.
+
+```java
+    /**
+     * Write a message to the sink
+     *
+     * @param record record to write to the sink
+     * @throws Exception
+     */
+    void write(Record<T> record) throws Exception;
+```
+
+In the implementation of the `write` method, the implementor can decide how to write the value and
+the optional key to the actual sink, and leverage all the provided information, such as
+`Partition Id` and `Record Sequence`, to achieve different processing guarantees. The implementor
+is also responsible for acknowledging records it has successfully written and failing
+records it has failed to write.
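
A minimal, runnable sketch of this write/ack/fail contract, again with simplified stand-in interfaces instead of the real `org.apache.pulsar.io.core` API, and an in-memory list in place of a real external system:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SinkSketch {
    // Stand-ins for the Pulsar IO interfaces described above.
    interface Record<T> {
        T getValue();
        void ack();
        void fail();
    }

    interface Sink<T> {
        void open(Map<String, Object> config) throws Exception;
        void write(Record<T> record) throws Exception;
    }

    // A toy record whose ack()/fail() just flip flags, standing in for the
    // runtime's acknowledgement machinery.
    static class SimpleRecord implements Record<String> {
        final String value;
        boolean acked;
        boolean failed;
        SimpleRecord(String value) { this.value = value; }
        public String getValue() { return value; }
        public void ack() { acked = true; }
        public void fail() { failed = true; }
    }

    // A toy sink that "exports" strings into an in-memory list.
    static class ListSink implements Sink<String> {
        final List<String> store = new ArrayList<>();

        @Override
        public void open(Map<String, Object> config) {
            // A real connector would open its client connection here,
            // using the connector-specific settings in `config`.
        }

        @Override
        public void write(Record<String> record) {
            try {
                store.add(record.getValue());
                record.ack();   // acknowledge only after a successful write
            } catch (Exception e) {
                record.fail();  // let the runtime handle the failed record
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ListSink sink = new ListSink();
        sink.open(Map.of());
        SimpleRecord record = new SimpleRecord("hello");
        sink.write(record);
        System.out.println(sink.store + " acked=" + record.acked); // prints "[hello] acked=true"
    }
}
```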
+
+### Testing
+
+Testing connectors can be challenging because Pulsar IO connectors interact with two systems
+that may be difficult to mock: Pulsar and the system the connector is connecting to. It is
+recommended to write tests that specifically exercise the functionality of the connector classes
+while mocking the external services.
+
+Once you have written sufficient unit tests for your connector, we also recommend adding
+separate integration tests to verify end-to-end functionality. In Pulsar, we are using
+[testcontainers](https://www.testcontainers.org/) for all Pulsar integration tests. Pulsar IO
+{@inject: github:`IntegrationTests`:/tests/integration/src/test/java/org/apache/pulsar/tests/integration/io} are good examples to follow on integration testing your connectors.
+
+### Packaging
+
+Once you've developed and tested your connector, you must package it so that it can be submitted
+to a [Pulsar Functions](functions-overview.md) cluster. Both of the approaches described
+here work with the Pulsar Functions runtime.
+
+If you plan to package and distribute your connector for others to use, you are obligated to
+properly license and copyright your own code and to adhere to the licensing and copyrights of
+all libraries your code uses and that you include in your distribution. If you are using the
+approach described in ["Creating a NAR package"](#creating-a-nar-package), the NAR plugin will
+automatically create a `DEPENDENCIES` file in the generated NAR package, including the proper
+licensing and copyrights of all libraries of your connector.
+
+#### Creating a NAR package
+
+The easiest approach to packaging a Pulsar IO connector is to create a NAR package using
+[nifi-nar-maven-plugin](https://mvnrepository.com/artifact/org.apache.nifi/nifi-nar-maven-plugin).
+
+NAR stands for NiFi Archive. It is a custom packaging mechanism used by Apache NiFi to provide
+a bit of Java ClassLoader isolation. For more details, you can read this
+[blog post](https://medium.com/hashmapinc/nifi-nar-files-explained-14113f7796fd) to understand
+how NAR works. Pulsar uses the same mechanism for packaging all the [builtin connectors](io-connectors.md).
+
+All you need to do is include the [nifi-nar-maven-plugin](https://mvnrepository.com/artifact/org.apache.nifi/nifi-nar-maven-plugin) in the maven project for your connector. For example:
+
+```xml
+<plugins>
+  <plugin>
+    <groupId>org.apache.nifi</groupId>
+    <artifactId>nifi-nar-maven-plugin</artifactId>
+    <version>1.2.0</version>
+  </plugin>
+</plugins>
+```
+
+The {@inject: github:`TwitterFirehose`:/pulsar-io/twitter} connector is a good example to follow.
+
+#### Creating an Uber JAR
+
+An alternative approach is to create an _uber JAR_ that contains all of the connector's JAR files
+and other resource files. No internal directory structure is necessary.
+
+You can use the [maven-shade-plugin](https://maven.apache.org/plugins/maven-shade-plugin/examples/includes-excludes.html) to create an uber JAR. For example:
+
+```xml
+<plugin>
+  <groupId>org.apache.maven.plugins</groupId>
+  <artifactId>maven-shade-plugin</artifactId>
+  <version>3.1.1</version>
+  <executions>
+    <execution>
+      <phase>package</phase>
+      <goals>
+        <goal>shade</goal>
+      </goals>
+      <configuration>
+        <filters>
+          <filter>
+            <artifact>*:*</artifact>
+          </filter>
+        </filters>
+      </configuration>
+    </execution>
+  </executions>
+</plugin>
+```
diff --git a/site2/docs/io-kafka.md b/site2/docs/io-kafka.md
new file mode 100644
index 0000000000..2ae53be969
--- /dev/null
+++ b/site2/docs/io-kafka.md
@@ -0,0 +1,40 @@
+---
+id: io-kafka
+title: Kafka Connector
+sidebar_label: Kafka Connector
+---
+
+## Source
+
+The Kafka Source Connector is used to pull messages from Kafka topics and persist the messages
+to a Pulsar topic.
+
+### Source Configuration Options
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| bootstrapServers | `true` | `null` | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. |
+| groupId | `true` | `null` | A unique string that identifies the consumer group this consumer belongs to. |
+| fetchMinBytes | `false` | `null` | Minimum bytes expected for each fetch response. |
+| autoCommitEnabled | `false` | `false` | If true, periodically commit to ZooKeeper the offset of messages already fetched by the consumer. This committed offset will be used when the process fails as the position from which the new consumer will begin. | 
+| autoCommitIntervalMs | `false` | `null` | The frequency in ms that the consumer offsets are committed to zookeeper. |
+| sessionTimeoutMs | `false` | `null` | The timeout used to detect consumer failures when using Kafka's group management facility. |
+| topic | `true` | `null` | Topic name to receive records from Kafka |
+| keySerializerClass | false | org.apache.kafka.common.serialization.StringSerializer | Serializer class for key that implements the org.apache.kafka.common.serialization.Serializer interface. |
+| valueSerializerClass | false | org.apache.kafka.common.serialization.StringSerializer | Serializer class for value that implements the org.apache.kafka.common.serialization.Serializer interface. |
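
As an illustration only (tenant, namespace, name, and values are placeholders, and the surrounding layout follows the generic source-config shape used elsewhere in these docs), a source configuration yaml for this connector might look like:

```yaml
tenant: public
namespace: default
name: kafka-test-source
# Kafka-specific config (keys from the table above; values are examples)
configs:
  bootstrapServers: "localhost:9092"
  groupId: "test-pulsar-io"
  topic: "test_kafka_topic"
  sessionTimeoutMs: 10000
  autoCommitEnabled: false
```
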
+
+## Sink
+
+The Kafka Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to a Kafka topic.
+
+### Sink Configuration Options
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| acks | `true` | `null` | The Kafka producer acks mode. |
+| batchSize | `true` | `null` | The Kafka producer batch size. |
+| maxRequestSize | `true` | `null` | The maximum size of a request in bytes. |
+| topic | `true` | `null` | Name of the Kafka topic to publish records to. |
+| keySerializerClass | false | org.apache.kafka.common.serialization.StringSerializer | Serializer class for key that implements the org.apache.kafka.common.serialization.Serializer interface. |
+| valueSerializerClass | false | org.apache.kafka.common.serialization.StringSerializer | Serializer class for value that implements the org.apache.kafka.common.serialization.Serializer interface. |
diff --git a/site2/docs/io-kinesis.md b/site2/docs/io-kinesis.md
new file mode 100644
index 0000000000..e3eef08cf6
--- /dev/null
+++ b/site2/docs/io-kinesis.md
@@ -0,0 +1,36 @@
+---
+id: io-kinesis
+title: AWS Kinesis Connector
+sidebar_label: AWS Kinesis Connector
+---
+
+## Sink
+
+The Kinesis Sink connector is used to pull data from Pulsar topics and persist the data into
+AWS Kinesis.
+
+### Sink Configuration Options
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| awsEndpoint | `true` | null | Kinesis endpoint URL; can be found at https://docs.aws.amazon.com/general/latest/gr/rande.html |
+| awsRegion | `true` | null | Appropriate AWS region, e.g. us-west-1, us-west-2 |
+| awsKinesisStreamName | `true` | null | Kinesis stream name |
+| awsCredentialPluginName | `false` | null | Fully-qualified class name of an implementation of {@inject: github:`AwsCredentialProviderPlugin`:/pulsar-io/kinesis/src/main/java/org/apache/pulsar/io/kinesis/AwsCredentialProviderPlugin.java}. It is a factory class which creates an AWSCredentialsProvider that will be used by the Kinesis sink. If it is empty, the Kinesis sink creates a default AWSCredentialsProvider which accepts a json map of credentials in `awsCredentialPluginParam` |
+| awsCredentialPluginParam | `false` | null | json parameters to initialize `AwsCredentialsProviderPlugin` |
+| messageFormat | `true` | `ONLY_RAW_PAYLOAD` | Message format in which the Kinesis sink converts Pulsar messages and publishes to Kinesis streams |
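
As an illustration only (tenant, namespace, name, and the endpoint, region, and stream values are placeholders, and the surrounding layout follows the generic sink-config shape used elsewhere in these docs), a sink configuration yaml for this connector might look like:

```yaml
tenant: public
namespace: default
name: kinesis-test-sink
# Kinesis-specific config (keys from the table above; values are examples)
configs:
  awsEndpoint: "https://kinesis.us-west-2.amazonaws.com"
  awsRegion: "us-west-2"
  awsKinesisStreamName: "test_stream"
  messageFormat: "ONLY_RAW_PAYLOAD"
```
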
+
+### Message Formats
+
+The available message formats are listed as below:
+
+#### **ONLY_RAW_PAYLOAD**
+
+The Kinesis sink directly publishes the Pulsar message payload as a message into the configured Kinesis stream.
+
+#### **FULL_MESSAGE_IN_JSON**
+
+The Kinesis sink creates a json payload with the Pulsar message payload, properties and encryptionCtx, and publishes the json payload into the configured Kinesis stream.
+
+#### **FULL_MESSAGE_IN_FB**
+
+The Kinesis sink creates a flatbuffer-serialized payload with the Pulsar message payload, properties and encryptionCtx, and publishes the flatbuffer payload into the configured Kinesis stream.
diff --git a/site2/docs/io-managing.md b/site2/docs/io-managing.md
new file mode 100644
index 0000000000..5807924bc2
--- /dev/null
+++ b/site2/docs/io-managing.md
@@ -0,0 +1,161 @@
+---
+id: io-managing
+title: Managing Connectors
+sidebar_label: Managing Connectors
+---
+
+This section describes how to manage Pulsar IO connectors in a Pulsar cluster. You will learn how to:
+
+- Deploy builtin connectors
+- Monitor and update running connectors with Pulsar Admin CLI
+- Deploy customized connectors
+- Upgrade a connector
+
+## Using Builtin Connectors
+
+Pulsar bundles several [builtin connectors](io-overview.md#working-with-connectors) that can be used for moving data in and out
+of commonly used systems such as databases and messaging systems. Getting set up to use these builtin connectors is simple. You can follow
+the [instructions](getting-started-standalone.md#installing-builtin-connectors) on installing builtin connectors. After setup, all
+the builtin connectors are automatically discovered by Pulsar brokers (or function-workers), so no additional installation steps are
+required.
+
+## Configuring Connectors
+
+Configuring Pulsar IO connectors is straightforward. All you need to do is provide a yaml configuration file when you [run connectors](#running-connectors).
+The yaml configuration file tells Pulsar where to locate the sources and sinks and how to connect those sources and sinks with Pulsar topics.
+
+Below is an example yaml configuration file for Cassandra Sink:
+
+```yaml
+tenant: public
+namespace: default
+name: cassandra-test-sink
+...
+# cassandra specific config
+configs:
+    roots: "localhost:9042"
+    keyspace: "pulsar_test_keyspace"
+    columnFamily: "pulsar_test_table"
+    keyname: "key"
+    columnName: "col"
+```
+
+The example yaml tells Pulsar which Cassandra cluster to connect to, which `keyspace` and `columnFamily` to use in Cassandra for collecting data,
+and how to map a Pulsar message into the Cassandra table key and columns.
+
+For details, consult the documentation for [individual connectors](io-overview.md#working-with-connectors).
+
+## Running Connectors
+
+Pulsar connectors can be managed using the [`source`](reference-pulsar-admin.md#source) and [`sink`](reference-pulsar-admin.md#sink) commands of the [`pulsar-admin`](reference-pulsar-admin.md) CLI tool.
+
+### Running sources
+
+You can submit a source to be run in an existing Pulsar cluster using a command of this form:
+
+```bash
+$ ./bin/pulsar-admin source create --className <classname> --jar <jar-location> --tenant <tenant> --namespace <namespace> --name <source-name> --destinationTopicName <output-topic>
+```
+
+Here’s an example command:
+
+```bash
+bin/pulsar-admin source create --className org.apache.pulsar.io.twitter.TwitterFireHose --jar ~/application.jar --tenant test --namespace ns1 --name twitter-source --destinationTopicName twitter_data
+```
+
+Instead of submitting a source to run on an existing Pulsar cluster, you alternatively can run a source as a process on your local machine:
+
+```bash
+bin/pulsar-admin source localrun --className  org.apache.pulsar.io.twitter.TwitterFireHose --jar ~/application.jar --tenant test --namespace ns1 --name twitter-source --destinationTopicName twitter_data
+```
+
+If you are submitting a built-in source, you don't need to specify `--className` and `--jar`.
+You can simply specify the source type with `--source-type`. The command to submit a built-in source is
+of the following form:
+
+```bash
+./bin/pulsar-admin source create \
+    --tenant <tenant> \
+    --namespace <namespace> \
+    --name <source-name> \
+    --destinationTopicName <output-topic> \
+    --source-type <source-type>
+```
+
+Here's an example to submit a Kafka source:
+
+```bash
+./bin/pulsar-admin source create \
+    --tenant test-tenant \
+    --namespace test-namespace \
+    --name test-kafka-source \
+    --destinationTopicName pulsar_sink_topic \
+    --source-type kafka
+```
+
+### Running Sinks
+
+You can submit a sink to be run in an existing Pulsar cluster using a command of this form:
+
+```bash
+./bin/pulsar-admin sink create --className <classname> --jar <jar-location> --tenant <tenant> --namespace <namespace> --name <sink-name> --inputs <input-topics>
+```
+
+Here’s an example command:
+
+```bash
+./bin/pulsar-admin sink create --className  org.apache.pulsar.io.cassandra --jar ~/application.jar --tenant test --namespace ns1 --name cassandra-sink --inputs test_topic
+```
+
+Instead of submitting a sink to run on an existing Pulsar cluster, you alternatively can run a sink as a process on your local machine:
+
+```bash
+./bin/pulsar-admin sink localrun --className  org.apache.pulsar.io.cassandra --jar ~/application.jar --tenant test --namespace ns1 --name cassandra-sink --inputs test_topic
+```
+
+If you are submitting a built-in sink, you don't need to specify `--className` and `--jar`.
+You can simply specify the sink type with `--sink-type`. The command to submit a built-in sink is
+of the following form:
+
+```bash
+./bin/pulsar-admin sink create \
+    --tenant <tenant> \
+    --namespace <namespace> \
+    --name <sink-name> \
+    --inputs <input-topics> \
+    --sink-type <sink-type>
+```
+
+Here's an example to submit a Cassandra sink:
+
+```bash
+./bin/pulsar-admin sink create \
+    --tenant test-tenant \
+    --namespace test-namespace \
+    --name test-cassandra-sink \
+    --inputs pulsar_input_topic \
+    --sink-type cassandra
+```
+
+## Monitoring Connectors
+
+Since Pulsar IO connectors run as [Pulsar Functions](functions-overview.md), you can monitor them with the [`functions`](reference-pulsar-admin.md#functions) commands
+available in the [`pulsar-admin`](reference-pulsar-admin.md) CLI tool.
+
+### Retrieve Connector Metadata
+
+```bash
+bin/pulsar-admin functions get \
+    --tenant <tenant> \
+    --namespace <namespace> \
+    --name <connector-name>
+```
+
+### Retrieve Connector Running Status
+
+```bash
+bin/pulsar-admin functions getstatus \
+    --tenant <tenant> \
+    --namespace <namespace> \
+    --name <connector-name>
+```
diff --git a/site2/docs/io-overview.md b/site2/docs/io-overview.md
index 56a36f25ac..1568caf16e 100644
--- a/site2/docs/io-overview.md
+++ b/site2/docs/io-overview.md
@@ -7,7 +7,7 @@ sidebar_label: Overview
 Messaging systems are most powerful when you can easily use them in conjunction with external systems like databases and other messaging systems. **Pulsar IO** is a feature of Pulsar that enables you to easily create, deploy, and manage Pulsar **connectors** that interact with external systems, such as [Apache Cassandra](https://cassandra.apache.org), [Aerospike](https://www.aerospike.com), and many others.
 
 > #### Pulsar IO and Pulsar Functions
-> Under the hood, Pulsar IO connectors are specialized [Pulsar Functions](functions-overview.md) purpose-built to interface with external systems. The [administrative interface](io-quickstart.md) for Pulsar IO is, in fact, quite similar to that of Pulsar Functions."
+> Under the hood, Pulsar IO connectors are specialized [Pulsar Functions](functions-overview.md) purpose-built to interface with external systems. The [administrative interface](io-quickstart.md) for Pulsar IO is, in fact, quite similar to that of Pulsar Functions.
 
 ## Sources and sinks
 
@@ -28,11 +28,12 @@ Pulsar IO connectors can be managed via the [`pulsar-admin`](reference-pulsar-ad
 
 The following connectors are currently available for Pulsar:
 
-|Name|Java Class|
-|---|---|
-|[Aerospike sink](https://www.aerospike.com/)|[`org.apache.pulsar.io.aerospike.AerospikeSink`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/aerospike/src/main/java/org/apache/pulsar/io/aerospike/AerospikeStringSink.java)|
-|[Cassandra sink](https://cassandra.apache.org)|[`org.apache.pulsar.io.cassandra.CassandraSink`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/cassandra/src/main/java/org/apache/pulsar/io/cassandra/CassandraStringSink.java)|
-|[Kafka source](https://kafka.apache.org)|[`org.apache.pulsar.io.kafka.KafkaSource`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/kafka/src/main/java/org/apache/pulsar/io/kafka/KafkaStringSource.java)|
-|[Kafka sink](https://kafka.apache.org)|[`org.apache.pulsar.io.kafka.KafkaSink`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/kafka/src/main/java/org/apache/pulsar/io/kafka/KafkaStringSink.java)|
-|[RabbitMQ source](https://www.rabbitmq.com)|[`org.apache.pulsar.io.rabbitmq.RabbitMQSource`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/rabbitmq/src/main/java/org/apache/pulsar/io/rabbitmq/RabbitMQSource.java)|
-|[Twitter Firehose source](https://developer.twitter.com/en/docs)|[org.apache.pulsar.io.twitter.TwitterFireHose](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/twitter/src/main/java/org/apache/pulsar/io/twitter/TwitterFireHose.java)|
+|Name|Java Class|Documentation|
+|---|---|---|
+|[Aerospike sink](https://www.aerospike.com/)|[`org.apache.pulsar.io.aerospike.AerospikeSink`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/aerospike/src/main/java/org/apache/pulsar/io/aerospike/AerospikeStringSink.java)|[Documentation](io-aerospike.md)|
+|[Cassandra sink](https://cassandra.apache.org)|[`org.apache.pulsar.io.cassandra.CassandraSink`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/cassandra/src/main/java/org/apache/pulsar/io/cassandra/CassandraStringSink.java)|[Documentation](io-cassandra.md)|
+|[Kafka source](https://kafka.apache.org)|[`org.apache.pulsar.io.kafka.KafkaSource`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/kafka/src/main/java/org/apache/pulsar/io/kafka/KafkaStringSource.java)|[Documentation](io-kafka.md#source)|
+|[Kafka sink](https://kafka.apache.org)|[`org.apache.pulsar.io.kafka.KafkaSink`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/kafka/src/main/java/org/apache/pulsar/io/kafka/KafkaStringSink.java)|[Documentation](io-kafka.md#sink)|
+|[Kinesis sink](https://aws.amazon.com/kinesis/)|[`org.apache.pulsar.io.kinesis.KinesisSink`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/kinesis/src/main/java/org/apache/pulsar/io/kinesis/KinesisSink.java)|[Documentation](io-kinesis.md#sink)|
+|[RabbitMQ source](https://www.rabbitmq.com)|[`org.apache.pulsar.io.rabbitmq.RabbitMQSource`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/rabbitmq/src/main/java/org/apache/pulsar/io/rabbitmq/RabbitMQSource.java)|[Documentation](io-rabbitmq.md#sink)|
+|[Twitter Firehose source](https://developer.twitter.com/en/docs)|[org.apache.pulsar.io.twitter.TwitterFireHose](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/twitter/src/main/java/org/apache/pulsar/io/twitter/TwitterFireHose.java)|[Documentation](io-twitter.md#source)|
diff --git a/site2/docs/io-quickstart.md b/site2/docs/io-quickstart.md
index 2140f67110..c84a701d18 100644
--- a/site2/docs/io-quickstart.md
+++ b/site2/docs/io-quickstart.md
@@ -1,72 +1,399 @@
 ---
 id: io-quickstart
-title: Pulsar IO Overview
+title: "Tutorial: Connecting Pulsar with Apache Cassandra"
 sidebar_label: Getting started
 ---
 
-Pulsar IO is a feature of Pulsar that enables you to easily create and manage **connectors** that interface with external systems, such as databases and other messaging systems.
+This tutorial provides a hands-on look at how you can move data out of Pulsar without writing a single line of code.
+It is helpful to review the [concepts](io-overview.md) for Pulsar I/O in tandem with running the steps in this guide
+to gain a deeper understanding. At the end of this tutorial, you will be able to:
 
-## Setup
+- Connect your Pulsar cluster with your Cassandra cluster
 
-In order to run Pulsar IO connectors, you'll need to have a binary distribution of pulsar locally.
+> #### Tip
+>
+> 1. These instructions assume you are running Pulsar in [standalone mode](getting-started-standalone.md). However, all
+> the commands used in this tutorial can be run against a multi-node Pulsar cluster without any changes.
+>
+> 2. All the instructions assume that you run them from the root directory of a Pulsar binary distribution.
 
-## Managing connectors
+## Installing Pulsar
 
-Pulsar connectors can be managed using the [`source`](reference-pulsar-admin.md#source) and [`sink`](reference-pulsar-admin.md#sink) commands of the [`pulsar-admin`](reference-pulsar-admin.md) CLI tool.
+To get started running Pulsar, download a binary tarball release in one of the following ways:
 
-### Running sources
+* by clicking the link below and downloading the release from an Apache mirror:
 
-You can use the [`create`](reference-pulsar-admin.md#source-create)
+  * <a href="pulsar:binary_release_url" download>Pulsar {{pulsar:version}} binary release</a>
 
-You can submit a sink to be run in an existing Pulsar cluster using a command of this form:
+* from the Pulsar [downloads page](pulsar:download_page_url)
+* from the Pulsar [releases page](https://github.com/apache/incubator-pulsar/releases/latest)
+* using [wget](https://www.gnu.org/software/wget):
+
+  ```shell
+  $ wget pulsar:binary_release_url
+  ```
+
+Once the tarball is downloaded, untar it and `cd` into the resulting directory:
+
+```bash
+$ tar xvfz apache-pulsar-{{pulsar:version}}-bin.tar.gz
+$ cd apache-pulsar-{{pulsar:version}}
+```
+
+## Installing Builtin Connectors
+
+Since release `2.1.0-incubating`, Pulsar ships a separate binary distribution that contains all the `builtin` connectors.
+If you would like to enable those `builtin` connectors, download the connectors tarball release in one of the following ways:
+
+* by clicking the link below and downloading the release from an Apache mirror:
+
+  * <a href="pulsar:connector_release_url" download>Pulsar IO Connectors {{pulsar:version}} release</a>
+
+* from the Pulsar [downloads page](pulsar:download_page_url)
+* from the Pulsar [releases page](https://github.com/apache/incubator-pulsar/releases/latest)
+* using [wget](https://www.gnu.org/software/wget):
+
+  ```shell
+  $ wget pulsar:connector_release_url
+  ```
+
+Once the tarball is downloaded, untar the io-connectors package in the pulsar directory and copy the extracted connectors
+into a `connectors` directory in the pulsar directory:
+
+```bash
+$ tar xvfz /path/to/apache-pulsar-io-connectors-{{pulsar:version}}-bin.tar.gz
+
+# you will find a directory named `apache-pulsar-io-connectors-{{pulsar:version}}` in the pulsar directory,
+# then copy the connectors into the `connectors` directory
+
+$ cp -r apache-pulsar-io-connectors-{{pulsar:version}}/connectors connectors
+
+$ ls connectors
+pulsar-io-aerospike-{{pulsar:version}}.nar
+pulsar-io-cassandra-{{pulsar:version}}.nar
+pulsar-io-kafka-{{pulsar:version}}.nar
+pulsar-io-kinesis-{{pulsar:version}}.nar
+pulsar-io-rabbitmq-{{pulsar:version}}.nar
+pulsar-io-twitter-{{pulsar:version}}.nar
+...
+```
+
+
+## Start Pulsar Service
+
+```bash
+bin/pulsar standalone
+```
+
+All the components of the Pulsar service start in order. You can curl the Pulsar service endpoints below to make sure the service is up and running correctly.
+
+1. Check the Pulsar binary protocol port.
+
+```bash
+telnet localhost 6650
+```
+
+2. Check the Pulsar Functions cluster.
+
+```bash
+curl -s http://localhost:8080/admin/v2/functions/cluster
+```
+
+Example output:
+```shell
+[{"workerId":"c-standalone-fw-localhost-6750","workerHostname":"localhost","port":6750}]
+```
+
+3. Make sure the `public` tenant and `default` namespace exist.
+
+```bash
+curl -s http://localhost:8080/admin/v2/namespaces/public
+```
+
+Example output:
+```shell
+["public/default","public/functions"]
+```
+
+4. Check that all builtin connectors are listed as available.
+
+```bash
+curl -s http://localhost:8080/admin/v2/functions/connectors
+```
+
+Example output:
+```json
+[{"name":"aerospike","description":"Aerospike database sink","sinkClass":"org.apache.pulsar.io.aerospike.AerospikeStringSink"},{"name":"cassandra","description":"Writes data into Cassandra","sinkClass":"org.apache.pulsar.io.cassandra.CassandraStringSink"},{"name":"kafka","description":"Kafka source and sink connector","sourceClass":"org.apache.pulsar.io.kafka.KafkaStringSource","sinkClass":"org.apache.pulsar.io.kafka.KafkaStringSink"},{"name":"kinesis","description":"Kinesis sink connector","sinkClass":"org.apache.pulsar.io.kinesis.KinesisSink"},{"name":"rabbitmq","description":"RabbitMQ source connector","sourceClass":"org.apache.pulsar.io.rabbitmq.RabbitMQSource"},{"name":"twitter","description":"Ingest data from Twitter firehose","sourceClass":"org.apache.pulsar.io.twitter.TwitterFireHose"}]
+```
+
+If an error occurs while starting the Pulsar service, you should see the exception in the terminal where you ran `bin/pulsar standalone`,
+or you can check the logs in the `logs` directory under the Pulsar directory.
+
+## Connect Pulsar to Apache Cassandra
+
+> Make sure you have Docker available on your machine. If you don't have Docker installed, you can follow the [instructions](https://docs.docker.com/docker-for-mac/install/).
+
+We use the `cassandra` Docker image to start a single-node Cassandra cluster in Docker.
+
+### Set up the Cassandra Cluster
+
+#### Start a Cassandra Cluster
+
+```bash
+docker run -d --rm --name=cassandra -p 9042:9042 cassandra
+```
+
+Before moving on to the next steps, make sure the Cassandra cluster is up and running.
+
+1. Make sure the Docker container is running.
 
 ```bash
-$ ./bin/pulsar-admin sink create --className  <classname> --jar <jar-location> --tenant test --namespace <namespace> --name <sink-name> --inputs <input-topics>
+docker ps
 ```
 
-Here’s an example command:
+2. Check the Cassandra logs to make sure the Cassandra process is running as expected.
 
 ```bash
-bin/pulsar-admin source create --className  org.apache.pulsar.io.twitter.TwitterFireHose --jar ~/application.jar --tenant test --namespace ns1 --name twitter-source --destinationTopicName twitter_data
+docker logs cassandra
 ```
 
-Instead of submitting a source to run on an existing Pulsar cluster, you alternatively can run a source as a process on your local machine:
+3. Check the cluster status
 
 ```bash
-bin/pulsar-admin source localrun --className  org.apache.pulsar.io.twitter.TwitterFireHose --jar ~/application.jar --tenant test --namespace ns1 --name twitter-source --destinationTopicName twitter_data
+docker exec cassandra nodetool status
 ```
 
-### Running Sinks
+Example output:
+```
+Datacenter: datacenter1
+=======================
+Status=Up/Down
+|/ State=Normal/Leaving/Joining/Moving
+--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
+UN  172.17.0.2  103.67 KiB  256          100.0%            af0e4b2f-84e0-4f0b-bb14-bd5f9070ff26  rack1
+```
 
-You can submit a sink to be run in an existing Pulsar cluster using a command of this form:
+#### Create keyspace and table
+
+We use `cqlsh` to connect to the Cassandra cluster and create a keyspace and table.
 
 ```bash
-./bin/pulsar-admin sink create --className  <classname> --jar <jar-location> --tenant test --namespace <namespace> --name <sink-name> --inputs <input-topics>
+$ docker exec -ti cassandra cqlsh localhost
+Connected to Test Cluster at localhost:9042.
+[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
+Use HELP for help.
+cqlsh>
 ```
 
-Here’s an example command:
+All the following commands are executed in `cqlsh`.
+
+##### Create keyspace `pulsar_test_keyspace`
 
 ```bash
-./bin/pulsar-admin sink create --className  org.apache.pulsar.io.cassandra --jar ~/application.jar --tenant test --namespace ns1 --name cassandra-sink --inputs test_topic
+cqlsh> CREATE KEYSPACE pulsar_test_keyspace WITH replication = {'class':'SimpleStrategy', 'replication_factor':1};
 ```
 
-Instead of submitting a sink to run on an existing Pulsar cluster, you alternatively can run a sink as a process on your local machine:
+##### Create table `pulsar_test_table`
 
 ```bash
-./bin/pulsar-admin sink localrun --className  org.apache.pulsar.io.cassandra --jar ~/application.jar --tenant test --namespace ns1 --name cassandra-sink --inputs test_topic
+cqlsh> USE pulsar_test_keyspace;
+cqlsh:pulsar_test_keyspace> CREATE TABLE pulsar_test_table (key text PRIMARY KEY, col text);
+```
+
+### Configure a Cassandra Sink
+
+Now that we have a Cassandra cluster running locally, in this section we will configure a Cassandra sink connector.
+The Cassandra sink connector reads messages from a Pulsar topic and writes them into a Cassandra table.
+
+In order to run a Cassandra sink connector, you need to prepare a YAML config file with the information the Pulsar IO
+runtime needs to know, for example, how Pulsar IO can find the Cassandra cluster, and which keyspace and table
+Pulsar IO will use for writing Pulsar messages to Cassandra.
+
+Create a file `examples/cassandra-sink.yml` with the following content:
+
+```yaml
+configs:
+    roots: "localhost:9042"
+    keyspace: "pulsar_test_keyspace"
+    columnFamily: "pulsar_test_table"
+    keyname: "key"
+    columnName: "col"
+```
+
+To learn more about the Cassandra connector, see [Cassandra Connector](io-cassandra.md).
+
+### Submit a Cassandra Sink
+
+Pulsar provides a [CLI](reference-cli-tools.md) for running and managing Pulsar IO connectors.
+
+Run the following command to submit a sink connector of type `cassandra` with the config file `examples/cassandra-sink.yml`:
+
+```shell
+bin/pulsar-admin sink create \
+    --tenant public \
+    --namespace default \
+    --name cassandra-test-sink \
+    --sink-type cassandra \
+    --sinkConfigFile examples/cassandra-sink.yml \
+    --inputs test_cassandra
 ```
 
-## Available connectors
+Once the command is executed, Pulsar creates a sink connector named `cassandra-test-sink`. The sink connector runs
+as a Pulsar Function and writes the messages produced on topic `test_cassandra` to the Cassandra table `pulsar_test_table`.
+
+### Inspect the Cassandra Sink
+
+Since an IO connector runs as a [Pulsar Function](functions-overview.md), you can use the [functions CLI](reference-pulsar-admin.md#functions)
+to inspect and manage IO connectors.
 
-At the moment, the following connectors are available for Pulsar:
+#### Retrieve Sink Info
 
-|Name|Java Class|
-|---|---|
-|[Aerospike sink](https://www.aerospike.com/)|[`org.apache.pulsar.io.aerospike.AerospikeSink`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/aerospike/src/main/java/org/apache/pulsar/io/aerospike/AerospikeStringSink.java)|
-|[Cassandra sink](https://cassandra.apache.org)|[`org.apache.pulsar.io.cassandra.CassandraSink`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/cassandra/src/main/java/org/apache/pulsar/io/cassandra/CassandraStringSink.java)|
-|[Kafka source](https://kafka.apache.org)|[`org.apache.pulsar.io.kafka.KafkaSource`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/kafka/src/main/java/org/apache/pulsar/io/kafka/KafkaStringSource.java)|
-|[Kafka sink](https://kafka.apache.org)|[`org.apache.pulsar.io.kafka.KafkaSink`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/kafka/src/main/java/org/apache/pulsar/io/kafka/KafkaStringSink.java)|
-|[RabbitMQ source](https://www.rabbitmq.com)|[`org.apache.pulsar.io.rabbitmq.RabbitMQSource`](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/rabbitmq/src/main/java/org/apache/pulsar/io/rabbitmq/RabbitMQSource.java)|
-|[Twitter Firehose source](https://developer.twitter.com/en/docs)|[org.apache.pulsar.io.twitter.TwitterFireHose](https://github.com/apache/incubator-pulsar/blob/master/pulsar-io/twitter/src/main/java/org/apache/pulsar/io/twitter/TwitterFireHose.java)|
+```bash
+bin/pulsar-admin functions get \
+    --tenant public \
+    --namespace default \
+    --name cassandra-test-sink
+```
 
+Example output:
+
+```shell
+{
+  "tenant": "public",
+  "namespace": "default",
+  "name": "cassandra-test-sink",
+  "className": "org.apache.pulsar.functions.api.utils.IdentityFunction",
+  "autoAck": true,
+  "parallelism": 1,
+  "source": {
+    "topicsToSerDeClassName": {
+      "test_cassandra": ""
+    }
+  },
+  "sink": {
+    "configs": "{\"roots\":\"cassandra\",\"keyspace\":\"pulsar_test_keyspace\",\"columnFamily\":\"pulsar_test_table\",\"keyname\":\"key\",\"columnName\":\"col\"}",
+    "builtin": "cassandra"
+  },
+  "resources": {}
+}
+```
 
+#### Check Sink Running Status
+
+```bash
+bin/pulsar-admin functions getstatus \
+    --tenant public \
+    --namespace default \
+    --name cassandra-test-sink
+```
+
+Example output:
+
+```shell
+{
+  "functionStatusList": [
+    {
+      "running": true,
+      "instanceId": "0",
+      "metrics": {
+        "metrics": {
+          "__total_processed__": {},
+          "__total_successfully_processed__": {},
+          "__total_system_exceptions__": {},
+          "__total_user_exceptions__": {},
+          "__total_serialization_exceptions__": {},
+          "__avg_latency_ms__": {}
+        }
+      },
+      "workerId": "c-standalone-fw-localhost-6750"
+    }
+  ]
+}
+```
+
+### Verify the Cassandra Sink
+
+Now let's produce some messages to `test_cassandra`, the input topic of the Cassandra sink.
+
+```bash
+for i in {0..9}; do bin/pulsar-client produce -m "key-$i" -n 1 test_cassandra; done
+```
+
+Inspect the sink running status again. You should see that 10 messages have been processed by the Cassandra sink.
+
+```bash
+bin/pulsar-admin functions getstatus \
+    --tenant public \
+    --namespace default \
+    --name cassandra-test-sink
+```
+
+Example output:
+
+```shell
+{
+  "functionStatusList": [
+    {
+      "running": true,
+      "numProcessed": "11",
+      "numSuccessfullyProcessed": "11",
+      "lastInvocationTime": "1532031040117",
+      "instanceId": "0",
+      "metrics": {
+        "metrics": {
+          "__total_processed__": {
+            "count": 5.0,
+            "sum": 5.0,
+            "max": 5.0
+          },
+          "__total_successfully_processed__": {
+            "count": 5.0,
+            "sum": 5.0,
+            "max": 5.0
+          },
+          "__total_system_exceptions__": {},
+          "__total_user_exceptions__": {},
+          "__total_serialization_exceptions__": {},
+          "__avg_latency_ms__": {}
+        }
+      },
+      "workerId": "c-standalone-fw-localhost-6750"
+    }
+  ]
+}
+```
+
+Finally, let's inspect the results in Cassandra using `cqlsh`:
+
+```bash
+docker exec -ti cassandra cqlsh localhost
+```
+
+Select the rows from the Cassandra table `pulsar_test_table`:
+
+```bash
+cqlsh> use pulsar_test_keyspace;
+cqlsh:pulsar_test_keyspace> select * from pulsar_test_table;
+
+ key    | col
+--------+--------
+  key-5 |  key-5
+  key-0 |  key-0
+  key-9 |  key-9
+  key-2 |  key-2
+  key-1 |  key-1
+  key-3 |  key-3
+  key-6 |  key-6
+  key-7 |  key-7
+  key-4 |  key-4
+  key-8 |  key-8
+```
+
+### Delete the Cassandra Sink
+
+```shell
+bin/pulsar-admin sink delete \
+    --tenant public \
+    --namespace default \
+    --name cassandra-test-sink
+```
diff --git a/site2/docs/io-rabbitmq.md b/site2/docs/io-rabbitmq.md
new file mode 100644
index 0000000000..d2cea153e7
--- /dev/null
+++ b/site2/docs/io-rabbitmq.md
@@ -0,0 +1,19 @@
+---
+id: io-rabbitmq
+title: RabbitMQ Connector
+sidebar_label: RabbitMQ Connector
+---
+
+## Source
+
+The RabbitMQ source connector is used for receiving messages from a RabbitMQ cluster and writing
+the messages to Pulsar topics.
+
+### Source Configuration Options
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| `connectionName` | `true` | `null` | A new broker connection name. |
+| `amqUri` | `true` | `null` | An AMQP URI: host, port, username, password and virtual host. |
+| `queueName` | `true` | `null` | RabbitMQ queue name. |
+
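+Based on the options above, a minimal source config file might look like the following sketch (the connection name, URI, and queue name are illustrative placeholders, not values from this connector's documentation):
+
+```yaml
+configs:
+    connectionName: "test-rabbitmq-connection"
+    amqUri: "amqp://guest:guest@localhost:5672/"
+    queueName: "test-queue"
+```
+
+Such a file would be passed to `pulsar-admin` when creating the source, in the same way the Cassandra tutorial passes a config file via `--sinkConfigFile` when creating a sink.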
diff --git a/site2/docs/io-twitter.md b/site2/docs/io-twitter.md
new file mode 100644
index 0000000000..5dabe14720
--- /dev/null
+++ b/site2/docs/io-twitter.md
@@ -0,0 +1,24 @@
+---
+id: io-twitter
+title: Twitter Firehose Connector
+sidebar_label: Twitter Firehose Connector
+---
+
+The Twitter Firehose connector is used for receiving tweets from Twitter Firehose and writing
+the tweets to Pulsar topics.
+
+## Source Configuration Options
+
+You can get the OAuth credentials from [Twitter Developers Portal](https://developer.twitter.com/en.html).
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| consumerKey | true | null | Twitter OAuth Consumer Key |
+| consumerSecret | true | null | Twitter OAuth Consumer Secret |
+| token | true | null | Twitter OAuth Token |
+| tokenSecret | true | null | Twitter OAuth Secret |
+| clientName | false | `openconnector-twitter-source` | Client name |
+| clientHosts | false | `https://stream.twitter.com` | Twitter Firehose hosts that client connects to |
+| clientBufferSize | false | `50000` | The buffer size for buffering tweets fetched from Twitter Firehose |
+
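+As a sketch, a source config file using these options might look like the following (all credential values are placeholders that you obtain from the Twitter Developers Portal):
+
+```yaml
+configs:
+    consumerKey: "<your-consumer-key>"
+    consumerSecret: "<your-consumer-secret>"
+    token: "<your-access-token>"
+    tokenSecret: "<your-access-token-secret>"
+    clientBufferSize: 50000
+```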
+
diff --git a/site2/website/scripts/replace.js b/site2/website/scripts/replace.js
index cccc2152a8..2c83b03123 100644
--- a/site2/website/scripts/replace.js
+++ b/site2/website/scripts/replace.js
@@ -25,6 +25,10 @@ function binaryReleaseUrl(version) {
   return `http://www.apache.org/dyn/closer.cgi/incubator/pulsar/pulsar-${version}/apache-pulsar-${version}-bin.tar.gz`
 }
 
+function connectorReleaseUrl(version) {
+  return `http://www.apache.org/dyn/closer.cgi/incubator/pulsar/pulsar-${version}/apache-pulsar-io-connectors-${version}-bin.tar.gz`
+}
+
 function rpmReleaseUrl(version, type) {
   rpmVersion = version.replace('incubating', '1_incubating');
   return `https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=incubator/pulsar/pulsar-${version}/RPMS/apache-pulsar-client${type}-${rpmVersion}.x86_64.rpm`
@@ -56,6 +60,7 @@ const from = [
   /{{pulsar:version_number}}/g,
   /{{pulsar:version}}/g, 
   /pulsar:binary_release_url/g,
+  /pulsar:connector_release_url/g,
   /pulsar:download_page_url/g,
   /{{pulsar:rpm:client}}/g,
   /{{pulsar:rpm:client-debuginfo}}/g,
@@ -75,6 +80,7 @@ const options = {
     `${latestVersion}`, 
     `${latestVersion}-incubating`, 
     binaryReleaseUrl(`${latestVersion}-incubating`), 
+    connectorReleaseUrl(`${latestVersion}-incubating`), 
     downloadPageUrl(),
     rpmReleaseUrl(`${latestVersion}-incubating`, ""),
     rpmReleaseUrl(`${latestVersion}-incubating`, "-debuginfo"),
@@ -103,6 +109,7 @@ for (v of versions) {
       `${v}`, 
       `${v}-incubating`, 
       binaryReleaseUrl(`${v}-incubating`),
+      connectorReleaseUrl(`${v}-incubating`),
       downloadPageUrl(),
       rpmReleaseUrl(`${v}-incubating`, ""),
       rpmReleaseUrl(`${v}-incubating`, "-debuginfo"),
diff --git a/site2/website/sidebars.json b/site2/website/sidebars.json
index 48a50d43da..a5303fac22 100644
--- a/site2/website/sidebars.json
+++ b/site2/website/sidebars.json
@@ -28,7 +28,10 @@
     ],
     "Pulsar IO": [
       "io-overview",
-      "io-quickstart"
+      "io-quickstart",
+      "io-managing",
+      "io-connectors",
+      "io-develop"
     ],
     "Deployment": [
       "deploy-aws",


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
