pulsar-commits mailing list archives

From si...@apache.org
Subject [pulsar] branch master updated: Add several instructions for IO Connectors. (#3739)
Date Thu, 07 Mar 2019 13:47:50 GMT
This is an automated email from the ASF dual-hosted git repository.

sijie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git


The following commit(s) were added to refs/heads/master by this push:
     new 3d9b47e  Add several instructions for IO Connectors. (#3739)
3d9b47e is described below

commit 3d9b47e3b7e1dc741183c6588ef7b886950b9523
Author: Fangbin Sun <sunfangbin@gmail.com>
AuthorDate: Thu Mar 7 21:47:45 2019 +0800

    Add several instructions for IO Connectors. (#3739)
---
 site2/docs/io-connectors.md    |  4 ++++
 site2/docs/io-elasticsearch.md | 21 +++++++++++++++++++++
 site2/docs/io-file.md          | 27 +++++++++++++++++++++++++++
 site2/docs/io-hdfs.md          | 26 ++++++++++++++++++++++++++
 site2/docs/io-mongo.md         | 20 ++++++++++++++++++++
 5 files changed, 98 insertions(+)

diff --git a/site2/docs/io-connectors.md b/site2/docs/io-connectors.md
index 909e554..2f166ac 100644
--- a/site2/docs/io-connectors.md
+++ b/site2/docs/io-connectors.md
@@ -19,3 +19,7 @@ Pulsar Functions cluster.
 - [CDC Source Connector based on Debezium](io-cdc.md)
 - [Netty Source Connector](io-netty.md#source)
 - [Hbase Sink Connector](io-hbase.md#sink)
+- [ElasticSearch Sink Connector](io-elasticsearch.md#sink)
+- [File Source Connector](io-file.md#source)
+- [Hdfs Sink Connector](io-hdfs.md#sink)
+- [MongoDB Sink Connector](io-mongo.md#sink)
diff --git a/site2/docs/io-elasticsearch.md b/site2/docs/io-elasticsearch.md
new file mode 100644
index 0000000..18aacdf
--- /dev/null
+++ b/site2/docs/io-elasticsearch.md
@@ -0,0 +1,21 @@
+---
+id: io-elasticsearch
+title: ElasticSearch Connector
+sidebar_label: ElasticSearch Connector
+---
+
+## Sink
+
+The ElasticSearch Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to a index.
+
+## Sink Configuration Options
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `elasticSearchUrl` | `null` | `true` | The url of elastic search cluster that the connector connects to. |
+| `indexName` | `null` | `true` | The index name that the connector writes messages to. |
+| `indexNumberOfShards` | `1` | `false` | The number of shards of the index. |
+| `indexNumberOfReplicas` | `1` | `false` | The number of replicas of the index. |
+| `username` | `null` | `false` | The username used by the connector to connect to the elastic search cluster. If username is set, a password should also be provided. |
+| `password` | `null` | `false` | The password used by the connector to connect to the elastic search cluster. If password is set, a username should also be provided. |
\ No newline at end of file
diff --git a/site2/docs/io-file.md b/site2/docs/io-file.md
new file mode 100644
index 0000000..7d65cc1
--- /dev/null
+++ b/site2/docs/io-file.md
@@ -0,0 +1,27 @@
+---
+id: io-file
+title: File Connector
+sidebar_label: File Connector
+---
+
+## Source
+
+The File Source Connector is used to pull messages from files in a directory and persist the messages
+to a Pulsar topic.
+
+### Source Configuration Options
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| inputDirectory | `true` | `null` | The input directory from which to pull files. |
+| recurse | `false` | `true` | Indicates whether or not to pull files from sub-directories. |
+| keepFile | `false` | `false` | If true, the file is not deleted after it has been processed and causes the file to be picked up continually. |
+| fileFilter | `false` | `[^\\.].*` | Only files whose names match the given regular expression will be picked up. |
+| pathFilter | `false` | `null` | When 'recurse' property is true, then only sub-directories whose path matches the given regular expression will be scanned. |
+| minimumFileAge | `false` | `0` | The minimum age that a file must be in order to be processed; any file younger than this amount of time (according to last modification date) will be ignored. |
+| maximumFileAge | `false` | `Long.MAX_VALUE` | The maximum age that a file must be in order to be processed; any file older than this amount of time (according to last modification date) will be ignored. |
+| minimumSize | `false` | `1` | The minimum size (in bytes) that a file must be in order to be processed. |
+| maximumSize | `false` | `Double.MAX_VALUE` | The maximum size (in bytes) that a file can be in order to be processed. |
+| ignoreHiddenFiles | `false` | `true` | Indicates whether or not hidden files should be ignored or not. |
+| pollingInterval | `false` | `10000` | Indicates how long to wait before performing a directory listing. |
+| numWorkers | `false` | `1` | The number of worker threads that will be processing the files. This allows you to process a larger number of files concurrently. However, setting this to a value greater than 1 will result in the data from multiple files being "intermingled" in the target topic. |
\ No newline at end of file
diff --git a/site2/docs/io-hdfs.md b/site2/docs/io-hdfs.md
new file mode 100644
index 0000000..9c38923
--- /dev/null
+++ b/site2/docs/io-hdfs.md
@@ -0,0 +1,26 @@
+---
+id: io-hdfs
+title: Hdfs Connector
+sidebar_label: Hdfs Connector
+---
+
+## Sink
+
+The Hdfs Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to a hdfs file.
+
+## Sink Configuration Options
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `hdfsConfigResources` | `null` | `true` | A file or comma separated list of files which contains the Hadoop file system configuration, e.g. 'core-site.xml', 'hdfs-site.xml'. |
+| `directory` | `null` | `true` | The HDFS directory from which files should be read from or written to. |
+| `encoding` | `null` | `false` | The character encoding for the files, e.g. UTF-8, ASCII, etc. |
+| `compression` | `null` | `false` | The compression codec used to compress/de-compress the files on HDFS. |
+| `kerberosUserPrincipal` | `null` | `false` | The Kerberos user principal account to use for authentication. |
+| `keytab` | `null` | `false` | The full pathname to the Kerberos keytab file to use for authentication. |
+| `filenamePrefix` | `null` | `false` | The prefix of the files to create inside the HDFS directory, i.e. a value of "topicA" will result in files named topicA-, topicA-, etc being produced. |
+| `fileExtension` | `null` | `false` | The extension to add to the files written to HDFS, e.g. '.txt', '.seq', etc. |
+| `separator` | `null` | `false` | The character to use to separate records in a text file. If no value is provided then the content from all of the records will be concatenated together in one continuous byte array. |
+| `syncInterval` | `null` | `false` | The interval (in milliseconds) between calls to flush data to HDFS disk. |
+| `maxPendingRecords` | `Integer.MAX_VALUE` | `false` | The maximum number of records that we hold in memory before acking. Default is `Integer.MAX_VALUE`. Setting this value to one, results in every record being sent to disk before the record is acked, while setting it to a higher values allows us to buffer records before flushing them all to disk. |
\ No newline at end of file
diff --git a/site2/docs/io-mongo.md b/site2/docs/io-mongo.md
new file mode 100644
index 0000000..cc8ea98
--- /dev/null
+++ b/site2/docs/io-mongo.md
@@ -0,0 +1,20 @@
+---
+id: io-mongo
+title: MongoDB Connector
+sidebar_label: MongoDB Connector
+---
+
+## Sink
+
+The MongoDB Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to a collection.
+
+## Sink Configuration Options
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `mongoUri` | `null` | `true` | The uri of mongodb that the connector connects to (see: https://docs.mongodb.com/manual/reference/connection-string/). |
+| `database` | `null` | `true` | The name of the database to which the collection belongs to. |
+| `collection` | `null` | `true` | The collection name that the connector writes messages to. |
+| `batchSize` | `100` | `false` | The batch size of write to the collection. |
+| `batchTimeMs` | `1000` | `false` | The batch operation interval in milliseconds. |
\ No newline at end of file

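Editorial illustration, not part of commit 3d9b47e: the ElasticSearch sink table in the diff lists two required options, two numeric defaults, and a rule that `username` and `password` should be set together. A minimal Python sketch of those rules follows. The helper `build_es_sink_config` is hypothetical and is not a Pulsar API, and the URL and index name in the demo are placeholder values.

```python
# Hypothetical helper, not part of Pulsar. Option names and defaults are
# copied from the io-elasticsearch.md table in the diff above.
ES_DEFAULTS = {
    "indexNumberOfShards": 1,
    "indexNumberOfReplicas": 1,
    "username": None,
    "password": None,
}
ES_REQUIRED = ("elasticSearchUrl", "indexName")


def build_es_sink_config(user_config):
    """Return the config with table defaults applied, or raise ValueError."""
    missing = [key for key in ES_REQUIRED if not user_config.get(key)]
    if missing:
        raise ValueError("missing required option(s): " + ", ".join(missing))

    # User-supplied values override the documented defaults.
    config = {**ES_DEFAULTS, **user_config}

    # The table says username and password should be provided together.
    if (config["username"] is None) != (config["password"] is None):
        raise ValueError("username and password must be set together")
    return config


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_es_sink_config({
        "elasticSearchUrl": "http://localhost:9200",
        "indexName": "my_index",
    }))
```

The sketch only checks the shape of the options. How the connector itself reacts to a missing option or an unpaired credential is not stated in the diff.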

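Editorial illustration, not part of commit 3d9b47e: the File source table gives `fileFilter` a default of `[^\\.].*` and says only files whose names match are picked up. The Python sketch below shows what that pattern accepts. It assumes the whole file name must match, which the diff does not state, and `is_picked_up` is a hypothetical helper rather than connector code.

```python
import re

# Pattern text exactly as it appears in the io-file.md table.
DEFAULT_FILE_FILTER = r"[^\\.].*"


def is_picked_up(file_name, file_filter=DEFAULT_FILE_FILTER):
    """Return True if the whole file name matches the filter.

    Whole-name matching is an assumption made for this sketch.
    """
    return re.fullmatch(file_filter, file_name) is not None


if __name__ == "__main__":
    for name in ("data.csv", ".hidden", "notes.txt"):
        print(name, is_picked_up(name))
```

Under that assumption, the default pattern rejects names that begin with a dot and accepts other non-empty names, and a custom pattern such as `.*\.log` would narrow the selection to one extension.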