Date: Thu, 07 Mar 2019 13:47:50 +0000
To: "commits@pulsar.apache.org"
Subject: [pulsar] branch master updated: Add several instructions for IO Connectors. (#3739)
Message-ID: <155196647084.6789.7165489617543328312@gitbox.apache.org>
From: sijie@apache.org
X-Git-Repo: pulsar
X-Git-Refname: refs/heads/master
X-Git-Oldrev: 7a9f7f338ef7ca2d77a6fe5951afd1eacbd04207
X-Git-Newrev: 3d9b47e3b7e1dc741183c6588ef7b886950b9523

This is an automated email from the ASF dual-hosted git repository.

sijie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git

The following commit(s) were added to refs/heads/master by this push:
     new 3d9b47e  Add several instructions for IO Connectors. (#3739)
3d9b47e is described below

commit 3d9b47e3b7e1dc741183c6588ef7b886950b9523
Author: Fangbin Sun
AuthorDate: Thu Mar 7 21:47:45 2019 +0800

    Add several instructions for IO Connectors. (#3739)
---
 site2/docs/io-connectors.md    |  4 ++++
 site2/docs/io-elasticsearch.md | 21 +++++++++++++++++++++
 site2/docs/io-file.md          | 27 +++++++++++++++++++++++++++
 site2/docs/io-hdfs.md          | 26 ++++++++++++++++++++++++++
 site2/docs/io-mongo.md         | 20 ++++++++++++++++++++
 5 files changed, 98 insertions(+)

diff --git a/site2/docs/io-connectors.md b/site2/docs/io-connectors.md
index 909e554..2f166ac 100644
--- a/site2/docs/io-connectors.md
+++ b/site2/docs/io-connectors.md
@@ -19,3 +19,7 @@ Pulsar Functions cluster.
 - [CDC Source Connector based on Debezium](io-cdc.md)
 - [Netty Source Connector](io-netty.md#source)
 - [Hbase Sink Connector](io-hbase.md#sink)
+- [ElasticSearch Sink Connector](io-elasticsearch.md#sink)
+- [File Source Connector](io-file.md#source)
+- [Hdfs Sink Connector](io-hdfs.md#sink)
+- [MongoDB Sink Connector](io-mongo.md#sink)
diff --git a/site2/docs/io-elasticsearch.md b/site2/docs/io-elasticsearch.md
new file mode 100644
index 0000000..18aacdf
--- /dev/null
+++ b/site2/docs/io-elasticsearch.md
@@ -0,0 +1,21 @@
+---
+id: io-elasticsearch
+title: ElasticSearch Connector
+sidebar_label: ElasticSearch Connector
+---
+
+## Sink
+
+The ElasticSearch Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to a index.
+
+## Sink Configuration Options
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `elasticSearchUrl` | `null` | `true` | The url of elastic search cluster that the connector connects to. |
+| `indexName` | `null` | `true` | The index name that the connector writes messages to. |
+| `indexNumberOfShards` | `1` | `false` | The number of shards of the index. |
+| `indexNumberOfReplicas` | `1` | `false` | The number of replicas of the index. |
+| `username` | `null` | `false` | The username used by the connector to connect to the elastic search cluster. If username is set, a password should also be provided. |
+| `password` | `null` | `false` | The password used by the connector to connect to the elastic search cluster. If password is set, a username should also be provided. |
\ No newline at end of file
diff --git a/site2/docs/io-file.md b/site2/docs/io-file.md
new file mode 100644
index 0000000..7d65cc1
--- /dev/null
+++ b/site2/docs/io-file.md
@@ -0,0 +1,27 @@
+---
+id: io-file
+title: File Connector
+sidebar_label: File Connector
+---
+
+## Source
+
+The File Source Connector is used to pull messages from files in a directory and persist the messages
+to a Pulsar topic.
+
+### Source Configuration Options
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| inputDirectory | `true` | `null` | The input directory from which to pull files. |
+| recurse | `false` | `true` | Indicates whether or not to pull files from sub-directories. |
+| keepFile | `false` | `false` | If true, the file is not deleted after it has been processed and causes the file to be picked up continually. |
+| fileFilter | `false` | `[^\\.].*` | Only files whose names match the given regular expression will be picked up. |
+| pathFilter | `false` | `null` | When 'recurse' property is true, then only sub-directories whose path matches the given regular expression will be scanned. |
+| minimumFileAge | `false` | `0` | The minimum age that a file must be in order to be processed; any file younger than this amount of time (according to last modification date) will be ignored. |
+| maximumFileAge | `false` | `Long.MAX_VALUE` | The maximum age that a file must be in order to be processed; any file older than this amount of time (according to last modification date) will be ignored. |
+| minimumSize | `false` | `1` | The minimum size (in bytes) that a file must be in order to be processed. |
+| maximumSize | `false` | `Double.MAX_VALUE` | The maximum size (in bytes) that a file can be in order to be processed. |
+| ignoreHiddenFiles | `false` | `true` | Indicates whether or not hidden files should be ignored or not. |
+| pollingInterval | `false` | `10000` | Indicates how long to wait before performing a directory listing. |
+| numWorkers | `false` | `1` | The number of worker threads that will be processing the files. This allows you to process a larger number of files concurrently. However, setting this to a value greater than 1 will result in the data from multiple files being "intermingled" in the target topic. |
\ No newline at end of file
diff --git a/site2/docs/io-hdfs.md b/site2/docs/io-hdfs.md
new file mode 100644
index 0000000..9c38923
--- /dev/null
+++ b/site2/docs/io-hdfs.md
@@ -0,0 +1,26 @@
+---
+id: io-hdfs
+title: Hdfs Connector
+sidebar_label: Hdfs Connector
+---
+
+## Sink
+
+The Hdfs Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to a hdfs file.
+
+## Sink Configuration Options
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `hdfsConfigResources` | `null` | `true` | A file or comma separated list of files which contains the Hadoop file system configuration, e.g. 'core-site.xml', 'hdfs-site.xml'. |
+| `directory` | `null` | `true` | The HDFS directory from which files should be read from or written to. |
+| `encoding` | `null` | `false` | The character encoding for the files, e.g. UTF-8, ASCII, etc. |
+| `compression` | `null` | `false` | The compression codec used to compress/de-compress the files on HDFS. |
+| `kerberosUserPrincipal` | `null` | `false` | The Kerberos user principal account to use for authentication. |
+| `keytab` | `null` | `false` | The full pathname to the Kerberos keytab file to use for authentication. |
+| `filenamePrefix` | `null` | `false` | The prefix of the files to create inside the HDFS directory, i.e. a value of "topicA" will result in files named topicA-, topicA-, etc being produced. |
+| `fileExtension` | `null` | `false` | The extension to add to the files written to HDFS, e.g. '.txt', '.seq', etc. |
+| `separator` | `null` | `false` | The character to use to separate records in a text file. If no value is provided then the content from all of the records will be concatenated together in one continuous byte array. |
+| `syncInterval` | `null` | `false` | The interval (in milliseconds) between calls to flush data to HDFS disk. |
+| `maxPendingRecords` | `Integer.MAX_VALUE` | `false` | The maximum number of records that we hold in memory before acking. Default is `Integer.MAX_VALUE`. Setting this value to one, results in every record being sent to disk before the record is acked, while setting it to a higher values allows us to buffer records before flushing them all to disk. |
\ No newline at end of file
diff --git a/site2/docs/io-mongo.md b/site2/docs/io-mongo.md
new file mode 100644
index 0000000..cc8ea98
--- /dev/null
+++ b/site2/docs/io-mongo.md
@@ -0,0 +1,20 @@
+---
+id: io-mongo
+title: MongoDB Connector
+sidebar_label: MongoDB Connector
+---
+
+## Sink
+
+The MongoDB Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to a collection.
+
+## Sink Configuration Options
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `mongoUri` | `null` | `true` | The uri of mongodb that the connector connects to (see: https://docs.mongodb.com/manual/reference/connection-string/). |
+| `database` | `null` | `true` | The name of the database to which the collection belongs to. |
+| `collection` | `null` | `true` | The collection name that the connector writes messages to. |
+| `batchSize` | `100` | `false` | The batch size of write to the collection. |
+| `batchTimeMs` | `1000` | `false` | The batch operation interval in milliseconds. |
\ No newline at end of file
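
The Required and Default columns in the ElasticSearch and MongoDB sink tables can be read as a small validation rule: every required option must be supplied, and every optional one falls back to its documented default. The Python sketch below illustrates that reading only. `SINK_OPTIONS` and `resolve_sink_config` are hypothetical names and not part of Pulsar; the option names and defaults are copied from the tables, and treating the username/password pairing as a hard error is an assumption (the tables only say the other "should also be provided").

```python
# Hypothetical helper, not part of Pulsar. Option names and defaults are
# copied from the ElasticSearch and MongoDB sink tables in this commit.
SINK_OPTIONS = {
    "elasticsearch": {
        "required": ["elasticSearchUrl", "indexName"],
        "defaults": {"indexNumberOfShards": 1, "indexNumberOfReplicas": 1},
    },
    "mongo": {
        "required": ["mongoUri", "database", "collection"],
        "defaults": {"batchSize": 100, "batchTimeMs": 1000},
    },
}


def resolve_sink_config(connector, user_config):
    """Return user_config merged over the documented defaults.

    Raises ValueError if a required option is missing, or if only one of
    username/password is set for the ElasticSearch sink (an assumption
    about how strictly the pairing is enforced).
    """
    spec = SINK_OPTIONS[connector]

    # Required options have a documented default of null, so they must be given.
    missing = [name for name in spec["required"] if user_config.get(name) is None]
    if missing:
        raise ValueError("missing required options: " + ", ".join(missing))

    if connector == "elasticsearch":
        has_user = user_config.get("username") is not None
        has_pass = user_config.get("password") is not None
        if has_user != has_pass:
            raise ValueError("username and password must be set together")

    # Start from the defaults, then let explicit user values override them.
    resolved = dict(spec["defaults"])
    resolved.update(user_config)
    return resolved


if __name__ == "__main__":
    # The URI, database, and collection values here are placeholders.
    config = resolve_sink_config(
        "mongo",
        {"mongoUri": "mongodb://localhost:27017", "database": "pulsar", "collection": "messages"},
    )
    print(config)
```

Keeping the defaults in one table-shaped dictionary mirrors the documentation layout, so a new connector can be added by transcribing its table.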