From: git-site-role@apache.org
To: "commits@beam.apache.org"
Reply-To: dev@beam.apache.org
Date: Tue, 27 Jul 2021 18:02:09 +0000
Subject: [beam] branch asf-site updated: Publishing website 2021/07/27 18:01:39 at commit 69822e4
Message-ID: <162740892531.26326.18075822303445579045@gitbox.apache.org>
X-Git-Host: gitbox.apache.org
X-Git-Repo: beam
X-Git-Refname: refs/heads/asf-site
X-Git-Reftype: branch
X-Git-Oldrev: 8207d470ae589df5c3b89d2d2c035cf18a949e19
X-Git-Newrev: 7fcfe6a62e375a661d76c0b1fe0159f0c1a15471
X-Git-Rev: 7fcfe6a62e375a661d76c0b1fe0159f0c1a15471
X-Git-NotificationType: ref_changed_plus_diff
X-Git-Multimail-Version: 1.5.dev
Auto-Submitted: auto-generated This is an automated email from the ASF dual-hosted git repository. git-site-role pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/beam.git The following commit(s) were added to refs/heads/asf-site by this push: new 7fcfe6a Publishing website 2021/07/27 18:01:39 at commit 69822e4 7fcfe6a is described below commit 7fcfe6a62e375a661d76c0b1fe0159f0c1a15471 Author: jenkins AuthorDate: Tue Jul 27 18:01:40 2021 +0000 Publishing website 2021/07/27 18:01:39 at commit 69822e4 --- website/generated-content/documentation/index.xml | 34 ++++++++++++++++++++++ .../documentation/programming-guide/index.html | 12 ++++---- website/generated-content/sitemap.xml | 2 +- 3 files changed, 42 insertions(+), 6 deletions(-) diff --git a/website/generated-content/documentation/index.xml b/website/generated-content/documentation/index.xml index 28f2ae0..8690ebb 100644 --- a/website/generated-content/documentation/index.xml +++ b/website/generated-content/documentation/index.xml @@ -6624,6 +6624,14 @@ transform at any point while constructing your pipeline to create a new <div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">lines</span> <span class="o">=</span> <span class="n">pipeline</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">ReadFromText</span><span class="p">(</span><span class="s1">&#39;gs://some/inputData.txt&#39;</spa [...] 
</div> </div> +<div class='language-go snippet'> +<div class="notebook-skip code-snippet"> +<a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard"> +<img src="/images/copy-icon.svg"/> +</a> +<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="nx">lines</span> <span class="o">:=</span> <span class="nx">textio</span><span class="p">.</span><span class="nf">Read</span><span class="p">(</span><span class="nx">scope</span><span class="p">,</span> <span class="err">&#39;</span><span class="nx">gs</span><span class="p">:</span><span class="o">//</span>&l [...] +</div> +</div> <h3 id="pipeline-io-writing-data">5.2. Writing output data</h3> <p>Write transforms write the data in a <code>PCollection</code> to an external data source. You will most often use write transforms at the end of your pipeline to output @@ -6645,6 +6653,14 @@ a <code>PCollection</code>'s data at any point in your pipeline.</p> <div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">output</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">WriteToText</span><span class="p">(</span><span class="s1">&#39;gs://some/outputData&#39;</span><span class="p">)</span></code></pre></div> </div> </div> +<div class='language-go snippet'> +<div class="notebook-skip code-snippet"> +<a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard"> +<img src="/images/copy-icon.svg"/> +</a> +<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="nx">textio</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="nx">scope</span><span class="p">,</span> <span class="err">&#39;</span><span class="nx">gs</span><span class="p">:</span><span class="o">//</span><span 
class="nx">some</span><span class="o">/</span><s [...] +</div> +</div> <h3 id="file-based-data">5.3. File-based input and output data</h3> <h4 id="file-based-reading-multiple-locations">5.3.1. Reading from multiple locations</h4> <p>Many read transforms support reading from multiple input files matching a glob @@ -6670,6 +6686,14 @@ suffix &ldquo;.csv&rdquo; in the given location:</p> <span class="s1">&#39;path/to/input-*.csv&#39;</span><span class="p">)</span></code></pre></div> </div> </div> +<div class='language-go snippet'> +<div class="notebook-skip code-snippet"> +<a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard"> +<img src="/images/copy-icon.svg"/> +</a> +<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="nx">lines</span> <span class="o">:=</span> <span class="nx">textio</span><span class="p">.</span><span class="nf">Read</span><span class="p">(</span><span class="nx">scope</span><span class="p">,</span> <span class="s">&#34;path/to/input-*.csv&#34;</span><span class="p">)</span></code></pre></div> +</div> +</div> <p>To read data from disparate sources into a single <code>PCollection</code>, read each one independently and then use the <a href="#flatten">Flatten</a> transform to create a single <code>PCollection</code>.</p> @@ -6700,6 +6724,16 @@ location. 
Each file has the prefix &ldquo;numbers&rdquo;, a numeric tag, <span class="s1">&#39;/path/to/numbers&#39;</span><span class="p">,</span> <span class="n">file_name_suffix</span><span class="o">=</span><span class="s1">&#39;.csv&#39;</span><span class="p">)</span></code></pre></div> </div> </div> +<div class='language-go snippet'> +<div class="notebook-skip code-snippet"> +<a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard"> +<img src="/images/copy-icon.svg"/> +</a> +<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="c1">// The Go SDK textio doesn&#39;t support sharding on writes yet. +</span><span class="c1">// See https://issues.apache.org/jira/browse/BEAM-12664 for ways +</span><span class="c1"></span><span class="o">//</span> <span class="nx">to</span> <span class="nx">contribute</span> <span class="nx">a</span> <span class="nx">solution</span><span class="p">.</span></code></pre></div> +</div> +</div> <h3 id="provided-io-transforms">5.4. Beam-provided I/O transforms</h3> <p>See the <a href="/documentation/io/built-in/">Beam-provided I/O Transforms</a> page for a list of the currently available I/O transforms.</p> diff --git a/website/generated-content/documentation/programming-guide/index.html b/website/generated-content/documentation/programming-guide/index.html index 589910c..1e327ad 100644 --- a/website/generated-content/documentation/programming-guide/index.html +++ b/website/generated-content/documentation/programming-guide/index.html @@ -1844,16 +1844,16 @@ built-in transforms, you can i transforms.

5.1. Reading input data

Read transforms read data from an external source and return a PCollection representation of the data for use by your pipeline. You can use a read transform at any point while constructing your pipeline to create a new PCollection, though it will be most common at the start of your pipeline.

PCollection<String> [...]

[...] PCollection's data at any point in your pipeline.

output.apply(TextIO [...]

[...] *) to read all matching input files that have prefix “input-” and the
 suffix “.csv” in the given location:

p.apply("ReadFromText" [...]
     TextIO.read().from("protocol://my_bucket/path/to/input-*.csv"));
'path/to/input-*.csv')

lines := textio.Read(scope, "path/to/input-*.csv")

To read data from disparate sources into a single PCollection, read each one independently and then use the Flatten transform to create a single PCollection.
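As a rough illustration of what the glob pattern in the read examples selects, the sketch below filters a list of candidate names with Go's standard `path.Match`. This is only an approximation: the helper `matchInputs` and the sample file names are hypothetical, and Beam's own filesystem matching may follow different rules than `path.Match`.

```go
package main

import (
	"fmt"
	"path"
)

// matchInputs returns the candidate names that match the glob pattern.
// Hypothetical helper for illustration; it is not part of the Beam SDK.
func matchInputs(pattern string, candidates []string) ([]string, error) {
	var matched []string
	for _, name := range candidates {
		// path.Match treats '*' as any run of characters other than '/'.
		ok, err := path.Match(pattern, name)
		if err != nil {
			return nil, err // malformed pattern
		}
		if ok {
			matched = append(matched, name)
		}
	}
	return matched, nil
}

func main() {
	// Sample names are made up for this sketch.
	files := []string{
		"path/to/input-1.csv",
		"path/to/input-2.csv",
		"path/to/input-2.txt",
		"path/to/output-1.csv",
	}
	matched, err := matchInputs("path/to/input-*.csv", files)
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	for _, name := range matched {
		fmt.Println(name)
	}
}
```

Only names with the prefix “input-” and the suffix “.csv” in the given directory pass the filter; a name in a deeper subdirectory would not, because `*` does not cross a `/` in `path.Match`.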

5.3.2. Writing to multiple output files

For file-based output data, write transforms write to multiple output files by default. When you pass an output file name to a write transform, the file name
@@ -1863,7 +1863,9 @@
location. Each file has the prefix “numbers”, a numeric tag, and the suffix “.csv”.

records.apply("WriteToText",
     TextIO.write().to("protocol://my_bucket/path/to/numbers")
                 .withSuffix(".csv"));
filtered_wor [...]
    '/path/to/numbers', file_name_suffix='.csv')

// The Go SDK textio doesn't support sharding on writes yet.
// See https://issues.apache.org/jira/browse/BEAM-12664 for ways
// to contribute a solution.
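To make the naming scheme described above concrete (a prefix, a numeric tag, and a suffix), here is a minimal Go sketch that only builds the file names. It does not write any data and is not the Go SDK's behavior. The helper `shardNames` is hypothetical, and the `-SSSSS-of-NNNNN` layout of the numeric tag is an assumption made for illustration; the text above says only that each file carries "a numeric tag".

```go
package main

import "fmt"

// shardNames builds one output file name per shard from a prefix,
// a numeric tag, and a suffix. Hypothetical helper, not a Beam API.
// The "-%05d-of-%05d" tag layout is an assumption for illustration.
func shardNames(prefix, suffix string, numShards int) []string {
	if numShards <= 0 {
		return nil
	}
	names := make([]string, 0, numShards)
	for i := 0; i < numShards; i++ {
		names = append(names, fmt.Sprintf("%s-%05d-of-%05d%s", prefix, i, numShards, suffix))
	}
	return names
}

func main() {
	// Mirrors the prefix and suffix used in the write examples above.
	for _, name := range shardNames("/path/to/numbers", ".csv", 3) {
		fmt.Println(name)
	}
}
```

Under these assumptions, three shards yield names such as `/path/to/numbers-00000-of-00003.csv`, which shows how the prefix and suffix surround the per-shard tag.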

5.4. Beam-provided I/O transforms

See the Beam-provided I/O Transforms page for a list of the currently available I/O transforms.

6. Schemas

Often, the types of the records being processed have an obvious structure. Common Beam sources produce JSON, Avro, Protocol Buffer, or database row objects; all of these types have well-defined structures that can often be determined by examining the type. Even within an SDK pipeline, simple Java POJOs
@@ -3919,7 +3921,7 @@ kafka_records = (
ImplicitSchemaPayloadBuilder({'data': u'0'}),
<Address of expansion service>))
assert_that(res, equal_to(['0a', '0b']))

  • After the job has been submitted to the Beam runner, shut down the expansion service by terminating the expansion service process.

  • 13.3. Runner Support

    Currently, portable runners such as Flink, Spark, and the Direct runner can be used with multi-language pipelines.

    Google Cloud Dataflow supports multi-language pipelines through the Dataflow Runner v2 backend architecture.

    The Apache Software Foundation | Privacy Policy | RSS Feed

    Apache Beam, Apache, Beam, the Beam logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation. All other products or name brands are trademarks of their respective holders, including The Apache Software Foundation.
    \ No newline at end of file diff --git a/website/generated-content/sitemap.xml b/website/generated-content/sitemap.xml index a52845a..f5d773e 100644 --- a/website/generated-content/sitemap.xml +++ b/website/generated-content/sitemap.xml @@ -1 +1 @@ -/blog/beam-2.31.0/2021-06-22T18:45:24-07:00/categories/blog/2021-07-01T15:48:01-07:00/blog/2021-07-01T15:48:01-07:00/categories/2021-07-01T15:48:01-07:00/blog/b [...] \ No newline at end of file +/blog/beam-2.31.0/2021-06-22T18:45:24-07:00/categories/blog/2021-07-01T15:48:01-07:00/blog/2021-07-01T15:48:01-07:00/categories/2021-07-01T15:48:01-07:00/blog/b [...] \ No newline at end of file