beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner
Date Wed, 10 May 2017 05:38:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004106#comment-16004106 ]

ASF GitHub Bot commented on BEAM-1438:
--------------------------------------

GitHub user reuvenlax reopened a pull request:

    https://github.com/apache/beam/pull/1952

    BEAM-1438 Auto shard streaming sinks

    If a Write requests runner-determined sharding, the default is per-bundle sharding,
    which performs poorly in Dataflow's streaming runner. Instead, the runner statically
    picks a sharding based on the number of workers.
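
    As a rough illustration of the idea above (a sketch only, not code from this pull
    request): a shard count of 0 is treated as "runner-determined", and the count is then
    fixed up front from the number of workers instead of following the number of bundles.
    The names pick_num_shards, assign_to_shards and SHARDS_PER_WORKER, and the factor of 2,
    are hypothetical and chosen only for the example.

```python
import random
from collections import defaultdict

# Hypothetical tuning factor; the actual value used by the runner is not stated here.
SHARDS_PER_WORKER = 2


def pick_num_shards(requested_shards, num_workers, shards_per_worker=SHARDS_PER_WORKER):
    """Return a fixed shard count; requested_shards == 0 means runner-determined."""
    if requested_shards < 0:
        raise ValueError("requested_shards must be >= 0")
    if requested_shards > 0:
        # The user asked for an explicit number of shards, so honor it.
        return requested_shards
    # Runner-determined: derive a static count from the worker count,
    # never returning fewer than one shard.
    return max(1, num_workers * shards_per_worker)


def assign_to_shards(elements, num_shards, seed=0):
    """Spread elements over a fixed number of shards, independent of bundling."""
    rng = random.Random(seed)
    shards = defaultdict(list)
    for element in elements:
        shards[rng.randrange(num_shards)].append(element)
    return dict(shards)
```

    With a fixed count, the number of output shards stays bounded by num_shards no matter
    how many small bundles the streaming runner produces.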


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/reuvenlax/incubator-beam streaming_auto_shard_write

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/1952.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1952
    
----
commit f4dfbb206382d3ea73881727aa8b0f74eaf98ef4
Author: Kenneth Knowles <klk@google.com>
Date:   2017-05-03T02:31:22Z

    Annotate internal methods of PCollection

commit c1b26a1b53c334ab171fad60501ba67593fde5d2
Author: Kenneth Knowles <klk@google.com>
Date:   2017-05-03T02:48:38Z

    Annotate internal pieces of sdks.transforms

commit 49cf433c5c08f3cc91512aa9544a36a5d3e84333
Author: Kenneth Knowles <klk@google.com>
Date:   2017-05-03T02:59:32Z

    Tighten access control and internal annotations for triggers

commit 9b8a4e5c4b876d4459c64a9bffee613aeae72fb2
Author: Kenneth Knowles <klk@google.com>
Date:   2017-05-03T03:05:34Z

    The transforms.reflect package is not for users

commit fe51cc0d1a8aa14adbee81b220f9ca8a442f26fe
Author: Kenneth Knowles <klk@google.com>
Date:   2017-05-03T03:05:45Z

    Annotate internal-only bits of Java sdk.runners

commit 58298d866fe9d1f4fcaf2ccda3078809f4d55b27
Author: Kenneth Knowles <klk@google.com>
Date:   2017-05-03T17:10:07Z

    Tighten access in sdk.options

commit 362d0be79222ad67f1639d54434c1505ef76752b
Author: Kenneth Knowles <klk@google.com>
Date:   2017-05-03T17:13:15Z

    Annotate internal methods on Pipeline

commit f43b61af4d5a3ee77a610d8b11ef80d421c34501
Author: Kenneth Knowles <klk@google.com>
Date:   2017-05-04T13:10:45Z

    This closes #2852: Tighten up access and use internal annotations a bit in the Java SDK
    
      Annotate internal methods on Pipeline
      Tighten access in sdk.options
      Annotate internal-only bits of Java sdk.runners
      The transforms.reflect package is not for users
      Tighten access control and internal annotations for triggers
      Annotate internal pieces of sdks.transforms
      Annotate internal methods of PCollection

commit 1f1c897264ea7ab050c8644344f6e2648af9ae4a
Author: Luke Cwik <lcwik@google.com>
Date:   2017-05-04T00:17:11Z

    [BEAM-2165] Update Apex to support serializing/deserializing custom user types configured via Jackson modules

commit 02b72d6644c07b72a4c977a6cb16d59ec5a0ed8c
Author: Luke Cwik <lcwik@google.com>
Date:   2017-05-04T14:16:29Z

    [BEAM-2165] Update Apex to support serializing/deserializing custom user types configured via Jackson modules
    
    This closes #2880

commit e5729b58330a05e7be510710d0027c004704946b
Author: Luke Cwik <lcwik@google.com>
Date:   2017-05-04T00:19:00Z

    [BEAM-2165] Update Dataflow to support serializing/deserializing custom user types configured via Jackson modules
    
    This also updates the runner harness and existing tests to use a properly constructed ObjectMapper for PipelineOptions.

commit 749b33f0b74a9bcd3daf03ea7f9b4579baec2651
Author: Luke Cwik <lcwik@google.com>
Date:   2017-05-04T14:27:17Z

    [BEAM-2165] Update Dataflow to support serializing/deserializing custom user types configured via Jackson modules
    
    This closes #2881

commit f53e5d43d58c79ab9f3d04e112e6f05ad9dfe42f
Author: Luke Cwik <lcwik@google.com>
Date:   2017-05-04T00:12:20Z

    [BEAM-2165] Update Flink to support serializing/deserializing custom user types configured via Jackson modules

commit 3c5891b31d8dbeafad0a6ffbea33afb92c01c374
Author: Luke Cwik <lcwik@google.com>
Date:   2017-05-04T14:29:28Z

    [BEAM-2165] Update Flink to support serializing/deserializing custom user types configured via Jackson modules
    
    This closes #2879

commit cc654f02e8670ea789aee67508c569e7547ef11f
Author: Luke Cwik <lcwik@google.com>
Date:   2017-05-03T20:48:07Z

    [BEAM-1871] Migrate ReleaseInfo away from Google API client GenericJson

commit 98e92a0b8a4655a05fce4ae699f5bb93fe74f1de
Author: Luke Cwik <lcwik@google.com>
Date:   2017-05-04T14:41:15Z

    [BEAM-1871] Migrate ReleaseInfo away from Google API client GenericJson
    
    This closes #2868

commit 8a2dcdb6f9d4839c864a2c46c4b5254d0c7d4760
Author: Dan Halperin <dhalperi@google.com>
Date:   2017-05-03T18:52:02Z

    DataflowRunner: integration test GCP-IO
    
    Triggered under `-DskipITs=false -Pdataflow-runner`

commit e1d4aa96338959a556c8b815ccb6b1aae118ad15
Author: Dan Halperin <dhalperi@google.com>
Date:   2017-05-04T14:59:38Z

    This closes #2870

commit 1671708340fb9fc57cdc91c3bbacdff3ae6af4af
Author: yangping.wu <yangping.wu@qunar.com>
Date:   2017-05-04T06:04:08Z

    [BEAM-1491]Identify HADOOP_CONF_DIR(or YARN_CONF_DIR) environment variables

commit 588f57a1e6771883df84d06087a93fa4fc4baa54
Author: Luke Cwik <lcwik@google.com>
Date:   2017-05-04T15:48:23Z

    [BEAM-1491]Identify HADOOP_CONF_DIR(or YARN_CONF_DIR) environment variables
    
    This closes #2890

commit fba3d87ffec08f84c8be08ee16942b13364da2d9
Author: Robert Bradshaw <robertwb@google.com>
Date:   2017-05-03T21:56:37Z

    Split Coder's encode/decode methods into two methods depending on context.
    
    This allows the outer context to be marked deprecated.  A follow-up PR will
    remove the old method once all consumers have been updated.

commit d9293007d065c82111bf449502b5466042dc6335
Author: Luke Cwik <lcwik@google.com>
Date:   2017-05-04T15:59:05Z

    [BEAM-2166] Split Coder's encode/decode methods into two methods depending on context.
    
    This closes #2871

commit 690ec3b1f7b6ce9caaa7b9e401878e136f44bc50
Author: bchambers <bchambers@google.com>
Date:   2017-05-03T23:40:09Z

    [BEAM-2162] Add logging to long BigQuery jobs

commit ade5cbea605b99ebb6e566491ec64e12fc1a663d
Author: Dan Halperin <dhalperi@google.com>
Date:   2017-05-04T16:00:36Z

    This closes #2882

commit 17ad1efe7355b238efb5e341487a8e22660b3b77
Author: Borisa Zivkovic <borisa.zivkovic@huawei.com>
Date:   2017-05-03T15:22:18Z

    Use BinaryCombineLongFn in GroupIntoBatches

commit d1afdd8e14b0a62368e0573ffbaffeac14997e2e
Author: Thomas Groh <tgroh@google.com>
Date:   2017-05-04T16:14:42Z

    This closes #2859

commit 70dad36f099ea0b454e2900302f7e7f866579f79
Author: Sourabh Bajaj <sourabhbajaj@google.com>
Date:   2017-05-03T20:50:46Z

    [BEAM-2152] Remove gcloud auth as application default credentials does it

commit 93020941a251bb62fc26f5e123a12df4f8e4ab1e
Author: Ahmet Altay <altay@google.com>
Date:   2017-05-04T16:27:43Z

    This closes #2869

commit c102d277e22cef8001c0f78d3a5ed00000e8d99d
Author: Dan Halperin <dhalperi@google.com>
Date:   2017-05-04T00:50:20Z

    AvroIOTest: stop using IOChannelUtils, remove invalid test

commit e5a38ed2610b8ef72192e5a1b9a5630578300164
Author: Dan Halperin <dhalperi@google.com>
Date:   2017-05-04T00:55:32Z

    DataflowRunner: switch from IOChannels to FileSystems for creating files

----


> The default behavior for the Write transform doesn't work well with the Dataflow streaming runner
> -------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-1438
>                 URL: https://issues.apache.org/jira/browse/BEAM-1438
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Reuven Lax
>            Assignee: Reuven Lax
>
> If a Write specifies 0 output shards, that implies the runner should pick an appropriate
> sharding. The default behavior is to write one shard per input bundle. This works well with
> the Dataflow batch runner, but not with the streaming runner, which produces large numbers
> of small bundles.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
