beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2439) Datastore writer can fail to progress if Datastore is slow
Date Mon, 19 Jun 2017 10:43:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053774#comment-16053774 ]

ASF GitHub Bot commented on BEAM-2439:
--------------------------------------

GitHub user cph6 opened a pull request:

    https://github.com/apache/beam/pull/3390

    [BEAM-2439] Dynamic sizing of Datastore write RPCs

    This stops the Datastore connector from always sending 500 entities per RPC.
    Instead, it starts at a lower number which is more likely to complete within
    the deadline even in adverse conditions, and then increases or reduces the
    batch size in response to measured latency of past requests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cph6/beam datastore_batching

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3390.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3390
    
----
commit 1967f0cc3e52a3b2948d449db4bdc6e5ba0a6bdf
Author: Colin Phipps <fipsy@google.com>
Date:   2017-05-15T14:18:16Z

    Dynamic batching of entity writes.
    
    This stops the Datastore connector from always sending 500 entities per RPC.
    Instead, it starts at a lower number which is more likely to complete within
    the deadline even in adverse conditions, and then increases or reduces the
    batch size in response to measured latency of past requests.

----


> Datastore writer can fail to progress if Datastore is slow
> ----------------------------------------------------------
>
>                 Key: BEAM-2439
>                 URL: https://issues.apache.org/jira/browse/BEAM-2439
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>            Reporter: Colin Phipps
>            Assignee: Stephen Sisk
>            Priority: Minor
>              Labels: datastore
>
> When writing to Datastore, Beam groups writes into large batches (usually 500 entities
> per write, the maximum permitted by the API). If these writes are slow to commit on the
> serving side, the request may time out before all of the entities are written.
> When this happens, it loses any progress that has been made on those entities (the
> connector uses non-transactional writes, so some entities might have been written, but
> partial results are not returned to the connector so it has to assume that all entities
> need rewriting). It will retry the write with the same set of entities, which may time
> out in the same way repeatedly. This can be influenced by factors on the Datastore
> serving side, some of which are transient (hotspots) but some of which are not.
> We (Datastore) are developing a fix for this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
