beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Work logged] (BEAM-3516) SpannerWriteGroupFn does not respect mutation limits
Date Wed, 14 Mar 2018 19:57:00 GMT


ASF GitHub Bot logged work on BEAM-3516:

                Author: ASF GitHub Bot
            Created on: 14/Mar/18 19:56
            Start Date: 14/Mar/18 19:56
    Worklog Time Spent: 10m 
      Work Description: NathanHowell commented on issue #4860: [BEAM-3516] Spanner BatchFn does not respect mutation limits
   Hi @mairbek and @dhalperi, could you take a look at this change? It's a bit light on tests.

Issue Time Tracking

    Worklog Id:     (was: 80508)
    Time Spent: 20m  (was: 10m)

> SpannerWriteGroupFn does not respect mutation limits
> ----------------------------------------------------
>                 Key: BEAM-3516
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.2.0
>            Reporter: Ryan Gordon
>            Assignee: Thomas Groh
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
> When using SpannerIO.write() with a large batch or a table with indexes, it's very possible to hit the Spanner mutation limit and fail with the following error:
> {quote}Jan 02, 2018 2:42:59 PM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler
> SEVERE: 2018-01-02T22:42:57.873Z: (3e7c871d215e890b):
INVALID_ARGUMENT: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: The transaction contains
too many mutations. Insert and update operations count with the multiplicity of the number
of columns they affect. For example, inserting values into one key column and four non-key
columns count as five mutations total for the insert. Delete and delete range operations count
as one mutation regardless of the number of columns affected. The total mutation count includes
any changes to indexes that the transaction generates. Please reduce the number of writes,
or use fewer indexes. (Maximum number: 20000)
> links {
>  description: "Cloud Spanner limits documentation."
>  url: ""
> }
> at
>  at
>  at
>  at
>  at
>  at$SessionImpl$
>  at$SessionImpl$
>  at
>  at$SessionImpl.writeAtLeastOnce(
>  at$PooledSession.writeAtLeastOnce(
>  at
>  at
>  at
> {quote}
> As a workaround, we can set "withBatchSizeBytes" to a much smaller value:
> {quote}mutations.apply("Write", SpannerIO
>    .write()
>    // Artificially reduce the max batch size b/c the batcher currently doesn't
>    // take into account the 20000 mutation multiplicity limit
>    .withBatchSizeBytes(1024) // 1KB
>    .withProjectId("#PROJECTID#")
>    .withInstanceId("#INSTANCE#")
>    .withDatabaseId("#DATABASE#")
>  );
> {quote}
> While this is less efficient, it at least allows the write to succeed consistently.
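
The counting rule in the error message suggests batching on mutation count instead of byte size. The following is a minimal sketch of that idea, not the fix proposed in #4860 and not Beam or Spanner client code: `MutationBatcher`, `Write`, and `indexCells` are hypothetical names, the index contribution is assumed to be known up front, and the 20000 limit is assumed to be inclusive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Beam or the Spanner client.
class MutationBatcher {

    // Simplified stand-in for a single Spanner mutation.
    static final class Write {
        final int columns;     // key and non-key columns the write touches
        final boolean delete;  // deletes count once, regardless of columns
        final int indexCells;  // assumed-known extra count from index updates

        Write(int columns, boolean delete, int indexCells) {
            this.columns = columns;
            this.delete = delete;
            this.indexCells = indexCells;
        }

        // Mutation count as described in the error message: inserts and updates
        // count once per affected column, deletes count once, and index changes
        // are added on top.
        int cells() {
            return (delete ? 1 : columns) + indexCells;
        }
    }

    // Greedily groups writes so that each batch stays within maxCells.
    // A single write larger than maxCells still ends up in a batch of its own.
    static List<List<Write>> batch(List<Write> writes, int maxCells) {
        List<List<Write>> batches = new ArrayList<>();
        List<Write> current = new ArrayList<>();
        int used = 0;
        for (Write w : writes) {
            int c = w.cells();
            // Flush before adding a write that would exceed the budget.
            if (!current.isEmpty() && used + c > maxCells) {
                batches.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(w);
            used += c;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Under these assumptions, 5000 inserts of five columns each (25000 mutations in total) with a budget of 20000 split into two batches of 4000 and 1000 writes, where a byte-based batcher could have sent them all in one transaction.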

This message was sent by Atlassian JIRA
