beam-user mailing list archives

From Lukasz Cwik <>
Subject Re: How to catch exceptions while using DatastoreV1 API
Date Mon, 16 Oct 2017 16:40:45 GMT
It depends on the runner. The exception that is thrown is per processed bundle, and it is up to the runner to choose what to do with bundles that fail.

For a bounded (batch) pipeline, Dataflow will fail the pipeline after a
fixed number of retries of each bundle.
For an unbounded (streaming) pipeline, Dataflow will retry the bundle
forever until the pipeline is cancelled by the user.
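The bounded-pipeline behavior described above — retry each bundle a fixed number of times, then fail the job — can be sketched as a small stand-alone helper. This is an illustration of the semantics only, not Dataflow's or Beam's actual implementation; the names `RunnerRetry` and `processWithRetries` are hypothetical.

```java
import java.util.function.Consumer;

// Illustration of bounded-pipeline retry semantics: a failing bundle is
// retried whole a fixed number of times, after which the job fails.
// All names here are hypothetical, not Beam API.
public class RunnerRetry {
    static <T> void processWithRetries(Consumer<T> bundleProcessor, T bundle, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                bundleProcessor.accept(bundle);
                return; // bundle succeeded
            } catch (RuntimeException e) {
                last = e; // remember the failure and retry the whole bundle
            }
        }
        // Fixed retry budget exhausted: fail, as a bounded Dataflow job would.
        throw new RuntimeException("Bundle failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // A processor that fails twice, then succeeds on the third attempt.
        processWithRetries(b -> {
            calls[0]++;
            if (calls[0] < 3) throw new RuntimeException("transient commit failure");
        }, "bundle-1", 4);
        System.out.println("attempts=" + calls[0]); // attempts=3
    }
}
```

In the unbounded (streaming) case described above, the loop would simply have no upper bound on `maxAttempts`.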

On Sun, Oct 15, 2017 at 11:33 PM, Derek Hao Hu <> wrote:

> Hi,
> I'm using the DatastoreV1 API to write data into Datastore. I've briefly gone
> through the implementation, and it seems the Write transform throws a
> DatastoreException
> (1bd17d1b95a6b27331626fa9bdbaa723969b710d/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/)
> when it fails to commit.
> In a streaming pipeline, I think it's possible that some commits will
> occasionally fail even after five retries. What is the expected behavior
> here? Is there a way to catch these failed mutations and then save them
> somewhere? I'm not sure what the recommended approach is, since `Write`
> itself is already a transform, which means there seems to be no easy way to
> tell which mutations / commits actually failed.
> Could someone help explain what the best approach here is? Right now I'm
> thinking of writing my own DoFn which just writes each entity to Datastore
> without even batching; by doing this, it seems it'll be easy to catch the
> failed commits or write them to logs. It doesn't seem to be the right
> approach, though, considering there is already a significant amount of
> effort in Beam to provide its own DatastoreIO.
> Thanks,
> --
> Derek Hao Hu
> Software Engineer | Snapchat
> Snap Inc.
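The per-entity approach Derek describes — write each entity individually, catch the exception, and capture the failure instead of failing the bundle — can be sketched as below. `EntityWriter` and the string "entities" are hypothetical stand-ins for a Datastore commit and real entities; in an actual pipeline this loop would live in a DoFn's `@ProcessElement` method, with failures routed to a side output or a logging sink rather than a returned list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the "write each entity individually and capture failures" idea.
// EntityWriter is a hypothetical stand-in for a Datastore commit; it is not
// part of the Beam or Datastore API.
public class PerEntityWrite {
    interface EntityWriter {
        void write(String entity) throws Exception; // stand-in for a commit
    }

    static List<String> writeAll(List<String> entities, EntityWriter writer) {
        List<String> failed = new ArrayList<>();
        for (String entity : entities) {
            try {
                writer.write(entity); // one commit per entity: failures are attributable
            } catch (Exception e) {
                failed.add(entity);   // capture the failure instead of failing the bundle
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> failed = writeAll(
            Arrays.asList("a", "bad", "c"),
            entity -> { if (entity.equals("bad")) throw new Exception("commit failed"); });
        System.out.println(failed); // [bad]
    }
}
```

The trade-off, as the question anticipates, is losing the batching that the built-in DatastoreIO provides, so throughput would suffer relative to the batched Write transform.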
