beam-dev mailing list archives

From Frank Yellin ...@fyellin.com>
Subject I want to allow a user-specified QuerySplitter for DatastoreIO
Date Wed, 02 May 2018 19:50:30 GMT
TLDR:
Is it okay for me to expose Datastore in Apache Beam's DatastoreIO, and
thus indirectly expose com.google.rpc.Code?
Is there a better solution?


As I explain in BEAM-4186 <https://issues.apache.org/jira/browse/BEAM-4186>,
I would like to extend DatastoreV1.Read with a
       withQuerySplitter(QuerySplitter querySplitter)
method, which would use an alternative query splitter. The standard one
shards by key and is very limited.
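
For concreteness, the proposed extension point would be used roughly like this (a sketch, not code from the PR; withProjectId and withLiteralGqlQuery are existing DatastoreV1.Read builder methods, withQuerySplitter is the proposed one, and MyPropertySplitter is a hypothetical implementation of com.google.datastore.v1.client.QuerySplitter):

```java
DatastoreV1.Read read =
    DatastoreIO.v1().read()
        .withProjectId("my-project")  // hypothetical project id
        .withLiteralGqlQuery(
            "SELECT * FROM ledger WHERE type = 'purchase'")
        .withQuerySplitter(new MyPropertySplitter("eventTime"));  // proposed method
```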

I have already written such a query splitter. In fact, the query splitter
I've written goes further than what Beam specifies: if the query gives no
minimum or maximum for the field, it reads the minimum or maximum value of
that field from the Datastore and uses that value for the sharding. I can
write:
       SELECT * FROM ledger WHERE type = 'purchase'
and then ask it to shard on eventTime, and it shards nicely!  I am
working with the Datastore folks to separately add my new query splitter as
an option in DatastoreHelper.
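
To illustrate the core idea (this is not the PR's code; the class and method names below are hypothetical), a property-range splitter boils down to computing evenly spaced boundaries between the observed minimum and maximum property values, with each adjacent pair of boundaries becoming one sub-query's filter range:

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplitSketch {

    // Split [min, max) into numSplits contiguous sub-ranges.
    // The returned list has numSplits + 1 entries: shard i covers
    // [boundaries.get(i), boundaries.get(i + 1)).
    static List<Long> splitBoundaries(long min, long max, int numSplits) {
        List<Long> boundaries = new ArrayList<>();
        long span = max - min;
        for (int i = 0; i < numSplits; i++) {
            boundaries.add(min + span * i / numSplits);
        }
        boundaries.add(max);  // exclusive upper bound of the last shard
        return boundaries;
    }

    public static void main(String[] args) {
        // E.g. shard an eventTime window (values 0..100) into 4 sub-queries.
        System.out.println(splitBoundaries(0L, 100L, 4));
        // prints [0, 25, 50, 75, 100]
    }
}
```

In the real splitter, min and max would come either from the query's own inequality filters or, when absent, from a pair of ascending/descending keys-only lookups against the datastore.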


I have already written the code to add withQuerySplitter.

       https://github.com/apache/beam/pull/5246

However, the problem is that I am increasing the API surface of Beam:
       QuerySplitter exposes Datastore, which exposes DatastoreException,
which exposes com.google.rpc.Code,
and com.google.rpc.Code is not (yet) part of the allowed API surface.

As a solution, I've added the package com.google.rpc to the list of exposed
classes. This package contains protobuf enums. Is this okay? Is there a
better solution?

Thanks.
