beam-dev mailing list archives

From Frank Yellin
Subject I want to allow a user-specified QuerySplitter for DatastoreIO
Date Wed, 02 May 2018 19:50:30 GMT
Is it okay for me to expose Datastore in Apache Beam's DatastoreIO, and
thus indirectly expose
Is there a better solution?

As I explain in Beam 4186,
I would like to be able to extend DatastoreV1.Read to have a
       withQuerySplitter(QuerySplitter querySplitter)
method, which would use an alternative query splitter.  The standard one
shards by key and is very limited.

I have already written such a query splitter.  In fact, the query splitter
I've written goes further than what is specified in the Beam issue: it reads
the minimum or maximum value of the field from the datastore if no minimum
or maximum is specified in the query, and uses that value for the sharding.
For example, I can take a query such as
       SELECT * FROM ledger where type = 'purchase'
and then ask it to shard on eventTime, and it will shard nicely!  I am
working with the Datastore folks to separately add my new query splitter as
an option in DatastoreHelper.
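
To illustrate the sharding idea described above, here is a minimal sketch. It
is not the actual query splitter: RangeSharder, Range and split are
hypothetical names, it uses no Datastore or Beam classes, and it assumes the
field is numeric (for example eventTime as epoch milliseconds) and that the
minimum and maximum have already been looked up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Beam or Datastore.
final class RangeSharder {
    /** Half-open range [start, end) of values of the sharding field. */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Splits [min, maxExclusive) into at most numShards contiguous ranges of
     * nearly equal width. Empty ranges are skipped, so fewer than numShards
     * ranges are returned when the interval is narrower than numShards.
     * Assumes maxExclusive - min does not overflow a long.
     */
    static List<Range> split(long min, long maxExclusive, int numShards) {
        if (numShards <= 0 || maxExclusive <= min) {
            throw new IllegalArgumentException(
                "need numShards > 0 and min < maxExclusive");
        }
        long width = maxExclusive - min;
        long base = width / numShards;      // whole units per shard
        long remainder = width % numShards; // leftover spread across shards
        List<Range> shards = new ArrayList<>();
        long start = min;
        for (int i = 1; i <= numShards; i++) {
            // i-th boundary; the last one (i == numShards) equals maxExclusive.
            long end = min + base * i + (remainder * i) / numShards;
            if (end > start) {
                shards.add(new Range(start, end));
                start = end;
            }
        }
        return shards;
    }
}
```

Each returned range would then become one sub-query with a filter of the form
start <= field < end. For instance, RangeSharder.split(0L, 100L, 4) yields the
ranges [0, 25), [25, 50), [50, 75) and [75, 100).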

I have already written the code to add withQuerySplitter.

However, the problem is that I am increasing the "surface API" of Dataflow.
       QuerySplitter exposes Datastore  exposes DatastoreException  exposes
and is not (yet) part of the API surface.

As a solution, I've added package to the list of classes
exposed.  This package contains protobuf enums.  Is this okay?  Is there a
better solution?

