From: Frank Yellin
Date: Wed, 2 May 2018 12:50:30 -0700
Subject: I want to allow a user-specified QuerySplitter for DatastoreIO
To: dev@beam.apache.org

TLDR: Is it okay for me to expose Datastore in Apache Beam's DatastoreIO, and thus indirectly expose com.google.rpc.Code? Is there a better solution?

As I explain in BEAM-4186, I would like to extend DatastoreV1.Read with a method

        withQuerySplitter(QuerySplitter querySplitter)

that would use an alternative query splitter. The standard one shards by key and is very limited.

I have already written such a query splitter. In fact, the query splitter I've written goes further than specified in the Beam issue: if the query specifies no minimum or maximum for the field, it reads the minimum or maximum value of that field from the datastore and uses that value for the sharding. I can write:

        SELECT * FROM ledger WHERE type = 'purchase'

and then ask it to shard on eventTime, and it will shard nicely! I am working with the Datastore folks to separately add my new query splitter as an option in DatastoreHelper.

I have already written the code to add withQuerySplitter.

        https://github.com/apache/beam/pull/5246

However, the problem is that I am increasing the "API surface" of Dataflow.

        QuerySplitter exposes Datastore exposes DatastoreException exposes com.google.rpc.Code

and com.google.rpc.Code is not (yet) part of the API surface.

As a solution, I've added package com.google.rpc to the list of classes exposed. This package contains protobuf enums. Is this okay? Is there a better solution?

Thanks.
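
P.S. To illustrate the sharding idea only, here is a minimal sketch. It is not the splitter from the pull request and it calls no Datastore or Beam API. It assumes the shard field (eventTime in the example above) can be treated as a long, such as epoch milliseconds, and that the minimum and maximum have already been read. The names RangeSharder and split are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Beam or the Datastore client. */
final class RangeSharder {

    /**
     * Splits the closed range [min, max] into at most numSplits contiguous,
     * non-overlapping sub-ranges, each returned as {inclusiveStart, inclusiveEnd}.
     */
    static List<long[]> split(long min, long max, int numSplits) {
        if (numSplits < 1 || min > max) {
            throw new IllegalArgumentException("need numSplits >= 1 and min <= max");
        }
        // Number of distinct values in the range; throws ArithmeticException on overflow.
        long count = Math.addExact(Math.subtractExact(max, min), 1L);
        // Never create more shards than there are values.
        int shards = (int) Math.min((long) numSplits, count);
        long base = count / shards;       // minimum size of each shard
        long remainder = count % shards;  // the first 'remainder' shards get one extra value

        List<long[]> ranges = new ArrayList<>(shards);
        long start = min;
        for (int i = 0; i < shards; i++) {
            long size = base + (i < remainder ? 1 : 0);
            long end = start + size - 1;
            ranges.add(new long[] {start, end});
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Assumed bounds; a real splitter would read them from the datastore.
        for (long[] r : split(0L, 99L, 4)) {
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```

Each sub-range could then become one extra pair of inequality filters on the shard field, added to the user's original query. How those filters are built depends on the Datastore client and is left out here.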