beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Work logged] (BEAM-4652) PubsubIO: create subscription on different project than the topic
Date Thu, 28 Jun 2018 23:03:00 GMT

     [ https://issues.apache.org/jira/browse/BEAM-4652?focusedWorklogId=117110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-117110 ]

ASF GitHub Bot logged work on BEAM-4652:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Jun/18 23:02
            Start Date: 28/Jun/18 23:02
    Worklog Time Spent: 10m 
      Work Description: kennknowles commented on a change in pull request #5788: [BEAM-4652] Allow PubsubIO to read public data
URL: https://github.com/apache/beam/pull/5788#discussion_r199012618
 
 

 ##########
 File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java
 ##########
 @@ -693,9 +693,9 @@ public String toString() {
 
       @Nullable
       ValueProvider<ProjectPath> projectPath =
-          getTopicProvider() == null
+          getSubscriptionProvider() == null
 
 Review comment:
   Here is what I can tell:
   
    - `PubsubUnboundedSource` is made so that you can customize the project for the subscription.
    - But `PubsubIO` is _not_ made to customize the project.
   
   Before my change:
   
    - `PubsubIO` always uses the project from the topic if it is `read().fromTopic()` and leaves it null if it is `read().fromSubscription()`
    - `PubsubUnboundedSource` always requires a `project` if it is given a `topic`, because it needs the project to create the random subscription
   
   After my change:
   
    - `PubsubIO` always uses the project from the subscription and leaves it null for `read().fromTopic()`
    - `PubsubUnboundedSource` never requires a project, because it defaults to getting it from PipelineOptions
   
   So, since `PubsubIO` never provides a useful project, it could always be left null. It would probably be better to refactor `PubsubUnboundedSource` into two variants with some shared internals, so that there are zero nullable fields.
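
   The "two variants with shared internals" idea could look roughly like the sketch below. This is only an illustration under stated assumptions: `PubsubSourceSketch`, `Source`, `FromSubscription`, `FromTopic`, and the project and subscription names are hypothetical and are not Beam's actual classes or API. The subscription name is passed in explicitly here rather than generated randomly, to keep the example deterministic.

```java
import java.util.Objects;

/** Hypothetical sketch only; these are not Beam's actual classes. */
public class PubsubSourceSketch {

  /** Shared internals: every variant resolves to one subscription path to pull from. */
  interface Source {
    String subscriptionPath();
  }

  /** Variant for reading an existing subscription: no project field is needed at all. */
  static final class FromSubscription implements Source {
    private final String subscription;

    FromSubscription(String subscription) {
      this.subscription = Objects.requireNonNull(subscription, "subscription");
    }

    @Override
    public String subscriptionPath() {
      return subscription;
    }
  }

  /** Variant for reading a topic: a subscription must be created, so its project is mandatory. */
  static final class FromTopic implements Source {
    private final String topic;
    private final String subscriptionProject;
    private final String subscriptionName;

    FromTopic(String topic, String subscriptionProject, String subscriptionName) {
      this.topic = Objects.requireNonNull(topic, "topic");
      this.subscriptionProject = Objects.requireNonNull(subscriptionProject, "subscriptionProject");
      this.subscriptionName = Objects.requireNonNull(subscriptionName, "subscriptionName");
    }

    String topic() {
      return topic;
    }

    @Override
    public String subscriptionPath() {
      // The subscription lives in the reader's project, not in the topic's project.
      return "projects/" + subscriptionProject + "/subscriptions/" + subscriptionName;
    }
  }

  public static void main(String[] args) {
    Source fromTopic =
        new FromTopic(
            "projects/pubsub-public-data/topics/taxirides-realtime",
            "my-project", // assumed project that the reader is allowed to write to
            "my-subscription"); // assumed name; a real source would generate one
    System.out.println(fromTopic.subscriptionPath());

    Source fromSubscription = new FromSubscription("projects/my-project/subscriptions/existing");
    System.out.println(fromSubscription.subscriptionPath());
  }
}
```

   With this shape, every field in each variant is required, so neither variant carries a nullable project.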

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 117110)
    Time Spent: 3h  (was: 2h 50m)

> PubsubIO: create subscription on different project than the topic
> -----------------------------------------------------------------
>
>                 Key: BEAM-4652
>                 URL: https://issues.apache.org/jira/browse/BEAM-4652
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-gcp
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Critical
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> If you try to read a public pubsub topic in the DirectRunner, it will fail with 403 when trying to create a subscription. This is because it tries to create a subscription on the shared public data set.
> There is an example used in https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon and the dataset is {{projects/pubsub-public-data/topics/taxirides-realtime}}. I discovered that I could not read this in the DirectRunner even though the codelab works. But that 1.x codelab also does not work in the InProcessPipelineRunner, so it has been broken all along.
> So you cannot read public data or any other read-only data using PubsubIO.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
