beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Work logged] (BEAM-4824) Get BigQueryIO batch loads to return something actionable
Date Tue, 31 Jul 2018 17:57:00 GMT


ASF GitHub Bot logged work on BEAM-4824:

                Author: ASF GitHub Bot
            Created on: 31/Jul/18 17:56
            Start Date: 31/Jul/18 17:56
    Worklog Time Spent: 10m 
      Work Description: reuvenlax commented on issue #6055: [BEAM-4824] Batch BigQueryIO returns job results
   Thanks! Sorry for the delay, I didn't see this review earlier.
   Some initial thoughts:
     1. Changing internal types of PCollections (e.g. PCollection<String> -> PCollection<BigQueryWriteResult>)
 in common transforms is something we try to avoid, as many users rely on being able
to do in-place updates of their pipelines, which is impossible when types change. Not a blocker;
we just might need to make the new behavior opt-in instead of the default.
    2. The set of load jobs generated is kind of an internal detail in BigQueryIO. It might
split an insert into multiple load jobs (and then run a copy job to merge them), or in the
case of streaming it might keep generating load jobs. In addition, some of these load jobs
might simply be retries of previously failed load jobs. I'm not sure that outputting
per-load-job information is going to give us what we want, when the logical model is per record.
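
To illustrate the second point, here is a minimal sketch, in plain Java with no Beam dependency, of why raw per-load-job output can mislead: a failed job followed by a successful retry would look like a failure unless attempts are folded into one logical result. The class LoadJobSummary, its Attempt type, and the partition and job ids are all hypothetical and are not part of BigQueryIO.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these names exist in Beam or BigQueryIO.
public class LoadJobSummary {

    /** One attempt at loading one partition of the input. */
    static final class Attempt {
        final String partitionId;
        final String jobId;
        final boolean succeeded;

        Attempt(String partitionId, String jobId, boolean succeeded) {
            this.partitionId = partitionId;
            this.jobId = jobId;
            this.succeeded = succeeded;
        }
    }

    /** Maps each partition to the outcome of its most recent attempt (input is in submission order). */
    static Map<String, Boolean> latestOutcomePerPartition(List<Attempt> attemptsInOrder) {
        Map<String, Boolean> latest = new LinkedHashMap<>();
        for (Attempt attempt : attemptsInOrder) {
            // A later attempt for the same partition overwrites an earlier one.
            latest.put(attempt.partitionId, attempt.succeeded);
        }
        return latest;
    }

    /** The logical write succeeded if every partition's latest attempt succeeded. */
    static boolean writeSucceeded(List<Attempt> attemptsInOrder) {
        return !attemptsInOrder.isEmpty()
            && !latestOutcomePerPartition(attemptsInOrder).containsValue(false);
    }

    public static void main(String[] args) {
        List<Attempt> attempts = Arrays.asList(
            new Attempt("partition-0", "job-a", false),       // first try fails
            new Attempt("partition-0", "job-a-retry", true),  // retry succeeds
            new Attempt("partition-1", "job-b", true));
        System.out.println(attempts.size() + " attempts, write succeeded: "
            + writeSucceeded(attempts));
    }
}
```

Three job-level records collapse into a single answer here, which is closer to what a pipeline author would want to act on. A per-record model would need more than this sketch shows.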


Issue Time Tracking

    Worklog Id:     (was: 129406)
    Time Spent: 20m  (was: 10m)

> Get BigQueryIO batch loads to return something actionable
> ---------------------------------------------------------
>                 Key: BEAM-4824
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Carlos Alonso
>            Assignee: Carlos Alonso
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
> At the moment, BigQueryIO batch loads return an empty collection that carries no information about
how the load job finished. The collection is even returned before the job finishes.
> Change it so that:
>  # The returning PCollection only appears when the job has actually finished
>  # The returning PCollection contains information about the job result

This message was sent by Atlassian JIRA
