beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Work logged] (BEAM-5040) BigQueryIO retries infinitely in WriteTable and WriteRename
Date Mon, 30 Jul 2018 17:37:00 GMT

     [ https://issues.apache.org/jira/browse/BEAM-5040?focusedWorklogId=128818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-128818 ]

ASF GitHub Bot logged work on BEAM-5040:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 30/Jul/18 17:36
            Start Date: 30/Jul/18 17:36
    Worklog Time Spent: 10m 
      Work Description: chamikaramj commented on a change in pull request #6080: [BEAM-5040] Fix retry bug for BigQuery jobs.
URL: https://github.com/apache/beam/pull/6080#discussion_r206256749
 
 

 ##########
 File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java
 ##########
 @@ -284,19 +305,25 @@ private void load(
           }
           return;
         case UNKNOWN:
-          LOG.info("Load job {} finished in unknown state: {}", jobRef, loadJob.getStatus());
-          throw new RuntimeException(
-              String.format(
-                  "UNKNOWN status of load job [%s]: %s.",
-                  jobId, BigQueryHelpers.jobToPrettyString(loadJob)));
-        case FAILED:
+          // This might happen if BigQuery's job listing is slow. Retry with the same
+          // job id.
           LOG.info(
-              "Load job {} failed, {}: {}",
+              "Load job {} finished in unknown state: {}: {}",
               jobRef,
-              (i < BatchLoads.MAX_RETRY_JOBS - 1) ? "will retry" : "will not retry",
-              loadJob.getStatus());
+              loadJob.getStatus(),
+              (i < maxRetryJobs - 1) ? "will retry" : "will not retry");
           lastFailedLoadJob = loadJob;
           continue;
+        case FAILED:
+          lastFailedLoadJob = loadJob;
+          jobId = BigQueryHelpers.getRetryJobId(jobId, projectId, bqLocation, jobService).jobId;
+          LOG.info(
 
 Review comment:
   Should we retry jobs that actually failed? The failure might have been due to a legitimate client error, no? Should we check the error code/message before retrying?
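
One way to act on this suggestion is to inspect the error reason BigQuery attaches to a failed job and retry only reasons that look transient. The sketch below is a minimal standalone illustration, not Beam code: the class `RetryPolicySketch` and its method `isRetryable` are hypothetical, and the split between transient reasons (`backendError`, `internalError`, `rateLimitExceeded`, `jobBackendError`, `jobInternalError`) and everything else (for example `invalid`, `notFound`, `accessDenied`) is an assumption based on BigQuery's documented error reason strings.

```java
import java.util.Set;

/** Hypothetical helper: decides whether a failed load job is worth retrying. */
public class RetryPolicySketch {
  // Assumed set of transient BigQuery error reasons. Any other reason
  // (e.g. "invalid", "notFound", "accessDenied") is treated as a client error.
  private static final Set<String> TRANSIENT_REASONS =
      Set.of(
          "backendError",
          "internalError",
          "rateLimitExceeded",
          "jobBackendError",
          "jobInternalError");

  /** Returns true only if a job that failed with this error reason should be retried. */
  public static boolean isRetryable(String errorReason) {
    // Set.of(...).contains(null) throws, so treat a missing reason as non-retryable first.
    return errorReason != null && TRANSIENT_REASONS.contains(errorReason);
  }

  public static void main(String[] args) {
    System.out.println(isRetryable("backendError")); // true
    System.out.println(isRetryable("invalid"));      // false
    System.out.println(isRetryable(null));           // false
  }
}
```

Failing fast on client errors such as `invalid` surfaces a bad schema or malformed input immediately instead of spending the retry budget on a job that can never succeed.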

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 128818)
    Time Spent: 0.5h  (was: 20m)

> BigQueryIO retries infinitely in WriteTable and WriteRename
> -----------------------------------------------------------
>
>                 Key: BEAM-5040
>                 URL: https://issues.apache.org/jira/browse/BEAM-5040
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.5.0
>            Reporter: Reuven Lax
>            Assignee: Reuven Lax
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> BigQueryIO retries infinitely in WriteTable and WriteRename
> Several failure scenarios with the current code:
>  # It's possible for a load job to return failure even though it actually succeeded (e.g. the reply might have timed out). In this case, BigQueryIO will retry the job, which will fail again (because the job id has already been used), leading to indefinite retries. Correct behavior is to stop retrying, as the load job has succeeded.
>  # It's possible for a load job to be accepted by BigQuery, but then to fail on the BigQuery side. In this case a retry with the same job id will fail, as that job id has already been used. BigQueryIO will sometimes detect this, but if the worker has restarted it will instead issue a load with the old job id and go into a retry loop. Correct behavior is to generate a new deterministic job id and retry using that new job id.
>  # In many cases of worker restart, BigQueryIO ends up in infinite retry loops.
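
The correct behavior described above (stop once the job has succeeded, and switch to a new deterministic job id after a real failure) can be sketched as a bounded loop. This is a simplified standalone illustration, not the code from pull request #6080: `DeterministicRetrySketch`, `Status`, `jobId`, and the `runOrGet` callback are all hypothetical. `runOrGet` is assumed to submit a job, or, if a job with that id already exists, return that existing job's status instead of failing. The loop mirrors the diff's handling (retry UNKNOWN with the same id, move to a fresh id on FAILED) and caps the total number of tries so it cannot run forever.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class DeterministicRetrySketch {
  /** Hypothetical, simplified job states. */
  public enum Status { SUCCEEDED, FAILED, UNKNOWN }

  /** Deterministic id: a restarted worker recomputes exactly the same sequence of ids. */
  public static String jobId(String baseId, int attempt) {
    return baseId + "-" + attempt;
  }

  /** Returns the id of the job that succeeded, or throws after maxTries tries. */
  public static String runWithRetries(
      String baseId, int maxTries, Function<String, Status> runOrGet) {
    int attempt = 0;
    for (int i = 0; i < maxTries; i++) {
      String id = jobId(baseId, attempt);
      switch (runOrGet.apply(id)) {
        case SUCCEEDED:
          return id; // also covers "the job already succeeded": stop retrying
        case UNKNOWN:
          continue; // status not visible yet: check the same job id again
        case FAILED:
          attempt++; // this job id is used up: move to the next deterministic id
          break;
      }
    }
    throw new RuntimeException(
        "Load job " + baseId + " did not succeed after " + maxTries + " tries");
  }

  public static void main(String[] args) {
    // Fake service: "load-0" failed; "load-1" is UNKNOWN on the first check, then SUCCEEDED.
    Map<String, Integer> calls = new HashMap<>();
    Function<String, Status> fake =
        id -> {
          int n = calls.merge(id, 1, Integer::sum);
          if (id.endsWith("-0")) {
            return Status.FAILED;
          }
          return n == 1 ? Status.UNKNOWN : Status.SUCCEEDED;
        };
    System.out.println(runWithRetries("load", 5, fake)); // load-1
  }
}
```

Because the ids are derived from a base id and an attempt counter rather than generated randomly, a worker that restarts after `load-0` failed walks the same sequence: it sees `load-0` as FAILED again, advances to `load-1`, and never resubmits an already-used id in a loop.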



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
