beam-commits mailing list archives

From "Stephen Sisk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2404) BigQueryIO reading stalls if no data is returned by query
Date Wed, 14 Jun 2017 22:40:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049768#comment-16049768 ]

Stephen Sisk commented on BEAM-2404:
------------------------------------

Hi Andre,

I investigated this bug, and I can't reproduce it. I ran a test pipeline with two query/table
combinations:
1) reading with "select * from tableX" from an empty table
2) reading with "select * from tableY where afield=nonexistentvalue" from a table that has
rows in it.

The code I used was equivalent to yours: 
    PCollection<TableRow> rows = p.apply(BigQueryIO.read()
        .fromQuery("select * from sisk_bqio_empty.empty_table;")
        .withoutResultFlattening()
        .usingStandardSql());
    PAssert.that(rows).empty();
    p.run();

I ran that in the TestDataflowRunner, and the job completed successfully.

Do you have any other information you'd like to share? Is this still an issue for you? 

> BigQueryIO reading stalls if no data is returned by query
> ---------------------------------------------------------
>
>                 Key: BEAM-2404
>                 URL: https://issues.apache.org/jira/browse/BEAM-2404
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>    Affects Versions: 2.0.0
>            Reporter: Andre
>            Assignee: Stephen Sisk
>
> When running a BigQueryIO query that doesn't return any rows (e.g. nothing has changed
> in a delta job) the job seems to stall and nothing happens as no temp files are being written
> which I think might be what it is waiting for. Just adding one row to the source table will
> make the job run through successfully.
> Code:
> {code:java}
> PCollection <TableRow> rows = p.apply("ReadFromBQ",
>  BigQueryIO.read()
>  .fromQuery("SELECT * FROM `myproject.dataset.table`")
>  .withoutResultFlattening().usingStandardSql());
> {code}
> 			
> Log:
> {code:java}		
> Jun 02, 2017 9:00:36 AM org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl startJob
> INFO: Started BigQuery job: {jobId=beam_job_batch-query, projectId=my-project}.
> bq show -j --format=prettyjson --project_id=my-project beam_job_batch-query
> Jun 02, 2017 9:03:11 AM org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase executeExtract
> INFO: Starting BigQuery extract job: beam_job_batch-extract
> Jun 02, 2017 9:03:12 AM org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl startJob
> INFO: Started BigQuery job: {jobId=beam_job_batch-extract, projectId=my-project}.
> bq show -j --format=prettyjson --project_id=my-project beam_job_batch-extract
> Jun 02, 2017 9:04:06 AM org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase executeExtract
> INFO: BigQuery extract job completed: beam_job_batch-extract
> Jun 02, 2017 9:04:08 AM org.apache.beam.sdk.io.FileBasedSource expandFilePattern
> INFO: Matched 1 files for pattern gs://my-bucket/tmp/BigQueryExtractTemp/ff594d003c6440a1ad84b9e02858b5c6/000000000000.avro
> Jun 02, 2017 9:04:09 AM org.apache.beam.sdk.io.FileBasedSource getEstimatedSizeBytes
> INFO: Filepattern gs://my-bucket/tmp/BigQueryExtractTemp/ff594d003c6440a1ad84b9e02858b5c6/000000000000.avro matched 1 files with total size 9750
> {code}	



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
